Assessment Will Eat Itself

Seemingly a lifetime ago I remember writing about the worst mark scheme ever written. John Tomsett recently wrote a searing blogpost about a more recent version.

Laura then took me to her classroom, where piles of coursework were strewn across every table, and showed me what she has to mark. She has 29 students’ work to assess, having to write comments to justify her marks in 7 boxes for each student. That is 203 separate comments with minimal, if any, support from OCR. Page after page of assessment descriptors without any exemplar materials to help Laura, and her colleagues across the country, make accurate interpretations of what on earth the descriptors mean.

This is an example — pure and simple — of assessmentitis.

“-itis” is the correct medical suffix since the assessment system is, indeed, inflamed. Distended. Bloated. Swollen. Engorged. Puffed up.

How did it come to this? When you meet people who work for the examination boards, they are — by and large — pleasant, normal, well-adjusted and well-intentioned people, at least as far as I can judge. How can they produce such prolix monstrosities?

Dr Samuel Johnson made the telling observation that “Uniformity of practice seldom continues long without good reason.” The fact that all the exam boards tend to produce similar styles of document indicates that they are responding to a system or set of pressures that dictate such a response.

I suspect that, at its heart, the system has commendable aims: fairness, and ensuring that everyone makes similar judgements.

In answer to the age-old question: “But who is to guard the guards themselves?” they have attempted to set up an impenetrable Wall of Words.

But here’s the thing: words can be slippery little things, capable of being interpreted in many different ways. Hence the need to add a comment to give an indication of how one interpreted the marking criteria. It has been suggested that “expected practice” (“best practice” to some) is to include phrases from the marking criteria in the comment on how one applied the marking criteria . . .

This is already an ever-decreasing death spiral of self-referential self-referring: assessment is eating itself!

Soon we will be asked to make comments on the comments. And then comments on the comments that we made commenting on how we applied the marking criteria.

But here’s another thing: if the guards are so busy completing paperwork explaining how they are meeting the criteria of competent guarding and establishing an audit-trail of proof of guarding-competencies — then, at least some of the time, they’re not actually guarding, are they?

Who is to guard the guards themselves? In the end, one has to depend on the guards to guard themselves. Choose them well, trust them, and try to instil a professional pride in the act of guarding in them.

Pride and honest professionalism: they are the ultimate Watchmen.

O for a draught of vintage! (Or: Bring back POAE!)

O for a draught of vintage! that hath been

Cool’d a long age in the deep-delvèd earth

— John Keats, Ode to a Nightingale

The Northfarthing barley was so fine that the beer of 1420 was long remembered and became a byword. Indeed a generation later one might hear an old gaffer in an inn, after a good pint of well-earned ale, put down his mug with a sigh: “Ah! that was proper fourteen-twenty, that was!”

— J. R. R. Tolkien, The Grey Havens, from The Lord of the Rings

I don’t know about anybody else, but I could do with a draught of the vintage good stuff right about now. I am that old gaffer in the pub muttering: “They should bring back POAE, they really should.”

In all probability, only Science teachers of a certain generation (translation: old farts like me) will recognise the acronym P.O.A.E.

For the youthful pups who now seem to comprise the majority of the UK’s teaching workforce, it stands for “Planning, Obtaining evidence, Analysing and Evaluating”, the “strands” (dread word!) by which we used to mark practical skills in the good old days of yore, when the world was yet young.

And truth be told, they weren’t all that good. It is only in comparison with more modern iterations that they achieve their near-mythic ‘fourteen-twenty’ status.

One of the jobs I have been studiously avoiding over the summer holidays is to mark a portfolio of Y10 students’ controlled assessment practical work. I am dreading it. The reason is, I have to use the worst mark scheme ever developed in the entire history of humankind. Or before. Or, applying a rigorous Bayesian statistical analysis of relevant probabilities, since.

Accuse me of hysterical hyperbole if you will, but take my word for it: this mark scheme is a turkey that out-turkeys all the Christmas lunches served over the past two millennia.

Let me explain. What is the purpose of marking students’ coursework or controlled assessment? Wearing our summative, assessment-of-learning hats for a moment, the essence of marking in this context is to generate a number that indicates a student’s relative performance. Ideally, another professional marking the same student’s work would generate a similar number.

Using the old-style POAE scheme, I would have to assess a student’s work against 25 hierarchical criteria which would give a “best fit” number out of a maximum of 30 marks. (Boy, this sure is a fun post, isn’t it?) From memory, moderators would tolerate a disagreement of plus or minus 3 marks before adjustment.

Using the modern, rubbish mark scheme, I have to assess a student’s work against, by my count, 67 hierarchical criteria which give a “best fit” number out of a maximum of 64 marks. This takes a while, since I defy anyone to memorise or internalise a mark scheme of that size.

And the end result: is a mark out of 64 ‘better’ than a mark out of 30? Does it allow a finer discrimination between the performance of students?

In theory: perhaps. In practice: no. It is just another example of assessmentitis, “-itis” being the most appropriate suffix in this case as the entire system of assessment is, indeed, inflamed. More is, in fact, less.
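Some back-of-envelope arithmetic of my own suggests why (assuming, and it is only my assumption, that marker disagreement grows roughly in proportion to the length of the scale):

Old scheme: ±3 marks of tolerance on a 0–30 scale, so roughly 31 ÷ 6 ≈ 5 genuinely distinguishable bands of performance.

New scheme: the same ±10% tolerance is ±6 or 7 marks on a 0–64 scale, so roughly 65 ÷ 13 ≈ 5 bands. Again.

The same five-or-so usable grades, in other words, but with 67 criteria to wade through instead of 25 in order to arrive at them.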

As an example, under the old POAE scheme, the P for Planning strand (dread word!) had 7 criteria and a maximum of 8 marks. Using the new mark scheme, I mark the same set of skills, which are now labelled as S for Strategy (“Mategy, Categy, Sategy”) and include two individual sub-strands (even more dread words!), with a total of 21 marking criteria and a maximum of 16 marks. And . . . it doesn’t tell the student or the teacher anything that the older scheme did not.

It is, in my opinion, a badly-designed exercise in futility which provides no useful guidance or feedback for either student or teacher. Let it be sent forthwith to that corner of limbo where clapped-out assessment formats go to die. A curse upon it, and . . .

Sings to the tune of “My Bonnie Lies Over the Ocean”:

Bring back, bring back, O bring back my P-O-A-E, A-E!

Bring back, O bring back my P-O-A-E to me!