Last week I was lucky enough to attend the ResearchEd national conference in London. I have spent the days since mulling over the sessions I attended and writing up my notes. Any errors are most certainly my own, but this is what I took away from my day.
Having read Daisy’s excellent book on assessment, I knew some of what to expect from this session, but it was extremely useful to hear the author draw attention to certain aspects of her work and talk through her reasoning. The talk focused on four common problems with assessments as they are currently used in schools and offered a solution to each.
We need effective assessment at a system level so that we know if something is working (summative). We need it in the classroom to know if something has been learnt (formative).
Assessment is distorted in the education system by:
- Prose-based assessment
- Absolute judgement
- Grades as distinct categories
- Thinking that test scores matter
Prose-based assessment has become THE way to assess. When schools came up with assessment models as part of Life After Levels, they tended to fall back on prose descriptors. These are not useful for providing a summative grade, as there is too much disagreement over whether a criterion has been met. They also fall short as formative assessment, because a small change in the question radically changes the difficulty. For example, “Can compare two fractions to see which is larger” seems specific and precise, but if you ask:
3/7 or 5/7? 90% of 14-year-olds are correct.
3/4 or 4/5? 75% of 14-year-olds are correct.
5/7 or 5/9? 15% of 14-year-olds are correct.
It would be better to assess using well-designed multiple-choice questions where the distractors are 1) all plausible answers and 2) common errors. This would allow you to gauge who has learnt what and which aspects of the course need revisiting.
Absolute judgements are asked for when teachers are expected to grade a piece of work and say “This is Level X”. The evidence suggests that this cannot be done reliably: there will be variation not only between teachers but also with the same teacher at different points in the day.
Comparative judgement is more accurate, because we are very good at saying which of two pieces of work is “better”. Doing this efficiently, though, needs software like that provided by No More Marking.
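To make the idea concrete for myself, here is a minimal sketch of how a pile of “this one is better than that one” decisions could be turned into a rank order. It assumes a Bradley-Terry model fitted with a simple iterative update, which is my own choice for illustration and not necessarily what No More Marking does. The function name, the script labels and the judgements are all hypothetical.

```python
from collections import defaultdict


def rank_by_comparison(judgements, iterations=200):
    """Rank scripts from pairwise judgements given as (winner, loser) pairs.

    Fits a Bradley-Terry model with a simple iterative update
    (an assumption made for this sketch). Returns scripts best-first.
    """
    scripts = sorted({s for pair in judgements for s in pair})
    wins = defaultdict(int)         # how many comparisons each script won
    pair_counts = defaultdict(int)  # how often each pair was compared
    for winner, loser in judgements:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    strength = {s: 1.0 for s in scripts}  # start with every script equal
    for _ in range(iterations):
        updated = {}
        for s in scripts:
            denom = 0.0
            for other in scripts:
                n = pair_counts.get(frozenset((s, other)), 0)
                if other != s and n:
                    denom += n / (strength[s] + strength[other])
            updated[s] = wins[s] / denom if denom else strength[s]
        # Rescale so the strengths keep a fixed total between rounds.
        total = sum(updated.values())
        strength = {s: v * len(scripts) / total for s, v in updated.items()}

    return sorted(scripts, key=lambda s: strength[s], reverse=True)


# Hypothetical judgements on four scripts; judges do not always agree.
judgements = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("B", "A"),
    ("A", "B"), ("B", "C"), ("B", "D"), ("C", "B"),
    ("C", "D"), ("D", "C"), ("C", "D"), ("B", "C"),
]
ranking = rank_by_comparison(judgements)
print(ranking)
```

Nobody in this sketch ever says “this is Level X”. The order comes only from many quick either/or decisions, and a few disagreements between judges do not stop a sensible ranking from emerging.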
Grades as discrete categories mean that we lose any sense of what the grade is telling us. Two pupils may arrive in Year 7 both making “expected progress”, but one pupil may be just one mark away from a pupil labelled as “below” and the other one mark away from a pupil labelled as “exceeding”.
A grade is not a “thing”; grades are just labels laid on top of a scale of raw scores. We would be better off using the raw data to make judgements about pupil progress against expectations.
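A small sketch shows how much the labels hide. The boundaries (40 and 70), the label names and the pupil scores are all made up for illustration and do not come from the talk.

```python
from bisect import bisect_right

# Hypothetical boundaries: under 40 is "below", 40 to 69 is "expected",
# 70 and over is "exceeding".
BOUNDARIES = [40, 70]
LABELS = ["below", "expected", "exceeding"]


def label_for(score):
    """Return the category label that a raw score falls into."""
    return LABELS[bisect_right(BOUNDARIES, score)]


# Hypothetical raw scores for four pupils.
scores = {"Pupil A": 39, "Pupil B": 40, "Pupil C": 69, "Pupil D": 70}

for pupil, score in scores.items():
    print(pupil, score, label_for(score))
```

Pupils B and C carry the same label despite being 29 marks apart, while Pupils A and B are one mark apart and land in different categories. The raw scores show this at a glance; the labels do not.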
Thinking that test scores matter leaves us focusing on the number and not the inference that we make from it. Goodhart’s law states that “when a measure becomes a target it ceases to be a useful measure.” Once we chase the score, the score itself becomes distorted. We need to consider carefully what we want an assessment to be for and design it to fit that purpose.
These are a few thoughts I left with.
- As a profession we are over-reliant on prose-based descriptors. These are a very poor way to make a judgement about learning.
- We need to consider how much moderation of work we are doing and ensure that something long term happens as a result. As a department we have been working on an agreed standard of “excellence” at different aspects of Geography and a checklist of what we require students to have learnt. How would comparisons work within this?
- How much thought do we give to assessment design? Is this reflected in CPL?
That final point has stayed with me over the last week. In 14 years of teaching, I am not sure that I have ever received any rigorous training on assessment design. There was lots of what was called Assessment for Learning, but little on the assessments that were being used for it.
When I started at my first school, pupils would write a report linked to the topic they had been studying and this would then be given an NC level. GCSE pupils would sit past papers and we would use the old grade boundaries to give a grade. That is what we did because that is what we had always done.
I wonder how this fits with the experience of others?