Often, people argue that human judgement is too fallible to be used in high-stakes assessment. They argue that assessment needs to be objective, and that assessment programmes can only be fair if human judgement is constrained: by using structured and standardised assessment methods, by employing detailed checklists and scoring rubrics, and by relying on numerical scoring only. In short, they contend that any form of subjectivity in the assessment process must be avoided.
The literature on heuristics and judgement biases seems to support these ideas, as there are numerous biases that may influence people's judgements and lead to flawed decisions. By biases we do not mean prejudices, but thinking shortcuts that unduly influence a decision.
This may be true in itself, but there is another side to the coin.
First, it is a popular misconception that subjectivity and unreliability are near synonyms; in fact, they are not. Any assessment, whether written or oral, is only a small sample of questions, assignments or tasks drawn from a large domain of possible tasks. Reliability is the extent to which a student's score on a test is sufficiently representative of the score they would have obtained had they answered all possible items, assignments or tasks, and this holds regardless of whether the feature being assessed is assumed to be objective or subjective. A single-item multiple-choice test is a so-called objective test, but it is certainly not a reliable sample. Conversely, a large collection of subjective opinions from a large pool of independent judges can lead to a reliable decision, despite its subjective nature.
Second, human judgement is not as fallible as we often tend to think. We have survived quite well as a species, so not all of our decisions can be flawed. The literature on naturalistic decision making supports this notion and argues that we make numerous decisions every day which are not perfect but certainly good enough.
Finally, one could seriously question whether assessment of students' competence and/or performance is something that can be done objectively at all. Assessment involves valuing what we observe, and is therefore, at least partly, subjective by nature.
Still, there is an issue of fairness: assessment needs to be fair. And for this, not only reliability but also validity is a factor. We will not expand too broadly on validity here; it is basically the extent to which an assessment actually assesses what it purports to assess. Sometimes this is obvious and clear (for example, when we want to assess someone's direct performance), but sometimes we want to assess something that cannot be observed directly (such as competence, ability or personality traits), and then we have to make sure that we are drawing correct inferences about students. This is where judgement biases can have a negative effect: they can lead assessors to make different judgements from those they would have made without the bias, and this impacts the fairness of the assessment. Two examples of such biases are the 'primacy effect' and the 'memory error of commission'. In the former, the first impression has an unduly large effect on the overall decision about a candidate's performance. In the latter, false memories about the examination are introduced into the examiner's memory, for example thinking that a student answered certain questions correctly when the student actually did not.
The obvious response would be to train examiners to rid themselves of these biases and turn them into neutral, objective examiners, but this is not possible. Such judgement biases are probably necessary mechanisms that allow an examiner to process all the information about a candidate's performance during the examination. Counter-intuitive as it may be, it is more worthwhile to train examiners to develop their biases into so-called scripts. Scripts are more or less automated practices or sequences of decisions that we use to deal with problem situations and their solutions; typically, these are situations that we have encountered numerous times before and have become expert at handling. In this light, biases can be seen as immature scripts that evolve as we gain expertise. Training, then, should focus on recognising judgement biases, acquiring strategies that prevent them from overly influencing the overall judgement, and understanding the nuances needed to turn these biases into expert scripts.
This is what the Better Judgement project is about: training assessors and examiners to become more expert users of (performance) scripts, building on their existing biases. In this way, the training should develop assessment literacy and expertise more quickly than ordinary experience alone would.