Royal Army College

Lesson Overview

The last two lessons taught how to assess knowledge and skills; this one teaches how to mark what was assessed so that the result means the same thing whoever judged it and whenever it was judged. Marking is where reliability is won or lost. A perfect question or a sound practical task is wasted if two assessors mark it differently, or if the same assessor marks harder on Friday than on Monday, because then the result reflects the assessor as much as the candidate, and the candidate's fate turns partly on luck. This lesson is about removing that luck: marking against a scheme, holding a single fixed standard so that "a pass" means the same for everyone, and using moderation to keep different assessors and different occasions in line. It is the craft of consistency, and consistency (that is, reliability) is one of the four principles the whole course protects.

The heart of it is a distinction the course has returned to and now names plainly: assessment is criterion-referenced (judged against a fixed standard), not norm-referenced (judged against how others did). A candidate passes because they reached the standard the job requires, not because they did better than the person next to them, and the standard does not bend with the strength of the group: a strong intake does not raise the bar, a weak one does not lower it. Holding that fixed standard, the same for every candidate, every assessor, and every occasion, is what makes a qualification trustworthy. A qualification whose standard drifts with the marker or the mood is worth nothing, because no one can know what it certifies.

This is the knowledge layer. It teaches you how to mark against a scheme, what it means to hold a fixed standard, and how moderation keeps assessors aligned, so that you can mark and help run assessments whose results are consistent and trusted. The judgement of applying a scheme to a real borderline answer, and the practised eye that marks the fiftieth script as fairly as the first, are built by marking real assessments under a qualified assessor and signed off in person. Read this to know how consistency is made; learn to mark consistently by marking.

By the end you will be able to mark against a prepared scheme, explain and hold a fixed criterion-referenced standard, distinguish criterion- from norm-referencing and say why the first is right for the Army, describe how moderation and standardisation keep assessors aligned, and recognise the threats to consistent marking and guard against them.

Key Terms

Marking: the judging of a candidate's answer or performance against a scheme to reach a result, done so that the same work earns the same mark whoever marks it.
Marking scheme: the prepared statement of what a correct or creditworthy answer or performance contains and how marks are awarded, written before marking begins.
Standard: the fixed level a candidate must reach to pass, set by what the job requires, held the same for every candidate, assessor, and occasion.
Criterion-referenced: judged against a fixed standard (the criteria), so a candidate passes by reaching the standard, regardless of how others did.
Norm-referenced: judged against how other candidates did (graded on a curve), which is wrong for a competence qualification because the standard floats with the group.
Moderation: the checking of marking across assessors and samples to confirm the standard is being applied consistently, and to correct it where it is not.
Standardisation: bringing assessors to a common understanding of the standard before and during marking, so they judge alike, often by marking the same samples together.
Marker drift: the tendency of a marker's standard to shift over a marking session, becoming harsher or more lenient as fatigue or comparison sets in.
Borderline: a candidate close to the pass line, where the marking scheme and standard must be applied with most care and consistency.
Trustworthy qualification: one whose result means the same thing for everyone who holds it, which is what consistent marking against a fixed standard produces.

Marking against a scheme

Marking is reliable only when it is done against a prepared marking scheme, written before any candidate is marked, that states what a creditworthy answer or performance contains and how marks are awarded. Without a scheme, each marker judges by a private sense of what is good, and private senses differ, so the same script earns different marks from different markers, and reliability collapses. With a scheme, every marker is applying the same explicit rule, so the same work earns the same mark, which is the whole point of marking.

For objective questions the scheme is simply the key: one correct answer per question, marked identically by anyone. This is why objective tests are so reliable, and why their marking is the easy case. For open written questions the scheme states what a full-credit answer must contain and how partial credit is given, point by point, so that a marker awards marks for the presence of the required things rather than for a general impression of quality. For practical assessments the scheme is the checklist and standard of Lesson 07, the points that must be present and the critical points that must be right. In every case the scheme exists so that marking is the application of a stated rule to the evidence, not the registering of an impression.

Two disciplines make scheme-marking work. First, write the scheme with the question, never afterward, because a scheme reverse-engineered from the answers that came in is shaped by those answers rather than by the standard, and a question whose scheme cannot be written in advance was not clear enough to ask. Second, mark to the scheme, not around it: the marker resists the pull to reward an answer that is clever but misses what the scheme requires, or to penalise one that meets the scheme in an unexpected way, because the moment marking departs from the scheme it departs from consistency. Where an answer genuinely exposes a gap in the scheme, the scheme is adjusted and applied to all scripts alike, not bent for one.
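The point-by-point marking of an open question described above can be sketched in a few lines of Python. This is a minimal illustration, not a College tool: the scheme points, their mark values, and the names `SchemePoint`, `SCHEME`, and `mark_answer` are all hypothetical. It shows marks being awarded only for the presence of points the scheme names, never for a general impression.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemePoint:
    """One creditworthy point in a (hypothetical) marking scheme."""
    name: str
    marks: int


# Hypothetical scheme for one open question, written with the question,
# before any script is marked.
SCHEME = (
    SchemePoint("states the purpose of the check", 2),
    SchemePoint("lists the steps in order", 3),
    SchemePoint("names the safety precaution", 1),
)


def mark_answer(points_present: set[str], scheme=SCHEME) -> int:
    """Award marks only for scheme points the marker found in the answer."""
    unknown = points_present - {p.name for p in scheme}
    if unknown:
        # Credit outside the scheme is never awarded ad hoc. If an answer
        # exposes a real gap, the scheme is amended and reapplied to all scripts.
        raise ValueError(f"not in scheme: {sorted(unknown)}")
    return sum(p.marks for p in scheme if p.name in points_present)


# An answer that lists the steps and names the precaution but omits the purpose.
print(mark_answer({"lists the steps in order", "names the safety precaution"}))  # 4
```

Because the function refuses any point the scheme does not name, two markers who find the same points in a script cannot reach different totals.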

The fixed standard: criterion not norm

Behind the marking scheme stands the standard: the fixed level a candidate must reach to pass, set by what the job actually requires. The single most important fact about that standard is that it is fixed and criterion-referenced, not floating and norm-referenced, and the difference decides whether a qualification means anything.

Criterion-referencing judges each candidate against the fixed standard: they pass if they reach it, fail if they do not, regardless of how anyone else did. A whole strong intake can all pass, because all reached the standard; a whole weak intake can all fail, because none did. The bar does not move. Norm-referencing judges candidates against each other: the top so-many percent pass, or marks are spread on a curve, so that whether a given performance passes depends on the company it keeps. For a competence qualification, the kind the Army gives, norm-referencing is simply wrong, and dangerously so: it could pass a soldier who cannot safely do the task merely because they were the best of a weak group, or fail a competent one because their group was strong. The job does not care how a soldier compares to their intake; it cares whether they can do the task to the standard the task requires. So the Army assesses against the fixed criterion, always.

Holding the standard fixed has a hard consequence that assessors must accept: the result of an assessment is not adjusted to produce a desired pass rate. If many fail, the honest responses are to ask whether the teaching, the assessment, or the candidates fell short, and to remedy the real cause, not to lower the standard so the numbers look better. To pass candidates who did not reach the standard because too many would otherwise fail is to certify a competence that is not there, which betrays the candidates, those who will rely on them, and the qualification itself. The standard is the standard because the job requires it, and it is held whatever the pass rate, which is the same honesty the whole College keeps.

   CRITERION-REFERENCED  vs  NORM-REFERENCED

   CRITERION (correct)              NORM (wrong for competence)
   ---------------------------      ---------------------------
   judged against a FIXED standard  judged against OTHER candidates
   pass by reaching the standard    pass by beating enough others
   a strong intake can ALL pass;    the bar floats with the group's
     a weak one can ALL fail          strength
   the bar does NOT move            could pass the unsafe (best of a
                                      weak group) or fail the able

   The job asks "can they do it to standard?", not "are they better
   than their intake?". Hold the standard whatever the pass rate.
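The contrast can be made concrete with a short Python sketch. The scores, the pass mark of 60, the "top half passes" rule, and the function names are all invented for illustration. The norm-referenced function is included only to show what goes wrong, not as a method the Army uses.

```python
def criterion_pass(scores: dict[str, int], pass_mark: int) -> dict[str, bool]:
    """Pass each candidate who reaches the fixed standard, whatever others scored."""
    return {name: score >= pass_mark for name, score in scores.items()}


def norm_pass(scores: dict[str, int], top_fraction: float) -> dict[str, bool]:
    """For contrast only: pass the top fraction of the group, whatever they scored."""
    n_pass = int(len(scores) * top_fraction)
    ranked = sorted(scores, key=scores.get, reverse=True)
    passed = set(ranked[:n_pass])
    return {name: name in passed for name in scores}


PASS_MARK = 60  # hypothetical fixed standard
weak_intake = {"A": 40, "B": 35, "C": 30, "D": 20}
strong_intake = {"E": 90, "F": 85, "G": 80, "H": 75}

# Criterion-referenced: the bar does not move.
print(criterion_pass(weak_intake, PASS_MARK))    # all False: none reached 60
print(criterion_pass(strong_intake, PASS_MARK))  # all True: all reached 60

# Norm-referenced: the bar floats with the group.
print(norm_pass(weak_intake, 0.5))    # A and B "pass" on 40 and 35
print(norm_pass(strong_intake, 0.5))  # G and H "fail" on 80 and 75
```

Under the norm rule, a score of 40 passes in the weak intake while a score of 80 fails in the strong one, which is exactly the danger the section describes.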

Moderation and standardisation

A fixed standard on paper is not the same as a fixed standard in practice, because the standard is applied by people, and people judging the same work can still judge it differently. Moderation and standardisation are the practices that close that gap, bringing the real marking of real assessors into line with the standard and with each other. They are how reliability is achieved across more than one assessor or one occasion.

Standardisation comes first, before and during marking: it brings the assessors to a common understanding of the standard, so they start out judging alike. The classic method is for the assessors to mark the same sample scripts or watch the same sample performances together, compare their judgements, and resolve the differences against the scheme and standard, until they are marking the same way. This is done at the start of a marking exercise and, on a long one, repeated partway through to catch drift. A team standardised this way applies one standard; a team that never compares applies as many standards as there are markers.

Moderation is the check, during and after marking, that the standard was in fact applied consistently. A moderator re-marks a sample of each assessor's work, blind to the original marks where possible, and compares: where the samples agree, the marking stands; where an assessor is out of line (harsh, lenient, or erratic), the discrepancy is investigated and corrected, which may mean re-marking that assessor's work. Moderation looks especially hard at the borderline and at the critical points, because that is where inconsistency does the most harm: it can be the difference between a pass and a fail. Moderation is not distrust of the assessor; it is the system's way of guaranteeing to the candidate and to those who rely on the qualification that the result does not depend on which assessor they happened to get.

   KEEPING ASSESSORS ALIGNED

   STANDARDISATION (before/during)   bring assessors to a common
                                     understanding FIRST: mark the same
                                     samples together, resolve differences
                                     against the scheme, until they judge alike

   MODERATION (during/after)         CHECK the standard was applied: a
                                     moderator re-marks a sample of each
                                     assessor's work and compares; correct
                                     anyone out of line (harsh/lenient/erratic)

   Look hardest at the BORDERLINE and the CRITICAL POINTS.
   Aim: the result does not depend on WHICH assessor you got.
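The moderator's comparison can be sketched as a small Python function. Everything specific here is an assumption made for illustration: the function name, the sample marks, and above all the two thresholds, which a real moderation policy would set for itself. The output is a prompt to investigate against the scheme, not a verdict on the assessor.

```python
from statistics import mean, pstdev


def review_assessor(original, moderated, bias_limit=1.0, spread_limit=2.0):
    """Compare an assessor's marks with a moderator's blind re-marks.

    Both lists cover the same sample of scripts, in the same order.
    The limits are illustrative assumptions, not a College rule.
    """
    if len(original) != len(moderated):
        raise ValueError("both lists must cover the same scripts")
    diffs = [o - m for o, m in zip(original, moderated)]
    bias = mean(diffs)      # average gap: above zero means marked higher
    spread = pstdev(diffs)  # how much the gap varies from script to script
    if spread > spread_limit:
        return "erratic"
    if bias > bias_limit:
        return "lenient"
    if bias < -bias_limit:
        return "harsh"
    return "aligned"


moderator = [10, 11, 8, 15]  # hypothetical blind re-marks of four scripts

print(review_assessor([10, 12, 8, 15], moderator))  # aligned
print(review_assessor([8, 9, 6, 12], moderator))    # harsh
print(review_assessor([12, 13, 10, 17], moderator)) # lenient
print(review_assessor([14, 7, 12, 11], moderator))  # erratic
```

The erratic check comes first because an assessor whose gaps swing both ways can average out to no bias at all while still being far from the standard on every script.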

Threats to consistent marking

Even with a scheme and a fixed standard, a marker's judgement can drift, and the consistent marker guards against the known threats deliberately. Marker drift is the shift of a marker's standard over a session: as fatigue or comparison sets in, the bar creeps harsher or more lenient, so the fiftieth script is not judged as the fifth was. The guards are to return regularly to the scheme, to look back at a sample of early scripts, to take breaks on a long session, and to re-standardise partway through. The halo and sequence effects of Lesson 07 apply to marking too: a strong earlier answer on a script can buoy a weak later one, and an impressive candidate can be marked up across the board. The guard is to mark to the scheme point by point, and, where practical, to mark question by question across all scripts rather than whole script by whole script, so each question is judged against the scheme afresh.

There is also the pull of leniency and severity: some markers are by temperament soft and some hard, and either betrays the standard, the soft marker passing the unready, the hard one failing the able. The guard is the same as for everything in this lesson, the scheme and the fixed standard, applied by a marker who knows their own tendency and corrects for it, and checked by moderation. The honest marker is not the kind one or the tough one but the consistent one, who gives the same work the same mark every time and holds every candidate to the one standard, no softer and no harder. That consistency, multiplied across assessors by standardisation and guaranteed by moderation, is what makes the result mean the same thing for everyone, which is the trustworthiness the qualification depends on.
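The question-by-question guard against halo and sequence effects amounts to a change in the order of work, which a short Python sketch makes plain. The function name and the script and question labels are hypothetical.

```python
def marking_order(script_ids, question_ids):
    """Return (question, script) pairs so that each question is marked
    across every script before the next question is started."""
    return [(q, s) for q in question_ids for s in script_ids]


for question, script in marking_order(["script-1", "script-2"], ["Q1", "Q2"]):
    print(question, script)
# Q1 script-1
# Q1 script-2
# Q2 script-1
# Q2 script-2
```

Marked in this order, a candidate's strong answer to Q1 is no longer fresh in the marker's mind when their Q2 is judged, so each answer meets the scheme on its own.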

In Practice: One Standard Across Three Assessors

A course at the Royal Army College ends in an assessment marked by three assessors across a large intake, part objective, part open written, part practical. The supervisor's concern is the candidate's nightmare: that a candidate's result should depend on which of the three assessors they drew. A weak supervisor would let each assessor mark their own pile their own way. The College's supervisor builds consistency in deliberately.

Before marking, she runs a standardisation session: the three assessors mark the same sample of scripts and watch the same sample of practical performances together, compare their judgements, and resolve every difference against the marking scheme and the fixed standard, until the three are marking alike. The objective section has a single key, so it marks identically anyway; the open questions have a scheme written with the questions, stating what earns each mark; the practical has the checklist and critical points of Lesson 07. She reminds the assessors that the standard is criterion-referenced, fixed by what the job requires, so a strong group does not raise it and a weak one does not lower it, and that the pass rate is whatever honest marking against the standard produces.

As marking runs, she has them mark question by question to keep each judgement against the scheme fresh, take breaks to fight drift, and re-standardise partway through. Then she moderates: she re-marks a sample of each assessor's work, blind to their marks, and looks hardest at the borderline and the critical points. Two of the three are aligned; the third is marking the open questions a little harshly, so she investigates, confirms it against the scheme, and has that assessor's open marks adjusted to the common standard. At the end, the result means the same thing across all three assessors and the whole intake: a candidate passed because they reached the standard, not because of which assessor they drew. The qualification is trustworthy, which is the whole aim of marking, standards, and moderation.

Check Your Understanding

Explain why marking must be done against a prepared scheme, and how the scheme differs for objective, open written, and practical assessment. Why must the scheme be written with the question rather than afterward, and what does "mark to the scheme, not around it" mean?
Distinguish criterion-referenced from norm-referenced assessment, and explain why criterion-referencing is the only right basis for an Army competence qualification. Why must the standard be held whatever the pass rate, and what are the honest responses when many candidates fail?
Explain the difference between standardisation and moderation and how each keeps assessors aligned with the standard and each other. Then name the threats to consistent marking (drift, halo and sequence, leniency and severity) and the guards against them.

Reflection (write a short paragraph): Think about a time when a result you received, or gave, seemed to depend on who was marking: a teacher known as a hard marker, say, or an examiner softer than another. Using this lesson, explain what was missing (a scheme, a fixed standard, standardisation, or moderation) and how its absence failed the principle of reliability. Then consider your own likely tendency as a marker: are you more inclined to be lenient or severe, and what would you do, using the tools in this lesson, to hold the one fixed standard against every candidate alike rather than letting your temperament set the bar?

Summary

Marking is where reliability is won or lost. It must be done against a marking scheme, written with the question, so that the same work earns the same mark whoever marks it: a key for objective questions, a points scheme for open questions, the checklist and critical points for practical assessment. Mark to the scheme, not around it.
The standard is fixed and criterion-referenced: each candidate is judged against the level the job requires, not against other candidates. A strong intake can all pass and a weak one all fail; the bar does not move. Norm-referencing (grading on a curve) is wrong for a competence qualification because it could pass the unsafe and fail the able.
The standard is held whatever the pass rate: results are not adjusted to produce a desired number of passes. If many fail, fix the real cause (teaching, assessment, or candidates); never lower the standard, because that would certify a competence that is not there.
Standardisation brings assessors to a common understanding before and during marking (marking the same samples together until they judge alike); moderation checks during and after that the standard was applied (re-marking a sample of each assessor's work, correcting anyone out of line), looking hardest at the borderline and critical points. The aim: the result does not depend on which assessor you got.
Guard against marker drift (return to the scheme, sample early scripts, take breaks, re-standardise), the halo and sequence effects (mark to the scheme, question by question), and leniency or severity (know your tendency and correct for it). The honest marker is the consistent one.
This is the knowledge layer; applying a scheme to a real borderline answer and marking the fiftieth script as fairly as the first are mastered by marking under a qualified assessor and signed off in person. This lesson makes the knowledge tests of Lesson 06 and the practical assessments of Lesson 07 consistent, serves the reliability principle of Lesson 02, and feeds the fair decisions of Lesson 09 and the trustworthy records of Lesson 05.

Marking, Standards, and Moderation