Royal Army College

Lesson Overview

You have come the whole way round the cycle. You learned to see training as a system, to analyse a real need into objectives, to design those objectives into a course, to develop its materials, and to keep the standard it certifies. You then learned to write objectives precisely, to implement and run the course, to design within the means of a small force, and to govern and accredit the standard. One stage remains, and it is the one that closes the loop: evaluation. Evaluation is the College asking, honestly and at several levels, whether a course actually worked, and then using the answer to make the course better. Without it, a course can run for years, certificate after certificate, while quietly drifting away from the need it was built to meet, and no one would know until the day it failed in the field.

This lesson teaches you to evaluate a course in plain terms and to act on what you find. You will see the four levels at which a course is judged: the students' reaction, whether it was well run; their learning, whether they reached the standard; their behaviour, whether they do it in the field; and the result, whether it helped the Army. You will see how to gather honest feedback from students, instructors, and the field, and how to turn that feedback into real improvement, feeding it back to the start of the cycle so the course keeps getting better rather than staying merely good. And because this is the last lesson of TRG 410, you will see how analyse, design, develop, deliver, and evaluate draw together into one trustworthy whole, and why that whole is what makes a College qualification mean something.

By the end you will be able to evaluate a course at the four levels of reaction, learning, behaviour, and result; gather and weigh feedback from students, instructors, and the field; turn findings into prioritised improvements that feed back into analysis and design; and explain how the five stages of the systems approach close into a continuous loop that keeps the College's standards trustworthy over time.

This is the knowledge layer. Owning and running the evaluation of a real course is mastered by doing it under the eye of an experienced standards owner, and the evaluation report and improvement plan you draft are reviewed and signed off in person, where supervision allows, before any change is made to a live course. What follows teaches the method and the discipline of mind; the practice comes after.

Key Terms

Evaluation: the final stage of the systems approach to training, which judges whether a course actually worked and feeds what it learns back into the rest of the cycle to improve it. Evaluation looks outward at effect; validation, from Lesson 05, looks inward and outward at whether the standard was met.
The four levels: the four plain questions a course is judged by: reaction (was it well run?), learning (did they reach the standard?), behaviour (do they do it in the field?), and result (did it help the Army?). Each level is harder to measure than the one before and means more.
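Reaction: the first level, the students' own response to how the course was run, gathered while it is fresh; useful for the conduct of a course but never proof that anyone learned anything.
Learning: the second level, whether the students reached the standard, shown by the assessment results; the level that internal validation, from Lesson 05, supplies.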
Behaviour: the third level, whether what was learned is actually used in the field weeks and months later; the test of whether training transferred from the classroom to the job.
Result: the fourth level, whether the course made the difference to the Army that the need called for; the hardest to measure and the reason the course exists.
Feedback: the honest accounts of how a course worked, gathered from students, instructors, and the field, which evaluation weighs and turns into findings. Feedback is evidence, not yet a decision.
The field: the units and roles where qualified members actually do the job the course trained them for; the source of the behaviour-level and result-level evidence that no classroom can give.
Continuous improvement: the discipline of feeding evaluation findings back into analysis and design so a course is steadily made better over time, rather than left to drift; the loop that keeps the College good.
The loop: the closing of the systems approach back on itself, where evaluation hands its findings to a fresh analysis, so the five stages run not as a line with an end but as a cycle that never quite stops.

Why evaluation is the stage most often skipped

Evaluation is the stage a busy College is most tempted to drop, and the temptation is worth naming so that you can resist it. By the time a course has run, the certificates are issued and the course looks, from the outside, like a finished thing. Asking whether it actually worked feels like reopening a job already done. It is also the stage that can return uncomfortable answers: that the course a member built and is proud of does not, in fact, do what it was meant to. For both reasons, evaluation is quietly skipped more often than any other stage of the cycle.

That is exactly why a standards owner must guard it. A course that is never evaluated does not stop changing; it simply changes without anyone watching, as instructors adjust it, as the field shifts, and as the gap it was built to fill moves. Evaluation is how the College notices. It is the difference between a course that is steered and a course that drifts, and a drifting course issues the same certificate while certifying less and less. The work is to make evaluation a normal, expected part of every course, planned from the start and never bolted on at the end.

Notice the relationship to validation, which Lesson 05 taught. Validation asks whether the course met its standard, internally (did they learn it?) and externally (does it work in the job?). Evaluation is the wider act of judgement that puts those answers to use and adds the human questions around them, whether the course was well run and whether it served the Army. The two overlap and feed each other. Validation gives evaluation much of its hard evidence; evaluation decides what to do about it.

The four levels of evaluation

A course can be judged at four levels, and the great mistake is to stop at the first. Each level is harder to reach than the one before, and each means more. Walk up them in order.

The first level is reaction: was the course well run? This is the students' own response, how clear the teaching was, whether the programme flowed, whether the instructors were fair and prepared, whether the kit and the venue served. It is gathered while the course is fresh, usually at the end, and it is genuinely useful for improving the conduct of a course. But it carries a trap that catches the careless: a course that students enjoyed is not the same as a course that worked. Members can leave a well-run, well-liked course without having reached the standard, and a hard, unpopular course can teach superbly. Reaction tells you how the course felt, never whether it succeeded.

The second level is learning: did the students reach the standard? This is the result of the assessment, the evidence that members can actually do, under the conditions and to the standard, what the objectives required. This is the level that internal validation, from Lesson 05, supplies. It is firmer than reaction because it measures the thing the course was built to deliver, and a course where students react warmly but do not pass has a problem that reaction alone would have hidden.

The third level is behaviour: do they do it in the field? This is whether the learning transferred, whether members are actually using the skill in their units weeks and months later, correctly and by habit, not just on assessment day. It can only be seen in the field, never in the classroom, and it is where many courses are quietly found wanting, because a skill that passes on the course but is never used, or is used wrongly, has not really stuck. This is the level external validation begins to reach.

The fourth level is result: did it help the Army? This is whether the course made the difference the need called for, whether the gap that justified it has actually closed. It is the hardest to measure, because many things shape an outcome and a single course is only one of them, but it is the reason the course exists. A course can be well run, teach to standard, and even change behaviour, and still fail at this level if the need it answered was the wrong need. That is why result-level findings feed straight back to analysis: they question not just the course but whether the gap was rightly read in the first place.

   THE FOUR LEVELS OF EVALUATION
   (climb in order; harder and more meaningful each step)

   level 4  RESULT      did it help the Army?
                        (the need closed?)              HARDEST
                           ^                            MEANS MOST
                           |
   level 3  BEHAVIOUR   do they do it in the field?
                           ^   (did it transfer?)
                           |
   level 2  LEARNING    did they reach the standard?
                           ^   (the assessment)
                           |
   level 1  REACTION    was it well run?                EASIEST
                        (how the course felt)           MEANS LEAST

   TRAP: a well-liked course is NOT proof of learning.
   Never stop at level 1.

You will not measure every course at every level with the same effort. Reaction and learning you can capture on the course itself, and they should be captured for every course, every time. Behaviour and result take longer, cost more, and need the field's help, so they are reached more selectively, on the courses that matter most and at sensible intervals. But a standards owner keeps all four in mind always, because the whole point of the higher levels is to stop the College fooling itself with the lower ones.

Gathering feedback honestly

Evaluation is only as good as the evidence it rests on, and evidence is gathered from three sources, each of which sees something the others cannot.

The students see the course from the inside. They know where it was clear and where it was muddled, where the pace was right and where it raced or dragged, whether they felt safe and fairly treated, and whether they left feeling able to do the task. Gather their feedback while the course is fresh, make it easy to give and safe to be honest in, and ask plainly: what helped you learn, what got in the way, what would you change. Anonymous responses get you the truths a member will not say to an instructor's face. But weigh student feedback for what it is, a strong read on reaction and a weak read on learning. Students are good judges of how a course felt and poor judges of whether it taught them, because a person does not always know what they have not learned.

The instructors see the course from the other side. They know which lessons consistently went long, which points the class always struggled with, where the materials let them down, where the programme had no margin, and which assessment items were unclear or unfair. An instructor who has run a course several times holds more pattern than any single cohort of students. Gather their feedback systematically after each running, not just as corridor grumbles, and treat it as the skilled professional judgement it is.

The field sees what neither the course nor its students can: whether the training actually works in the job. Unit commanders and the members themselves, months on, can tell you whether the skill is being used, whether it holds up under real conditions, where it falls short, and what the course never covered that the job demands. The field is the only honest source for the behaviour and result levels, and it is the hardest feedback to gather, because it lives outside the College and takes deliberate effort to reach. A standards owner builds a real channel to the field rather than waiting for complaints to arrive on their own.

   THREE SOURCES OF FEEDBACK
   (each sees what the others cannot)

   STUDENTS  ---> reaction, mostly      "how the course felt"
   (inside)       weak on learning      clear/muddled, pace,
                                        fairness, safety

   INSTRUCTORS -> conduct + pattern     "what always goes wrong"
   (other side)   across many cohorts   long lessons, hard
                                        points, weak items

   THE FIELD  --> behaviour + result    "does it work in the job"
   (outside)      the only honest        used? holds up? gaps?
                  source for levels 3-4

        all three  --->  weighed into FINDINGS
        (evidence, not yet a decision)

Across all three, hold one discipline: feedback is evidence, not a verdict. A single loud complaint is not a finding, and a warm round of student praise is not proof. The standards owner's job is to gather widely, weigh honestly, look for the patterns that several sources agree on, and separate a real signal from one person's bad week. Findings that change a live course are built from converging evidence, not from the last voice heard.
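For a standards owner who keeps feedback in a spreadsheet or a short script, the idea of converging evidence can be sketched as follows. This is a minimal illustration only, not a College tool: the theme labels, the sample data, and the `min_sources` threshold are all assumptions made for the example, and tagging each comment with a theme remains a matter of judgement.

```python
from collections import defaultdict

# Hypothetical feedback items as (source, theme) pairs. The theme labels are
# illustrative; in practice the standards owner assigns them by reading the
# raw feedback.
FEEDBACK = [
    ("students", "instructors fair"),
    ("students", "instructors fair"),
    ("students", "venue too cold"),
    ("instructors", "last lesson rushed"),
    ("field", "last lesson rushed"),
    ("instructors", "assessment item unclear"),
    ("students", "assessment item unclear"),
    ("field", "skill fades under pressure"),
]


def converging_themes(feedback, min_sources=2):
    """Return themes raised by at least `min_sources` different sources."""
    sources_by_theme = defaultdict(set)
    for source, theme in feedback:
        # A set means repeated comments from one source count only once.
        sources_by_theme[theme].add(source)
    return sorted(
        theme
        for theme, sources in sources_by_theme.items()
        if len(sources) >= min_sources
    )


print(converging_themes(FEEDBACK))
```

A count like this only points at where sources agree; it does not replace the weighing. A theme raised by the field alone, for instance, may still deserve attention, since the field is the only source for the behaviour and result levels.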

Turning findings into improvement

Evaluation that gathers evidence and stops there is just paperwork. The stage earns its place only when findings become improvements, and that takes its own discipline so that a course is made better rather than merely fiddled with.

Begin by turning the weighed evidence into clear findings, each a plain statement of something the course does well or does badly, with the evidence behind it. Then sort the findings, because not all matter equally. A finding that touches safety or the integrity of the standard is acted on first and always; a finding that a popular lesson could be a little crisper can wait its turn. Prioritise by how much each finding affects whether members reach the standard and use it in the field, not by how loudly it was raised or how easy it is to fix.

Then decide, for each finding worth acting on, where in the cycle the fix belongs, because this is the moment the loop closes. A muddled lesson or a thin materials pack is a development fix. A lesson in the wrong order, a method that does not suit its objective, or an assessment that does not truly test the task is a design fix. A standard that has slipped behind current doctrine or law is a standards fix, the work of Lesson 05. And a finding that the course trains the wrong thing, that the field does not need what it teaches or needs something it does not, is the deepest of all: it goes back to analysis, because the gap itself was misread. The greater the finding, the further back up the cycle it reaches, and a result-level finding can send a whole course back to the start.
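The sorting and routing described above can be sketched in a few lines of code. Treat this as an illustration under stated assumptions rather than a prescribed method: the `fix_kind` labels, the 1 to 5 `effect` score, and the sample findings are hypothetical, chosen only to mirror the kinds of fix the text names.

```python
from dataclasses import dataclass

# Hypothetical labels for the kinds of fix the lesson describes, each mapped
# to the stage of the cycle where that fix belongs.
STAGE_FOR_FIX = {
    "materials": "develop",    # muddled lesson, thin materials pack
    "structure": "design",     # wrong order, unsuitable method, weak assessment
    "standard": "standards",   # standard behind current doctrine or law
    "wrong_need": "analyse",   # course trains the wrong thing
}


@dataclass
class Finding:
    statement: str
    fix_kind: str              # one of the keys of STAGE_FOR_FIX
    safety_or_integrity: bool  # touches safety or the integrity of the standard
    effect: int                # assumed 1-5 score: effect on standard and field use


def improvement_plan(findings):
    """Order findings by priority and pair each with its stage."""
    ordered = sorted(
        findings,
        # Safety and integrity first, then larger effect first.
        key=lambda f: (not f.safety_or_integrity, -f.effect),
    )
    return [(f.statement, STAGE_FOR_FIX[f.fix_kind]) for f in ordered]


# Illustrative findings only.
EXAMPLE_FINDINGS = [
    Finding("Popular lesson could be crisper", "materials", False, 1),
    Finding("Assessment does not test the task", "structure", True, 4),
    Finding("Field does not need what one module teaches", "wrong_need", False, 5),
]

for statement, stage in improvement_plan(EXAMPLE_FINDINGS):
    print(f"{stage:>9}  {statement}")
```

Note that neither loudness nor ease of fixing appears anywhere in the sort key, which is the point of the discipline: the order depends only on safety, the integrity of the standard, and effect.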

Record every change in a controlled way, so the course's history is visible and a future owner can see what was changed, why, and on what evidence. Improve deliberately and in order rather than rewriting a course on every passing comment, because a course that lurches with each cohort's mood is as untrustworthy as one that never changes at all. The aim is steady, evidenced betterment, with the standard held firm through every revision.

Closing the loop: the whole cycle as one

This is the last lesson of TRG 410, and it is where the five stages become one thing. Analyse found the need and wrote the objectives. Design built the course from those objectives. Develop produced its materials and piloted it. Deliver taught it. Evaluate now judges it, and the moment it hands its findings back to a fresh analysis, the line you have been walking bends into a circle. The systems approach is not a road with an end; it is a loop that keeps turning, and evaluation is the hinge that turns it.

   THE CONTINUOUS-IMPROVEMENT LOOP
   (evaluation closes the line into a circle)

          +----------> ANALYSE <-----------------+
          |          (the real need)             |
          |               |                      |
          |               v                      |
          |            DESIGN                    | findings feed
          |          (the course)                | back to the
          |               |                      | right stage:
          |               v                      |
   result &            DEVELOP                   |  result    -> analyse
   behaviour         (the materials)             |  behaviour -> design
   findings               |                      |  learning  -> develop
          |               v                      |  reaction  -> deliver
          |            DELIVER                   |
          |          (teach it)                  |
          |               |                      |
          |               v                      |
          +---------- EVALUATE ------------------+
                  (reaction, learning,
                   behaviour, result)

   The loop never stops. A qualification stays trustworthy
   only while the loop keeps turning.

Held this way, the five stages defend each other. Analysis keeps the course tied to a real need; design and development keep it sound; delivery makes it real; standards keep its bar fixed; and evaluation keeps the whole thing honest by checking that it still works and feeding back what it learns. Pull any one stage out and the rest weaken: a course analysed but never evaluated drifts, a course evaluated but never improved merely gathers complaints, a course improved without a held standard improves itself into something else. The loop is trustworthy only when every stage runs and the findings travel all the way round.

This is where the lesson meets LDR 420, on command responsibility and the integrity of the standard. The temptation in evaluation is always to soften the answer: to read warm reaction as proof of learning, to let an awkward field finding go unrecorded, to mark a course as working because reopening it is hard. To resist that is an act of integrity, and it is a command responsibility. A standards owner holds the loop honest on behalf of every member who will trust the qualification and every commander who will rely on what it certifies. The standard means the same thing over time only because someone, with the authority and the duty to do so, keeps turning the loop and refuses to flinch from what it shows. That refusal, course after course and year after year, is what makes the College trustworthy, and it is the work this whole speciality has been building you to do.

In Practice: Sergeant Adeya closes the loop on the radio course

A year after Sergeant Adeya designed and ran the section radio course, she is its standards owner, and the time has come to evaluate it properly rather than rest on the fact that it runs. She climbs the four levels in order.

Reaction she already has: the end-of-course feedback was warm, and students found the instructors fair and the practice enjoyable. She notes it and does not stop there, because she knows a liked course is not a proven one. Learning she draws from the assessment records: pass rates are healthy, and members reach the standard on the day. So far the course looks fine. But the two levels that matter most are still untouched, so she reaches into the field.

For behaviour, she asks three section commanders whether their members actually use correct voice procedure on the section radio now, weeks and months on. The answers are uncomfortable. Members can pass a message on assessment day, but in the field many fall back into sloppy, made-up procedure under pressure, and faultfinding, the course's last lesson, is barely used at all. For result, she asks the harder question: are sections passing clear messages reliably, which was the whole need? Better than before, but not reliably enough. The gap the course was built to close has narrowed, not closed.

She gathers the three sources together. Students liked it; instructors report that the faultfinding lesson always felt rushed at the end of a full programme; the field reports that procedure does not survive contact and faultfinding never stuck. The pattern is clear and the sources agree. She writes her findings and sorts them. The weak retention of procedure under pressure is the priority, because it strikes at whether the training works in the job.

Then she sends each finding to the right stage of the loop. The rushed faultfinding lesson is partly a programme problem, a design fix: it needs its own protected time, not the scraps at the end. The failure of procedure to survive pressure is deeper; it suggests the course taught procedure in calm conditions only, a design and method fix, so realistic, pressured practice must be built in. And the field's report that members need a quick reference card in the vehicle is a development fix, a new piece of student material. The biggest finding, that the need has not fully closed, she carries back to analysis, to ask whether the course alone was ever going to close it or whether the units need refresher practice the College should also provide. She records every change with its evidence, submits the evaluation report and improvement plan for review and sign-off, and the loop turns once more. The course that comes out the other side is not new, but it is honestly better, and the standard it certifies still means what it says.

Check Your Understanding

Name the four levels of evaluation in order, say what each one asks in plain terms, and explain why stopping at the first level, reaction, is the great mistake.
The three sources of feedback are students, instructors, and the field. What does each see that the others cannot, and why is the field the only honest source for the behaviour and result levels?
Explain how an evaluation finding is matched to the right stage of the cycle for its fix, and why a result-level finding can travel all the way back to analysis. What does it mean to say evaluation "closes the loop"?

Reflection (write a short paragraph): Think of a course or training you completed that you enjoyed at the time. Looking back honestly, did it change what you actually do in practice weeks and months later, or did it stop at reaction? Describe one thing the course's owner could have learned by reaching past your reaction to your behaviour, and what they might have improved.

Summary

Evaluation is the final stage of the systems approach, judging whether a course actually worked and feeding what it learns back into the rest of the cycle. It is the stage most often skipped, and a standards owner must guard it, because an unevaluated course does not stop changing, it simply drifts unwatched.
Judge a course at four levels, climbing in order: reaction (was it well run?), learning (did they reach the standard?), behaviour (do they do it in the field?), and result (did it help the Army?). Each is harder to measure and means more, and a well-liked course is never proof that anyone learned.
Gather feedback from three sources that each see what the others cannot: students (reaction, from inside), instructors (conduct and pattern across cohorts), and the field (behaviour and result, the only honest source for the higher levels). Feedback is evidence to be weighed, not a verdict.
Turn weighed evidence into clear findings, prioritise by effect on the standard and the field rather than by loudness, and send each finding to the right stage for its fix: develop, design, standards, or all the way back to analysis. Record every change in a controlled way.
Evaluation closes the line into a loop: analyse, design, develop, deliver, evaluate, then back to analyse, the five stages defending each other so the whole stays honest. A qualification means the same thing over time only while the loop keeps turning.
This lesson completes TRG 410 and the Training and Instruction speciality. It rests on Lesson 05 · Maintaining Training Standards (validation and the held standard) and draws the whole course, from Lesson 01 · Training as a System, into one cycle. It meets LDR 420 · Command Responsibility and Ethical Leadership (the integrity of the standard), feeds ADM 220 · Course Records and Qualification Tracking (the controlled record of changes), and connects to PME 210 · Basic Staff Duties and Written Orders (writing the evaluation report).

Evaluating and Improving