Royal Army College

Lesson Overview

A real incident almost never begins with a klaxon. It begins with a signal that could be nothing: a login from a country no operator was in, a backup that ran late, a user who says their inbox is forwarding messages they never set up, a service that is slower than it should be. Most such signals are nothing. Some are the first visible edge of a serious problem. The work of this phase is to take a signal and turn it into an understood incident: to decide whether something is really wrong, how bad it is, what it touches, when it began, and what the facts are, all without trampling the very evidence that will tell you what to do next.

This is the second phase of the incident lifecycle. Lesson 01 prepared for the bad day, with roles, contacts, tools, clean offline backups, and rehearsals in place before anything happened. CIS 201 taught every member to recognise trouble and report it. This lesson is what the response team does the moment a report or an alert arrives: it picks up the signal, weighs it, scopes it, lays it out in time, and locks the evidence down. Get this phase right and containment and recovery become a clear job of work. Get it wrong, by misjudging the severity, missing half of what is affected, or wiping the proof while poking at the problem, and everything after it is built on sand.

This is the knowledge layer. It builds the judgement and the vocabulary you need. The hands-on skills behind it (reading a captured log, taking a disk or memory image the right way, opening a packet capture in a tool like Wireshark, keeping a clean evidence record) are practised and signed off in person, under supervision, where the standing of your appointment allows you to touch the systems involved. By the end you will be able to recognise the indicators that turn a signal into a suspected incident, triage and prioritise an incident by its impact, scope what is affected, build an honest timeline of events, and preserve the main kinds of evidence (system logs, a disk or memory image, packet captures) without destroying them, recording every action you take with the time you took it.

Key Terms

Indicator: an observable sign that an incident may be happening, drawn from a person's report, a system log, or an alert. One indicator rarely proves an incident; several together build a case.
Detection: noticing that an event worth examining has occurred, whether a person reports it or a system raises an alert.
Analysis: the work of deciding whether a detected event is a real incident, and if so what it is, how bad it is, and what it affects.
Triage: sorting incidents by urgency and impact so the most serious gets attention first, exactly as a medic sorts casualties.
Severity: a rated measure of how serious an incident is, used to drive how fast and how hard the team responds.
Scope: the full extent of what an incident affects (which accounts, devices, services, and data), as opposed to only the part first noticed.
Timeline: an ordered account of what happened and when, built from evidence, used to understand the incident and to brief others.
Log: a dated record a system keeps of events such as logins, errors, and changes. Logs are the commonest and most useful evidence a small team has.
Alert: an automatic notification raised by a tool when it sees something it is configured to flag.
Evidence: the recorded facts of an incident (logs, images, captures, notes), kept intact so the team can understand what happened and, where required, account for it.
Disk image: an exact, bit-for-bit copy of a storage device, taken so the original can be set aside untouched and the copy examined.
Memory image: a captured copy of a running computer's volatile memory (RAM), which holds evidence that vanishes the moment the machine is switched off.
Packet capture: a saved recording of network traffic, examined with a tool such as Wireshark to see what actually crossed the wire.
Chain of custody: the unbroken, written record of who held a piece of evidence, when, and what they did with it, so its integrity can be trusted.

Detection: turning a signal into a suspected incident

Detection is simply noticing that something is worth a second look. For a small force it comes from two directions, and both matter.

The first is people. CIS 201 made every member a sensor: an operator who reports a strange login prompt, a national who says a state record looks altered, an officer who notices a service is down. The alert human is still the most valuable detection capability a small team has, because a person notices the odd, the out-of-place, the "that is not how this usually behaves" that no rule was written to catch. This is why CIS 201 insisted that a report is help, not failure, and why the response team must make reporting easy and answer it without blame. A team that punishes false alarms soon stops hearing about the real ones.

The second is systems. Even a small, self-hosted estate produces a steady stream of records and alerts: the single sign-on service logs every authentication, servers log errors and configuration changes, mail systems log forwarding rules and new sign-ins, anti-malware raises alerts, and certificates have issue and expiry records. You do not need an expensive monitoring suite to benefit from these. You need to know they exist, where they live, and how to read them when a question arises.

An indicator is any one of these signs. The discipline of detection is to treat an indicator as a question, not a verdict. A single failed login is noise; a thousand failed logins against one account in a minute, then one success, is a story. Indicators gain weight in combination, and they gain weight when they line up in time. Common indicators a small force will actually meet include:

a sign-in from an unexpected place, device, or hour, or from two distant places too close together in time to be one person travelling;
a flood of failed logins (a guessing attempt), sometimes followed by a success;
a mail account that has quietly grown a forwarding rule or a new recovery address nobody set;
a user reporting a phishing message, or admitting they clicked one and entered credentials;
files renamed, encrypted, or carrying a ransom note;
anti-malware alerts, or a device behaving oddly: hot, slow, fans running, unknown programs;
a service down or degraded with no planned cause;
a backup that failed or finished wrong;
a lost or stolen device reported missing.

When indicators cross a sensible threshold, you declare a suspected incident and move into analysis. Declaring early is not crying wolf. It starts the clock, the record, and the roles that Lesson 01 set up, and it is always cheaper to stand a suspected incident down than to start one late.
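
The "flood of failed logins, then a success" pattern can be sketched in a few lines. This is a minimal illustration and not a monitoring tool: the function name, the event layout (time, account, success flag), and the threshold of 20 failures inside five minutes are all assumptions chosen for the example, and a real authentication log will need its own parsing.

```python
from datetime import datetime, timedelta


def burst_then_success(events, threshold=20, window=timedelta(minutes=5)):
    """Flag accounts where a burst of failed logins is followed by a success.

    `events` is an iterable of (timestamp, account, succeeded) tuples.
    The threshold and window are illustrative assumptions, not policy.
    Returns a list of (account, time_of_success, failures_in_window).
    """
    recent_failures = {}  # account -> failure times still inside the window
    flagged = []
    for when, account, succeeded in sorted(events):
        fails = recent_failures.setdefault(account, [])
        # Forget failures that have aged out of the window.
        fails[:] = [t for t in fails if when - t <= window]
        if succeeded:
            if len(fails) >= threshold:
                flagged.append((account, when, len(fails)))
            fails.clear()  # a success ends the current run of failures
        else:
            fails.append(when)
    return flagged
```

A hit from a check like this is still an indicator: a question to take to the logs, not a verdict.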

Triage: judging severity and prioritising by impact

Once something is a suspected incident, the first analytical act is triage: deciding how serious it is and therefore how fast and hard to respond. The word is borrowed from medicine on purpose. A medic faced with several casualties does not treat them in the order they arrived; they sort by who is most threatened and most savable, and spend the first effort there. Incident triage is the same discipline applied to systems and data.

Triage is driven by impact, what is at stake, judged against the three things security exists to protect, the CIA triad you met in CIS 201: confidentiality (is information being seen by the wrong people), integrity (is information being altered or destroyed), and availability (are systems and services unreachable when needed). An incident is serious in proportion to how much of these it threatens and how central the asset is. Two questions sharpen the judgement: how bad is the effect, and how widely does it reach. A useful way to hold both at once is a severity matrix.

   INCIDENT SEVERITY MATRIX  (impact across, reach down)

                  |  LOW impact        |  MODERATE impact     |  HIGH impact
                  |  (minor, reversible|  (real harm, limited |  (serious harm to
                  |   nuisance)        |   data/service)      |   data/identities/
                  |                    |                      |   continuity)
   ---------------+--------------------+----------------------+-------------------
   WIDE reach     |   MODERATE         |   HIGH               |   CRITICAL
   (many users /  |                    |                      |
    core service /|                    |                      |
    SSO / records)|                    |                      |
   ---------------+--------------------+----------------------+-------------------
   NARROW reach   |   LOW              |   MODERATE           |   HIGH
   (one user /    |                    |                      |
    one device /  |                    |                      |
    minor service)|                    |                      |
   ---------------+--------------------+----------------------+-------------------

   Push severity UP a step if any of these is true:
     * the single sign-on / identity service is involved
     * per-user certificates or keys may be exposed (for example a .p12)
     * nationals' personal data may have been seen or taken
     * a core or safety-of-life service is down
     * the attacker may still have live access right now

Read it plainly. A single user's device with a blocked, harmless malware alert is low: narrow reach, low impact, deal with it in good order. A phishing compromise of one ordinary account is moderate: act promptly but not in a panic. A sign that the identity service, the per-user certificates, or many nationals' records are touched is high or critical: it goes to the top of the queue, the incident lead is told at once, and the response runs at full pace. The matrix is a guide to thinking, not a master. The escalation triggers beneath it exist because some facts are serious whatever the grid says: anything touching identities, keys, nationals' data, a core service, or a live, present attacker is treated as grave until proven otherwise.

Triage decides the order of work and the speed of response, and it is revisited as you learn more. An incident that looked moderate can climb to critical the moment you discover it reached the SSO, and one that looked alarming can settle once you find it touched a single test account. Triage early, then keep grading as the facts come in.
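
The matrix and its escalation triggers can be written down as a small lookup, which makes the grading rule easy to check against the worked cases above. This is a sketch under stated assumptions: the function name and the trigger labels are invented for the example, and it reads "push severity up a step" as one step in total however many triggers apply, capped at CRITICAL.

```python
LEVELS = ["LOW", "MODERATE", "HIGH", "CRITICAL"]

# (reach, impact) -> base grade, copied from the severity matrix.
MATRIX = {
    ("narrow", "low"): "LOW",
    ("narrow", "moderate"): "MODERATE",
    ("narrow", "high"): "HIGH",
    ("wide", "low"): "MODERATE",
    ("wide", "moderate"): "HIGH",
    ("wide", "high"): "CRITICAL",
}

# Hypothetical labels for the five escalation triggers under the matrix.
TRIGGERS = {
    "identity_service",
    "keys_exposed",
    "nationals_data",
    "core_service_down",
    "live_attacker",
}


def grade(reach, impact, triggers=()):
    """Return the severity grade for an incident as currently understood."""
    unknown = set(triggers) - TRIGGERS
    if unknown:
        raise ValueError(f"unknown trigger(s): {sorted(unknown)}")
    base = MATRIX[(reach, impact)]
    if triggers:
        # Assumption: any trigger pushes the grade up one step, never past CRITICAL.
        return LEVELS[min(LEVELS.index(base) + 1, len(LEVELS) - 1)]
    return base
```

Because triage is revisited, the same function is simply called again as the facts change: grade("narrow", "moderate") for the phished account, then grade("narrow", "moderate", ["keys_exposed"]) if per-user keys turn out to be within reach. The grid remains a guide to thinking, not a master.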

Scoping: finding the full extent

The part of an incident you first notice is rarely the whole of it. A compromised account is the door, not the room. Scoping is the work of finding the true extent: every account, device, service, and set of data the incident reaches, not just the one that raised the flag. Scope it too narrowly and you will contain half the problem, declare victory, and watch it return from the part you missed.

Scope by asking outward from the first known fact, in widening circles:

From a compromised account: what did it have access to (least privilege limits this, which is one reason CIS 220 insists on it), what did it do while compromised, did it reach other accounts or services, were mail rules or recovery methods changed, were tokens or sessions issued that still work, were keys or certificates within its reach?
From a compromised device: what accounts were signed in on it, what keys or certificates did it hold, what did it connect to, could it have spread to other devices?
From a service incident: which other services depend on it, what data does it hold, who relies on it right now?

The honest answer to scoping is found in evidence, chiefly logs, and not in assumption. This is the moment the patient, unglamorous reading of logs pays for itself. The single sign-on service can tell you everywhere an account authenticated; the mail system can show every rule and new sign-in; the server logs can show what was reached and when. Follow the trail, write down what you confirm and what you only suspect, and keep widening the circle until the edges come back empty. Scoping is finished not when you are tired but when the questions stop turning up new affected things.
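
"Widening circles" is, in effect, a breadth-first walk outward from the first known fact. The sketch below shows the idea only. The `reaches` mapping stands in for what the logs actually confirm, and every name in the example is hypothetical.

```python
from collections import deque


def widen_scope(first_fact, reaches):
    """Walk outward from the first known fact, one circle at a time.

    `reaches` maps an item to the items that evidence shows it touched.
    Returns affected items in the order they were discovered.
    """
    affected = [first_fact]
    seen = {first_fact}
    queue = deque([first_fact])
    while queue:
        item = queue.popleft()
        for touched in reaches.get(item, ()):
            if touched not in seen:  # only new findings widen the circle
                seen.add(touched)
                affected.append(touched)
                queue.append(touched)
    return affected  # the walk ends when the edges come back empty


# Hypothetical findings, as they might be read out of SSO and mail logs.
example_reaches = {
    "account:j.doe": ["mailbox:j.doe", "session:abc"],
    "mailbox:j.doe": ["rule:forward-outside"],
}
example_scope = widen_scope("account:j.doe", example_reaches)
```

The code is the easy part. The mapping is only as good as the evidence behind it, so an entry goes in when a log confirms it, and anything merely suspected is recorded separately as suspected.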

Building a timeline

As detection, triage, and scoping proceed, you assemble their findings into a timeline: a single, ordered account of what happened and when. The timeline is the most useful single artefact the analysis phase produces. It turns a scatter of separate observations, a login here, a file change there, an alert somewhere else, into one coherent story you can reason about, brief from, and hand on.

Build it from evidence, with times, and keep two clocks straight. There is the timeline of the incident (when the attacker acted) and the timeline of the response (when your team acted). Keep both, because confusing them is how a team accidentally blames itself for damage the attacker had already done, or vice versa. Use a consistent time zone, prefer the precise timestamps the logs give you over anyone's memory, and label anything uncertain plainly as uncertain.

   A SIMPLE INCIDENT TIMELINE  (one list, ordered, sourced)

   TIME (UTC)   WHAT HAPPENED                           SOURCE / EVIDENCE
   ----------   ------------------------------------    ---------------------
   09:02        ~300 failed logins vs one account       SSO auth log
   09:04        one successful login, unfamiliar IP     SSO auth log
   09:07        mail forwarding rule added to outside   mail audit log
                address
   09:31        national reports "odd email" from the   user report (phone)
                account
   --- detection / response begins here ---
   09:38        suspected incident declared, lead told  response log (us)
   09:41        full account log pulled and preserved   response log (us)
   09:55        scope: only this account so far; SSO    response log (us)
                shows no token reuse elsewhere

   Two clocks: above the line is the INCIDENT (what the attacker did);
   below it is the RESPONSE (what we did). Never let the two blur.

A timeline like this does four jobs at once. It shows the team what they are dealing with; it reveals gaps ("we have the bad login but not how the password leaked, find that"); it becomes the spine of the brief to the incident lead and, where duty requires, to others; and it feeds straight into the post-incident review in Lesson 10. Start it the moment you declare a suspected incident, and add to it as you go. A timeline reconstructed days later from memory is worth a fraction of one kept live.
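
A live timeline can be kept as structured entries as well as on paper. The sketch below is one possible shape, not a prescribed format: the `Entry` fields, the two clock labels, and the rendering are assumptions made for the example. It refuses timestamps without a time zone, orders everything in UTC, and marks uncertain entries plainly.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Entry:
    when: datetime        # must carry a time zone
    what: str
    source: str           # the evidence behind the line
    clock: str            # "incident" (attacker) or "response" (us)
    certain: bool = True  # False when the time or fact is not confirmed


def build_timeline(entries):
    """Validate entries and return them ordered by time in UTC."""
    for entry in entries:
        if entry.when.tzinfo is None:
            raise ValueError(f"no time zone on entry: {entry.what!r}")
        if entry.clock not in ("incident", "response"):
            raise ValueError(f"unknown clock: {entry.clock!r}")
    return sorted(entries, key=lambda e: e.when.astimezone(timezone.utc))


def render(timeline):
    """Return one text line per entry, with clock and source shown."""
    lines = []
    for e in timeline:
        stamp = e.when.astimezone(timezone.utc).strftime("%H:%M")
        flag = "" if e.certain else " (uncertain)"
        lines.append(f"{stamp}  [{e.clock.upper()}]  {e.what}{flag}  <- {e.source}")
    return lines
```

Keeping the clock as its own field means the two stories can be pulled apart at any time, for example by filtering on `clock == "incident"` when briefing what the attacker did.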

Preserving evidence without destroying it

Here is the discipline that separates a trained responder from a well-meaning one who makes everything worse. The instinct, when you find a compromised machine, is to start poking: open files, run programs, reboot it, "have a look". Almost every one of those acts destroys evidence. Switching a machine off wipes its memory, and with it the chance to take the memory image that may hold the only trace of what was running. Logging in, opening files, or running tools changes timestamps and overwrites the very records you need. Rebooting can trigger clean-up the attacker built in. The rule is plain: preserve first, investigate second, and record everything you touch.

Preserving evidence rests on a few principles a small team can actually keep:

Work on copies, never originals. Take a disk image, a bit-for-bit copy, and examine that. Set the original aside, untouched, so it can be trusted later and re-examined if your copy is questioned.
Capture the volatile first. Memory (RAM), and any live network traffic, vanish when a machine is switched off or unplugged. If memory matters and your appointment and tools allow it, capture it before you power down. When in doubt about whether to pull the plug, ask: pulling it loses memory evidence, while leaving it on may let an attacker continue, and that trade is a decision for the incident lead, not a reflex.
Keep the logs before they roll over. Logs are finite; old entries are overwritten by new ones. Pull and save the relevant logs early, so the evidence still exists when you come to read it.
Capture network traffic where it helps. A packet capture records what actually crossed the wire and can be opened later in a tool such as Wireshark.
Record every action, with its time. Every step you take (what you did, when, on which system, and why) goes in the response log. This is your chain of custody: the written, unbroken account of who held each piece of evidence and what was done to it, which is what lets the evidence be trusted at all.
Stay within your appointment. Access follows appointment, not qualification. Imaging a server or capturing traffic is done by the person whose appointment covers it, on direction, not by whoever happens to be nearest. Acting outside your standing can taint evidence and is its own breach.

A short checklist keeps this straight in the moment, when adrenaline argues for poking around.

   EVIDENCE-PRESERVATION CHECKLIST  (work top to bottom; tick and time each)

   [ ] STOP.  Do not reboot, log in, open files, or "have a look" yet.
   [ ] CHECK appointment: am I the right person to touch this system?
   [ ] START the response log NOW: date, time zone, who is acting.
   [ ] VOLATILE FIRST: capture memory (RAM) and live traffic if needed,
       before any power-down  (incident lead decides power-down).
   [ ] LOGS: pull and save relevant logs before they roll over.
   [ ] IMAGE: take a disk image; set the ORIGINAL aside, work on the COPY.
   [ ] PACKET CAPTURE: save it (.pcap) if network detail matters.
   [ ] LABEL each item: what, from where, when taken, by whom, hash if able.
   [ ] CHAIN OF CUSTODY: record who holds each item and every handover.
   [ ] NEVER pay, tamper, alter, or hide.  Honesty is part of the evidence.

   Golden rule: if you are unsure whether an action destroys evidence,
   do not do it yet, write down the question, and ask the incident lead.

This protects three things at once. It protects the investigation, because you cannot understand an incident whose traces you have erased. It protects recovery, because evidence tells you the scope, which tells you what to clean or rebuild. And it protects accountability: many incidents, ransomware and data breaches especially, are crimes, and a breach of nationals' data carries a duty to be honest and to notify. Evidence kept properly lets the proper authorities act and lets the Principality account for itself truthfully. Evidence destroyed, even by good intentions, cannot be recovered.
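
The checklist's "hash if able" and "record who holds each item" can be illustrated together. The sketch below computes a SHA-256 digest of an evidence file and appends one line to a custody log. The function names, the log's JSON-lines layout, and the field names are assumptions made for the example, not a mandated record format. It is meant to be run against a working copy, by someone whose appointment covers it, since even reading a file can change access times on some systems.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:  # opened read-only, never written to
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_custody(item_path, action, actor, log_path):
    """Append one custody entry (time, item, hash, action, actor) to a log."""
    entry = {
        "time_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "item": str(item_path),
        "sha256": sha256_of(item_path),
        "action": action,
        "actor": actor,
    }
    # Append only: earlier entries are never rewritten.
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return entry
```

If the same item is hashed again at a later handover and the digest matches, that is strong support for the claim that the copy has not changed in between. A mismatch is itself a finding to record and raise.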

Reading logs and packet captures, in plain terms

Two kinds of evidence reward a little familiarity, kept here at a practical, introductory level. You are not expected to leave this lesson an analyst; you are expected to know what these things are, what they can tell you, and how not to ruin them.

Logs are dated lists of events. The ones a small force will reach for most are the authentication logs from the single sign-on service (who signed in, from where, when, and whether it succeeded), mail logs (sign-ins, and changes to rules and recovery methods), and server and application logs (errors, configuration changes, access). Reading a log is mostly looking for the abnormal against the normal: a login at an odd hour, from an unfamiliar place, a sudden burst of failures, a change nobody remembers making. Two habits matter. First, save the slice you need before it rolls over and is gone. Second, mind the clock: line every log up in one time zone, because logs from different systems often disagree, and an hour's drift can scramble cause and effect in your timeline.
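
The "mind the clock" habit can be shown concretely. In the sketch below, two systems record the same morning in different offsets; sorted as written, the entries come out in the wrong order, and converted to UTC they fall into place. The log layout (timestamp, system, text) and the sample entries are invented for the example, and the timestamps are assumed to be ISO 8601 with an explicit offset.

```python
from datetime import datetime, timezone


def to_utc(stamp):
    """Parse an ISO 8601 timestamp with an explicit offset; return it in UTC."""
    parsed = datetime.fromisoformat(stamp)
    if parsed.tzinfo is None:
        # Do not guess: find out which zone the source system logs in.
        raise ValueError(f"no UTC offset on {stamp!r}")
    return parsed.astimezone(timezone.utc)


def merge_logs(*logs):
    """Merge logs of (timestamp, system, text) into one UTC-ordered list."""
    merged = [
        (to_utc(stamp), system, text)
        for log in logs
        for stamp, system, text in log
    ]
    return sorted(merged)


# Hypothetical entries: the mail server logs one hour behind UTC.
sso_log = [("2024-01-15T09:04:00+00:00", "sso", "login success, unfamiliar IP")]
mail_log = [("2024-01-15T08:07:00-01:00", "mail", "forwarding rule added")]
merged = merge_logs(sso_log, mail_log)
```

Read as written, the mail entry (08:07) appears to come before the login (09:04), which would put the effect ahead of its cause. In UTC it is 09:07, three minutes after the login.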

Packet captures are saved recordings of network traffic, the actual packets that crossed the wire, stored in a file (commonly .pcap) and opened in a tool such as Wireshark. A capture can show which machines talked to which, on what, and sometimes what was said, which helps answer "did anything leave the network" or "what is this device reaching out to". At this level, know three things. A capture must usually be made deliberately, before or during the event, by someone whose appointment covers it, so it is part of preparation as much as response. It can contain sensitive content, so it is itself evidence to be protected and handled within scope, never copied about casually. And reading one well is a supervised, hands-on skill: a packet-analysis text can supply the foundations, but the skill itself is built up in person and not from a page. For now, recognise a packet capture, know it answers questions logs cannot, and know it must be preserved and handled with the same care as any other evidence.
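
Recognising a capture file need not rely on its name. Capture files begin with a short, well-known signature, and the sketch below checks the first four bytes against the signatures of the classic pcap format and the newer pcapng format. It is an illustration only: the function names are invented for the example, it reads nothing beyond those four bytes, and it should be pointed at a working copy, not at original evidence.

```python
# First four bytes of common capture-file formats.
CAPTURE_SIGNATURES = {
    b"\xd4\xc3\xb2\xa1": ("pcap", "little-endian, microsecond timestamps"),
    b"\xa1\xb2\xc3\xd4": ("pcap", "big-endian, microsecond timestamps"),
    b"\x4d\x3c\xb2\xa1": ("pcap", "little-endian, nanosecond timestamps"),
    b"\xa1\xb2\x3c\x4d": ("pcap", "big-endian, nanosecond timestamps"),
    b"\x0a\x0d\x0d\x0a": ("pcapng", "section header block"),
}


def identify_capture(first_bytes):
    """Return (format, detail) for a known signature, or None if unrecognised."""
    return CAPTURE_SIGNATURES.get(bytes(first_bytes[:4]))


def identify_capture_file(path):
    """Read only the first four bytes of a file and identify it."""
    with open(path, "rb") as handle:  # read-only
        return identify_capture(handle.read(4))
```

A None result means only that the file does not start with one of these signatures. It may be truncated, compressed, or not a capture at all, and that question goes to whoever holds the appointment to examine it.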

In Practice: A Forwarding Rule Nobody Set

One morning a national telephones the duty operator to say a friend received an odd message from the national's own state mailbox, one they never sent. The operator, an RKA systems assistant, does not dismiss it and does not panic. CIS 201 taught them to treat the report as help. They open a response log on the spot: date, time in UTC, who is acting, and the bare report. They have just declared a suspected incident.

Detection gives way to analysis. Pulling the mail account's audit log and the single sign-on authentication log, the assistant finds the shape of it quickly: a burst of failed logins around 09:02, one success at 09:04 from an unfamiliar address, and at 09:07 a forwarding rule quietly added, sending copies of incoming mail to an outside address. They lay these out as a timeline with sources beside each line, keeping the incident clock (what the attacker did) separate from the response clock (what the team is now doing). For triage they reach for the severity matrix. One ordinary user account means narrow reach, but there is real harm to confidentiality, because a forwarding rule is exfiltration; so this is moderate, acted on promptly. Then they apply the escalation triggers and pause: could the same leaked password reach other services, and were any per-user keys within this account's reach? Until those are answered, they treat it as the higher grade and tell the incident lead now rather than later.

Now scoping, and the discipline holds. The assistant resists the urge to "just fix it" by deleting the rule and changing the password, because doing so first would erase evidence and might miss the wider scope. Instead they preserve: save the relevant slices of the mail and SSO logs before they roll over, label them with what, when, and by whom, and note every action in the response log as a clean chain of custody. Only with the evidence secured do they widen the circle, checking whether the same account reached other services, whether new recovery methods or sessions were added, and whether any certificates were exposed. The single sign-on log shows the stolen session was used only against this mailbox and nowhere else, which lets the incident settle back to moderate, a judgement they record with its reasons.

   THE FORWARDING-RULE INCIDENT, AS ANALYSED

   DETECT   national reports a message they never sent
            -> declare suspected incident, START response log

   ANALYSE  logs show: failed-login burst -> 1 success (unknown IP)
            -> forwarding rule added to an outside address

   TRIAGE   one account, narrow reach, real confidentiality harm
            = MODERATE; but check SSO/keys triggers -> grade UP, tell lead
            -> SSO shows no spread -> settle back to MODERATE (recorded)

   SCOPE    preserve logs FIRST, then widen the circle:
            other services? new recovery methods? keys exposed? -> no

   HAND ON  clean timeline + preserved evidence + chain of custody
            -> ready for containment (Lesson 03), nothing destroyed

The assistant has not yet contained or fixed anything, and that is correct. That is Lesson 03's work, and it now has everything it needs: a clear severity, a confirmed scope, an honest timeline, and preserved, well-labelled evidence with an unbroken chain of custody. Because nothing was destroyed in haste, the team can reset the password, revoke the sessions and tokens, remove the rogue rule, confirm MFA, and notify, in good order and with the truth of what happened on record. A signal became an understood incident without a single piece of proof lost along the way.

Check Your Understanding

A single failed login is noise, but a thousand failed logins against one account followed by one success is a story. Explain why indicators gain weight in combination and in time, name three indicators a small force is likely to meet, and describe what it means to declare a "suspected incident" and why declaring early is cheaper than declaring late.
Using the severity matrix and its escalation triggers, triage these three and justify each: a blocked, harmless malware alert on one user's laptop; a phishing compromise of one ordinary mail account; and a sign that the single sign-on service or per-user certificates may be exposed. Why is triage revisited as the facts come in rather than fixed once at the start?
You find a compromised machine and your instinct is to log in, open a few files, and reboot it to "have a look". Explain what each of those acts destroys, state the rule that governs this phase, and walk through the evidence-preservation checklist in order, including why memory and live traffic must be captured before any power-down and what a chain of custody is for.

Reflection (write a short paragraph): This lesson argues that the analysis phase is mostly self-discipline: declaring early without crying wolf, grading honestly and re-grading as you learn, scoping until the edges come back empty, and, hardest of all, preserving evidence by not acting when every instinct says to fix the thing in front of you. Think of a time, in the Army or out of it, when you rushed to fix a problem and in doing so destroyed the information that would have told you what the problem really was. Working from the evidence-preservation checklist and the golden rule beneath it, what would slowing down by even a few minutes, to preserve before you investigated, have changed about the outcome, and what makes that pause so hard to hold in the moment?

Summary

Detection turns a signal into a suspected incident, and it comes from people (CIS 201's alert human, the most valuable sensor a small force has) and from systems (logs and alerts from the SSO, mail, servers, and anti-malware). Treat each indicator as a question, not a verdict; indicators gain weight in combination and in time. Declare early: it starts the clock, the record, and the roles, and standing one down is cheap.
Triage grades an incident by impact against the CIA triad and by reach, using the severity matrix. Push severity up whenever the identity service, per-user keys or certificates, nationals' data, a core service, or a live attacker is involved. Re-grade as the facts arrive.
Scope outward from the first known fact in widening circles, from account, device, or service, answered by evidence and not assumption. Stop only when the edges come back empty, because the part you first see is the door, not the room.
Build one ordered, sourced timeline, keeping the incident clock and the response clock separate, in a single time zone, started the moment you declare. It is the spine of the brief and feeds the Lesson 10 review.
Preserve evidence first and investigate second: work on copies (a disk image), capture the volatile first (memory and live traffic before any power-down), save logs before they roll over, take packet captures where they help, and record every action with its time as an unbroken chain of custody. Never pay, tamper, alter, or hide. Stay within your appointment.
Logs and packet captures (read in a tool such as Wireshark) are the two kinds of evidence worth knowing at an introductory level: know what each can tell you, mind the clocks, and learn the hands-on handling in person under supervision. This phase hands a clean, understood incident to Lesson 03 (Containment, Eradication, and Recovery), and builds on CIS 201, CIS 220's least privilege, SIG 220, and PME 210's records discipline.

Detection and Analysis