Royal Army College

Lesson Overview

There will be a bad day. Not perhaps, not if we are unlucky, but at some point: an account taken over, a device lost, a service down, a record exposed, files held to ransom. No force, however careful, escapes incidents forever, and a digital state has more surface for them than most. The honest question is not whether the bad day comes but how badly it goes when it does, and the surprising answer is that most of that outcome is settled long before the day itself. It is settled in the quiet, unglamorous work of preparation: writing a plan while the systems are calm, naming who does what, keeping the contact lists and tools and clean offline backups ready, and rehearsing until the team can act without thinking. This lesson is about that work, because preparation is the phase that decides how every other phase goes.

It also sets the spine of the whole course. You will meet the recognised incident lifecycle from the NIST guide SP 800-61: Preparation, then Detection and Analysis, then Containment, Eradication, and Recovery, then Post-Incident Activity. Every later lesson lives inside one of those phases, and the phases map onto the security functions you already learned in CIS 201. Getting the shape of the lifecycle into your head now, and seeing why its first phase carries the most weight, is what the rest of the course will build on. We are not yet detecting or containing anything; we are getting ready to, which is the work that makes detecting and containing possible.

This is the knowledge layer. Reading it will teach you the shape of incident response and what readiness looks like, but readiness itself is built by doing. Drafting and keeping a real plan, holding a tabletop exercise, taking and testing an offline backup, and confirming that the contact tree works are all practised and signed off in person where supervision allows. By the end you will be able to explain why preparation decides the outcome of an incident; name the four phases of the NIST SP 800-61 lifecycle and map them onto the CSF functions; list the elements of an incident response plan, including the four core roles of lead, decision-maker, communicator, and recorder; describe what a small team must have ready, above all clean offline backups and a current contact list; and explain why rehearsal through tabletop exercises turns a written plan into a real capability.

Key Terms

Incident: any event that harms, or threatens to harm, the confidentiality, integrity, or availability of our systems, accounts, or information. Preparation is what we do before any incident so that, when one comes, we handle it well.
Incident response plan (IRP): the agreed, written document, settled in advance, that says how we will respond to an incident: the roles, the contacts, the tools, the steps, and the thresholds for acting. It exists so that no one has to invent the response under pressure.
Incident lifecycle (NIST SP 800-61): the recognised four-phase model for handling an incident: Preparation, then Detection and Analysis, then Containment, Eradication, and Recovery, then Post-Incident Activity. It is the spine of this course.
NIST Cybersecurity Framework (CSF): the wider mental map of cyber risk, organised into the functions Govern, Identify, Protect, Detect, Respond, and Recover. The lifecycle's detection, response, and recovery work maps onto the last three of these; preparation draws on the first three, and post-incident learning feeds back into them.
Incident response team (IRT): the appointed group that runs the response. In a small force this may be a handful of people, each holding a defined role, supported by the wider chain of command.
Tabletop exercise: a rehearsal in which the team talks through a realistic incident scenario step by step, without touching live systems, to test the plan, the roles, and the contacts before a real incident does.
Clean offline backup: a copy of important data kept disconnected from the live systems, so that an attacker or ransomware that reaches the live systems cannot reach the backup too. It is the foundation of recovery.
Contact list (call tree): the current, agreed list of who to reach and how, in and out of hours, including any external authority, so that the right people can be brought in fast when an incident starts.
Runbook (playbook): a short, step-by-step guide for handling one common type of incident, written in advance, so the team follows a tested method rather than improvising. Later lessons build several of these.
Access follows appointment: the standing rule that a person holds system access, including the elevated access used in a response, because they are appointed to a role, not merely because they hold a certificate.

Why preparation decides the outcome

It is natural to think of incident response as the dramatic part: the alarm, the scramble, the cool head containing the threat at midnight. That part matters, but it is not where most incidents are won or lost. They are won or lost beforehand, in whether the team that scrambles has a plan to follow, knows who does what, can reach the people it needs, has clean backups to restore from, and has done all of this once before in rehearsal. A team with those things in place handles a serious incident as a hard but ordered piece of work. A team without them meets the same incident as chaos: arguing about who is in charge, hunting for a phone number, discovering the backups were never tested, and making panicked mistakes that turn a contained problem into a disaster.

The reason is simple. An incident is the worst possible moment to be making first decisions. People are frightened and rushed, information is incomplete, the clock is running while harm spreads, and everyone is tired. Decisions made in that state (who leads, whether to isolate a host, whether we even have a good backup) are slow and often wrong. Every one of those decisions that can be made in advance, calmly, and written down is a decision that does not have to be made badly in the heat of the day. Preparation is, at bottom, the moving of decisions out of the panic and into the calm. That is why the recognised standard puts Preparation first.

In our setting the stakes are sharper than in an ordinary office. The Principality is a non-territorial, digitally organised state: it runs on self-hosted online services, a single sign-on identity service that ties accounts together, and per-user certificates and keys. There is no physical territory to fall back on if the digital fabric fails; the systems, the records, and the identities largely are the state's working substance. That means an incident can reach further and matter more here than elsewhere, and it means that continuity (keeping essential functions running through disruption) is not a luxury but a duty. Preparation is how we earn the right to keep operating on a bad day.

The incident lifecycle: the spine of the course

The course is organised around a recognised model, the incident lifecycle set out in NIST Special Publication 800-61 Revision 2, the Computer Security Incident Handling Guide. It breaks the handling of any incident into four phases, in order, and almost everything you will learn fits into one of them. Hold the shape in your head now and the rest of the course will hang on it neatly.

The first phase is Preparation: everything done before an incident to be ready for it, which is the subject of this lesson. The second is Detection and Analysis: noticing that an incident is happening, working out what it is, how far it reaches, and how serious it is, and starting the record. The third is Containment, Eradication, and Recovery: stopping the spread, removing the threat and closing the hole it used, and restoring service from clean backups. The fourth is Post-Incident Activity: the blameless review afterwards that asks what happened, what worked, and what to fix, and feeds those lessons back so the same gap does not open again.

   THE INCIDENT LIFECYCLE  (NIST SP 800-61)  -  the four phases

   +----------------+     +----------------+     +-------------------+
   |       1        |     |       2        |     |         3         |
   |  PREPARATION   | --> |  DETECTION &   | --> |  CONTAINMENT,     |
   |                |     |   ANALYSIS     |     |  ERADICATION &    |
   |  plan, roles,  |     |  recognise,    |     |  RECOVERY         |
   |  contacts,     |     |  triage,       |     |  stop spread,     |
   |  tools, clean  |     |  scope,        |     |  remove threat,   |
   |  backups,      |     |  preserve      |     |  patch hole,      |
   |  rehearsals    |     |  evidence      |     |  restore clean    |
   +----------------+     +----------------+     +-------------------+
          ^                                                |
          |                                                v
          |              +-----------------------------------+
          |              |               4                   |
          +--------------|     POST-INCIDENT ACTIVITY        |
                         |  blameless review; update plan,   |
                         |  controls, playbooks; feed back   |
                         +-----------------------------------+

   The loop is the point: phase 4 feeds the lessons back into phase 1,
   so each incident leaves the next response better prepared.

Notice that the lifecycle is a loop, not a line. The fourth phase feeds straight back into the first: every incident, handled and reviewed, should leave the plan, the controls, and the runbooks better than it found them, so the team is more prepared for the next one. This is why preparation is never finished. It is not a document you write once and file; it is a standing capability you keep current, and the post-incident review is one of the main ways you keep it current. Each later lesson in this course takes one of these four phases and turns it into practical method, and they all return, in the end, to the readiness you build here.
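
The loop can be shown in a few lines of Python. This is a minimal sketch for illustration only: the `Phase` enum and the `next_phase` helper are hypothetical names, not anything defined by NIST, and the only point they make is that the phase after Post-Incident Activity is Preparation again.

```python
from enum import Enum


class Phase(Enum):
    """The four SP 800-61 lifecycle phases, in order (illustrative names)."""
    PREPARATION = 1
    DETECTION_AND_ANALYSIS = 2
    CONTAINMENT_ERADICATION_RECOVERY = 3
    POST_INCIDENT_ACTIVITY = 4


def next_phase(current: Phase) -> Phase:
    """Return the phase that follows `current`.

    The lifecycle is a loop, not a line: after Post-Incident Activity the
    lessons feed back into Preparation, so the last phase wraps to the first.
    """
    order = list(Phase)  # Enum members iterate in definition order
    return order[(order.index(current) + 1) % len(order)]


# Walk one full cycle, starting from Preparation.
phase = Phase.PREPARATION
for _ in range(len(Phase)):
    print(phase.name)
    phase = next_phase(phase)
print("after review, back to:", phase.name)
```

Run from Preparation, the walk visits all four phases in order and ends back at Preparation: the loop in the diagram, expressed as a rule rather than a picture.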

How the lifecycle maps onto the CSF functions

You met the NIST Cybersecurity Framework in CIS 201 as the wider mental map of cyber risk, with its six functions: Govern, Identify, Protect, Detect, Respond, and Recover. The incident lifecycle is not a rival to that map; it is the part of it that deals with an incident in progress, drawn out in working detail. The two fit together, and seeing how is worth a moment, because it connects this specialist course back to the foundation every member shares.

The lifecycle's Detection and Analysis phase is the working detail of the framework's Detect function: spotting events and anomalies in good time, then understanding them. The Containment, Eradication, and Recovery phase splits across two functions: containing and eradicating the threat is Respond, acting on the incident to limit harm, while restoring service from clean backups is Recover, putting systems and data back. And the Post-Incident Activity phase closes the loop by feeding lessons back into the strategic functions, Govern, Identify, and Protect: updating policy and oversight, re-examining what assets and risks we have, and strengthening the safeguards that failed. Preparation itself draws on all of these calmer, before-the-day functions; it is where Govern, Identify, and Protect do their standing work so that Respond and Recover have something to work with.

   LIFECYCLE PHASE  (SP 800-61)        CSF FUNCTION  (the wider map)

   1. PREPARATION  ..................  GOVERN / IDENTIFY / PROTECT
        (plan, roles, backups,           strategy & oversight, know
         rehearsal, safeguards)          your assets, safeguards in place

   2. DETECTION & ANALYSIS  .........  DETECT
        (recognise, triage, scope)       spot events & anomalies in time

   3a. CONTAINMENT & ERADICATION  ...  RESPOND
        (stop spread, remove threat)     act on the incident

   3b. RECOVERY  ....................  RECOVER
        (restore from clean backups)     restore systems and data

   4. POST-INCIDENT ACTIVITY  .......  feeds back into
                                        GOVERN / IDENTIFY / PROTECT
        (blameless review, update)       so the same gap does not recur

   Same picture, two views: the lifecycle is the framework's
   incident-facing functions, drawn in working order.
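
The same mapping can be written down as plain data. The Python sketch below is illustrative only: the dictionary layout and the `phases_for` helper are hypothetical, and the pairings are simply those in the table above, with the post-incident feedback kept separate so that "maps onto" and "feeds back into" are not confused.

```python
# The six CSF functions, as listed in this lesson.
CSF_FUNCTIONS = ("Govern", "Identify", "Protect", "Detect", "Respond", "Recover")

# Which CSF functions each lifecycle phase works within.
PHASE_TO_CSF = {
    "1. Preparation": ("Govern", "Identify", "Protect"),
    "2. Detection and Analysis": ("Detect",),
    "3a. Containment and Eradication": ("Respond",),
    "3b. Recovery": ("Recover",),
}

# Post-Incident Activity does not act on the live incident; its lessons
# feed back into the strategic, before-the-day functions instead.
POST_INCIDENT_FEEDS_BACK_INTO = ("Govern", "Identify", "Protect")


def phases_for(function: str) -> list[str]:
    """Return the lifecycle phases that map onto one CSF function."""
    if function not in CSF_FUNCTIONS:
        raise ValueError(f"not a CSF function: {function!r}")
    return [phase for phase, funcs in PHASE_TO_CSF.items() if function in funcs]


for fn in CSF_FUNCTIONS:
    fed_back = " (also fed by post-incident review)" if fn in POST_INCIDENT_FEEDS_BACK_INTO else ""
    print(f"{fn}: {', '.join(phases_for(fn))}{fed_back}")
```

Every one of the six functions is reached by some phase, which is the practical sense in which the lifecycle is the framework drawn in working order.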

The practical value of seeing both views is that it stops you thinking of incident response as a separate world from everyday security. The strong passphrases, the MFA, the patched devices, the controlled access, and the tested backups that CIS 201 and CIS 220 teach are the Protect work that decides how well your Respond and Recover go. A force with good everyday hygiene has fewer incidents and recovers from them faster, because preparation is mostly just good security kept current and pointed at the bad day.

The incident response plan and the four core roles

The centrepiece of preparation is the incident response plan: the written, agreed document that says how we will respond, settled while the systems are calm. Its whole purpose is to remove improvisation from the worst moment. A good plan for a small force does not need to be long, but it does need to be real, current, and known to the people who will use it. At a minimum it should set out the roles and who holds them, the contact list for reaching those people and any external authority, the thresholds for what counts as an incident and when to escalate, the tools and access the team will need, where the clean backups are and how to restore them, pointers to the runbooks for common incidents, and the rule that access to act in the response follows appointment.

The heart of the plan is roles. In an incident, the single most common cause of early chaos is that no one is sure who is in charge or who decides, so people either freeze or all act at once. Naming the roles in advance fixes that. A small team can cover the essential roles among a few people, and one person may hold more than one in a minor incident, but four roles must always be covered.

Who leads: one named incident lead who runs the response, holds the overall picture, and directs the team, so there is a single point of coordination rather than a committee.
Who decides: the person with authority to make the consequential calls, such as isolating a critical service, declaring a major incident, or involving an external authority. In a small event this may be the lead; for the heavy decisions it is often someone more senior.
Who communicates: one person who handles communications, keeping the chain of command and any affected nationals or authorities properly informed, so that messages are consistent and the responders are not interrupted.
Who keeps the record: the recorder, who logs every action and observation with times, building the timeline that the analysis, the recovery, and the post-incident review all depend on. A response with no recorder loses the very history it most needs.
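
The recorder's log needs little more than a time, a name, and a line of text, added as things happen and never rewritten. As a minimal Python sketch, assuming a log kept in memory (a real record would also need safe storage and an offline copy, which this does not attempt), with `LogEntry` and `IncidentRecord` as hypothetical names:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class LogEntry:
    """One timed line in the incident record (frozen: cannot be edited)."""
    when: datetime  # UTC, so entries from different people compare cleanly
    who: str        # who acted or observed
    what: str       # the action taken or the observation made


class IncidentRecord:
    """Append-only timeline kept by the recorder.

    Entries can be added but not changed or deleted through this class,
    so the record stays a faithful history for analysis and review.
    """

    def __init__(self) -> None:
        self._entries: list[LogEntry] = []

    def log(self, who: str, what: str, when: Optional[datetime] = None) -> LogEntry:
        """Record an entry, stamped with the current UTC time unless given."""
        stamp = when if when is not None else datetime.now(timezone.utc)
        entry = LogEntry(stamp, who, what)
        self._entries.append(entry)
        return entry

    def timeline(self) -> list[LogEntry]:
        """Return a new list of entries in time order; the record is untouched."""
        return sorted(self._entries, key=lambda e: e.when)


record = IncidentRecord()
record.log("lead", "incident declared; response team assembled")
record.log("recorder", "systems assistant reports service refusing logins")
for e in record.timeline():
    print(e.when.isoformat(timespec="seconds"), e.who, "-", e.what)
```

Stamping times in UTC and returning a sorted copy lets an entry written down late still fall into its right place in the timeline, without anyone editing the entries already made.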

   INCIDENT RESPONSE: THE FOUR CORE ROLES  (always covered)

   +-----------------+   +-----------------+
   |   WHO LEADS     |   |   WHO DECIDES   |
   |  Incident lead  |   |  Authority for  |
   |  runs the       |   |  the big calls: |
   |  response, one  |   |  isolate, esca- |
   |  point of co-   |   |  late, involve  |
   |  ordination     |   |  an authority   |
   +-----------------+   +-----------------+
            |                     |
            +----------+----------+
                       |
   +-----------------+ | +-----------------+
   | WHO COMMUNICATES| | | WHO KEEPS THE   |
   |  keeps chain of | | | RECORD          |
   |  command & aff- | | | logs every act  |
   |  ected people   | | | & observation   |
   |  informed; one  | | | with TIMES;     |
   |  consistent     | | | builds the      |
   |  voice          | | | timeline        |
   +-----------------+ | +-----------------+
                       |
   A few people may cover all four in a small force, and one person
   may hold two roles in a minor incident, but every incident has a
   LEAD, a DECIDER, a COMMUNICATOR, and a RECORDER.

   Access to act follows APPOINTMENT to the role, not a certificate.
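
Checking that every role is covered is simple enough to sketch. Assuming a plan records roles as a mapping from role to person (the role keys and the `uncovered_roles` and `double_hatted` helpers below are hypothetical), the rule "one person may hold two roles, but no role may be empty" looks like this in Python:

```python
# The four roles every incident must have covered.
CORE_ROLES = ("lead", "decider", "communicator", "recorder")


def uncovered_roles(assignments: dict[str, str]) -> list[str]:
    """Return the core roles with nobody named against them.

    `assignments` maps a role to the person holding it. The same person
    may hold two roles in a minor incident; what must never happen is
    an empty role.
    """
    return [r for r in CORE_ROLES if not assignments.get(r, "").strip()]


def double_hatted(assignments: dict[str, str]) -> dict[str, list[str]]:
    """Return anyone holding more than one core role, so the plan can note it."""
    held: dict[str, list[str]] = {}
    for role in CORE_ROLES:
        person = assignments.get(role, "").strip()
        if person:
            held.setdefault(person, []).append(role)
    return {person: roles for person, roles in held.items() if len(roles) > 1}


# A minor incident: the lead also decides, and nobody has been named recorder.
plan = {"lead": "Cpl Idris", "decider": "Cpl Idris", "communicator": "Member B"}
print("uncovered:", uncovered_roles(plan))
print("two roles:", double_hatted(plan))
```

In the example the lead is also the decider, which the rule allows in a minor incident, but the missing recorder is flagged, which it does not allow.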

One rule sits over the whole plan: access follows appointment, not qualification. The elevated access used in a response (to disable accounts, restore backups, or change keys) is held by the people appointed to those roles, not by anyone who simply holds a certificate or has read this course. Completing CIS 310 prepares you to be appointed; it does not by itself grant you access to anything. This is the same standing rule you met across the speciality, and it matters most in an incident, when the temptation to grab access and act is strongest and the discipline to act only within your appointed role is what keeps the response coordinated and lawful.
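
The rule can be stated as a check that simply never looks at qualifications. In this hypothetical Python sketch the appointment names, the action names, and the `may_act` function are all illustrative assumptions; the point is that a member holding the CIS 310 certificate but no appointment is refused.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    name: str
    certificates: set[str] = field(default_factory=set)  # courses completed
    appointments: set[str] = field(default_factory=set)  # roles formally held


# Hypothetical table: response actions each appointed role may take.
ACTIONS_BY_APPOINTMENT = {
    "incident_lead": {"disable_account"},
    "recovery_operator": {"restore_backup"},
    "key_custodian": {"change_keys"},
}


def may_act(member: Member, action: str) -> bool:
    """Access follows appointment: only appointments are consulted.

    Certificates are deliberately never read here. Completing a course
    prepares a member to be appointed; it grants nothing by itself.
    """
    return any(
        action in ACTIONS_BY_APPOINTMENT.get(role, set())
        for role in member.appointments
    )


graduate = Member("Member A", certificates={"CIS 310"})
lead = Member("Member B", appointments={"incident_lead"})
print(may_act(graduate, "disable_account"))
print(may_act(lead, "disable_account"))
print(may_act(lead, "restore_backup"))
```

The three checks print False, True, and False: the graduate is refused, the appointed lead may disable an account, and the same lead may not restore backups, because that belongs to a different appointment.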

Readiness: contacts, tools, and clean offline backups

A plan names what the team will do. Readiness is having, in advance, the things the team will need to do it, so that the response is a matter of reaching for prepared resources rather than scrambling to assemble them. Three readiness items deserve particular attention for a small force.

The first is a current contact list, sometimes called a call tree. When an incident starts, the team must reach the right people fast: the incident lead, the decision-maker, the wider chain of command, the person who holds a particular system, and any external authority that must be told. The list must be current, must work in and out of hours, and must not live only inside the very systems that may be down: a contact list you can only open by logging into the service that has just failed is no contact list at all. Keep a usable offline copy. And like everything else in preparation, the list must be tested: a contact tree that has never been exercised is a list of guesses.
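
Two of those demands, an out-of-hours route for every entry and a recent test, are easy to check mechanically, and an offline copy can be as simple as a printed CSV file. The Python sketch below assumes a hypothetical `Contact` record and a 90-day testing interval chosen only for illustration; the entries are invented placeholders, not real contacts.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Contact:
    role: str           # e.g. "incident lead"
    name: str
    in_hours: str       # how to reach them in working hours
    out_of_hours: str   # how to reach them otherwise; "" if nobody knows
    last_tested: date   # when a call or message last actually got through


def contact_problems(contacts: list[Contact], today: date,
                     max_age: timedelta = timedelta(days=90)) -> list[str]:
    """Return problems that would cost time on the bad day.

    The 90-day default is an assumed interval for illustration only;
    use whatever period local instructions set.
    """
    problems = []
    for c in contacts:
        if not c.out_of_hours.strip():
            problems.append(f"{c.role}: no out-of-hours route")
        if today - c.last_tested > max_age:
            problems.append(f"{c.role}: not tested since {c.last_tested.isoformat()}")
    return problems


def write_offline_copy(contacts: list[Contact], stream) -> None:
    """Write the list as CSV so it can be printed and kept off the live systems."""
    writer = csv.writer(stream)
    writer.writerow(["role", "name", "in hours", "out of hours", "last tested"])
    for c in contacts:
        writer.writerow([c.role, c.name, c.in_hours, c.out_of_hours,
                         c.last_tested.isoformat()])


# Hypothetical entries for illustration.
tree = [
    Contact("incident lead", "Cpl Idris", "ext 101", "duty phone", date(2024, 5, 1)),
    Contact("out-of-hours duty", "Officer C", "ext 202", "", date(2023, 11, 2)),
]
for problem in contact_problems(tree, today=date(2024, 6, 1)):
    print(problem)

buffer = io.StringIO()
write_offline_copy(tree, buffer)
print(buffer.getvalue())
```

Here the out-of-hours duty entry is flagged twice, for having no out-of-hours route and for not having been tested since November, while the lead's entry passes.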

The second is the tools and access the team will need, identified and ready before the day. This is less about exotic equipment than about not being caught out: the team should know what it will use to analyse, contain, and recover, should have the access it needs arranged by appointment, and should have a clean, trusted device or two to work from rather than reaching for whatever is nearest, which in a malware incident may itself be compromised. Detail belongs to later lessons and to local instructions; the preparation principle is simply that you do not want to be installing or learning a tool for the first time in the middle of an incident.

The third, and the most important single item of readiness, is clean offline backups. Backups are the foundation of recovery, and recovery is how an incident ends well rather than catastrophically. You learned the shape of good backups in CIS 201 as the 3-2-1 rule: three copies, on two kinds of media, with one kept off-site or offline. The word that earns its place here is offline. Ransomware and capable attackers go looking for backups precisely because destroying them is how they force payment or make recovery impossible; a backup that is connected to the live systems can be reached and ruined along with everything else. A backup that is genuinely offline, meaning disconnected from the live systems, cannot be reached by something that has only reached the live network. And a backup is only real if it has been tested by actually restoring from it; an untested backup is a hope, not a plan. Clean, offline, tested backups are the single thing most likely to turn a ransomware attack from a disaster into an inconvenience.
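
The 3-2-1 rule and the two extra demands of this paragraph, offline and tested, can be checked together. This Python sketch is illustrative only: the `BackupCopy` fields and the `backup_gaps` function are hypothetical, and the example copies are invented. The first set passes 3-2-1 on paper, since its off-site copy satisfies the rule, yet still fails, because every copy is connected.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class BackupCopy:
    location: str                      # where this copy lives
    medium: str                        # kind of media, e.g. "disk", "tape"
    offline: bool                      # disconnected from the live systems?
    off_site: bool                     # kept away from the main site?
    last_restore_test: Optional[date]  # last successful test restore, if any


def backup_gaps(copies: list[BackupCopy]) -> list[str]:
    """Check copies against 3-2-1, then against 'offline' and 'tested'."""
    gaps = []
    # 3-2-1: three copies, two kinds of media, one off-site or offline.
    if len(copies) < 3:
        gaps.append(f"only {len(copies)} copies; 3-2-1 asks for 3")
    if len({c.medium for c in copies}) < 2:
        gaps.append("fewer than 2 kinds of media")
    if not any(c.off_site or c.offline for c in copies):
        gaps.append("no copy off-site or offline")
    # The sharper test: at least one copy ransomware cannot reach...
    if not any(c.offline for c in copies):
        gaps.append("no copy is disconnected; ransomware could reach every copy")
    # ...and proof, by an actual restore, that such a copy works.
    elif not any(c.offline and c.last_restore_test is not None for c in copies):
        gaps.append("no offline copy has been restore-tested")
    return gaps


copies = [
    BackupCopy("live data", "disk", offline=False, off_site=False, last_restore_test=None),
    BackupCopy("backup share, still mounted", "disk", offline=False, off_site=False,
               last_restore_test=date(2024, 3, 1)),
    BackupCopy("remote sync", "cloud", offline=False, off_site=True, last_restore_test=None),
]
print(backup_gaps(copies))

with_tape = copies + [BackupCopy("tape in the safe", "tape", offline=True, off_site=True,
                                 last_restore_test=None)]
print(backup_gaps(with_tape))
```

Adding a disconnected tape copy clears the first gap but raises the next one, that no offline copy has yet been restored from; only a tested restore clears it.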

   READINESS CHECKLIST  -  have these BEFORE the bad day

   PLAN
     [ ] a written incident response plan, current and known
     [ ] thresholds: what counts as an incident, when to escalate
     [ ] runbooks for the common incidents (built in later lessons)

   ROLES
     [ ] LEAD, DECIDER, COMMUNICATOR, RECORDER all named
     [ ] cover for each role (people are away, ill, unreachable)
     [ ] access arranged by APPOINTMENT, not by certificate

   CONTACTS
     [ ] current call tree, in AND out of hours
     [ ] a usable OFFLINE copy (not only inside the systems)
     [ ] external authority contacts where duty requires
     [ ] the contact tree has actually been TESTED

   TOOLS & ACCESS
     [ ] the tools to detect, contain, recover identified & ready
     [ ] a clean, trusted device to work from
     [ ] needed access arranged in advance, within appointment

   BACKUPS  (the most important item)
     [ ] 3 copies, on 2 media, 1 of them OFFLINE / off-site
     [ ] backups are genuinely DISCONNECTED from live systems
     [ ] restore has actually been TESTED, not just assumed

   REHEARSAL
     [ ] the team has run a TABLETOP exercise of the plan
     [ ] lessons from drills and real incidents fed back in

   A plan you have never opened under pressure is a draft.
   Readiness is the plan made real and kept current.

Rehearsal: the tabletop exercise

A plan that has never been rehearsed is a document, not a capability, and the gap between the two is wide. The way a small force closes it cheaply and safely is the tabletop exercise: the team sits down and talks through a realistic incident scenario, step by step, without touching any live system. Someone presents the situation ("an officer reports their account is sending messages they did not write"), and the team works it as if it were real: who leads, who is told, what we check first, when we decide to disable the account, where the backups are, who talks to the affected national, what we record. The exercise costs an hour or two and a quiet room, and it pays that back many times over.

It pays back because rehearsal surfaces the gaps while they are still cheap to fix. Tabletops routinely reveal that a key contact has changed roles and the list is stale, that two people both assumed they were the lead, that nobody is sure whether the backups are actually offline, that the one person who knows how to restore a service is the same person who is always away. Every one of those is far better discovered in a calm rehearsal than at midnight in a real incident. Rehearsal also builds the thing a written plan cannot: familiarity. A team that has talked through an incident once moves faster and calmer through the real one, because the shape of the work is no longer strange. The roles feel natural, the steps are half-remembered, and the panic that comes from facing something wholly new is much reduced.

For these reasons, rehearsal is not an optional extra at the end of preparation; it is the test that tells you whether your preparation is real. A plan, a contact list, and a backup that have all been exercised together in a tabletop are a capability. The same three, written but never rehearsed, are a set of assumptions, and an incident is a poor place to discover which of your assumptions were wrong. Rehearse in peacetime so that the bad day finds you ready, and feed what each rehearsal teaches back into the plan, just as you will feed back the lessons from real incidents. That habit, of practising and improving before you are forced to, is the difference between a force that hopes it will cope and one that knows how it will.

In Practice: a quiet Tuesday and a tabletop

Corporal Idris, appointed to the small incident response team of his speciality, sets aside an hour on an ordinary Tuesday for a tabletop exercise. Nothing is wrong; that is the point. He gathers the three others who would form the response team and reads out a scenario he prepared: a systems assistant reports that one of the Principality's self-hosted services is refusing logins and that files on it appear to have been renamed with a strange extension. It looks like ransomware. He asks the team to work it as if it were happening now.

They talk it through. Who leads? Idris is named lead, and the others quickly find the first gap: two of them had each assumed they would be the decision-maker for the heavy call of taking the service offline, and they settle on the spot who actually holds that authority and write it into the plan. Who communicates, who records? They assign both, and the recorder starts a mock log with times, just as she would for real. Where are the clean backups, and are they offline? Here the second gap appears: nobody present can say for certain that the most recent backup is genuinely disconnected from the live systems rather than mounted to them, and they note it as an action to verify and test that week, because if that backup is reachable by the ransomware it may be worthless. Who do we tell? They walk the contact tree and find a third gap: one out-of-hours number belongs to someone who moved roles a month ago. They correct it before the hour is out.

No system was touched and no real incident occurred. Yet the exercise turned up three real problems (an unsettled decision-maker, a backup of unknown offline status, and a stale contact), each of which would have cost precious minutes or worse in a real ransomware event. Two were fixed on the spot, in calm; the third is now a named action rather than an unknown. Idris closes by folding the lessons back into the written plan and scheduling a follow-up to confirm the backup is offline and restores cleanly. When a real incident eventually comes, the team will meet it as something half-familiar, with the right roles, a current contact list, and, once that follow-up is done, a tested clean backup behind them. That readiness was not luck. It was an hour spent on a quiet Tuesday, which is exactly when preparation is done.

Check Your Understanding

Name the four phases of the NIST SP 800-61 incident lifecycle in order, and explain why the standard, and this course, place Preparation first. What does it mean to say the lifecycle is a loop rather than a line?
An incident response plan names four core roles that must always be covered. List them and say in one sentence what each does, and explain why naming them in advance prevents the most common cause of early chaos in a response.
Why must a backup intended for recovery be both offline and tested? Give one way a connected or untested backup can fail you on the bad day, and link your answer to the 3-2-1 rule from CIS 201.

Reflection (write a short paragraph): Think about a plan you have relied on that was written but never rehearsed, in any part of your life, and how it held up the first time you actually needed it. Knowing now that preparation is the phase that decides how every other phase of an incident goes, and that a tabletop exercise is how a written plan becomes a real capability, which single item of readiness, the plan, the roles, the contacts, the tools, or the clean offline backups, would you most want to have tested before your own bad day, and why?

Summary

Preparation is the phase that decides the outcome. Most of how a bad day goes is settled beforehand, in the plan, the roles, the contacts, the tools, the clean offline backups, and the rehearsal, because an incident is the worst moment to be making first decisions. Preparation is moving those decisions out of the panic and into the calm.
The course spine is the NIST SP 800-61 incident lifecycle: Preparation, then Detection and Analysis, then Containment, Eradication, and Recovery, then Post-Incident Activity. It is a loop, not a line: the post-incident review feeds lessons back into preparation, so each incident leaves the next response better prepared.
The lifecycle maps onto the CSF functions from CIS 201: Detection and Analysis is Detect; containment and eradication are Respond; recovery is Recover; the post-incident learning feeds back into Govern, Identify, and Protect. Everyday security hygiene is the Protect work that decides how well Respond and Recover go.
The incident response plan removes improvisation from the worst moment. Its heart is roles, and four must always be covered: who leads, who decides, who communicates, and who keeps the record. Access to act in a response follows appointment, not qualification.
Readiness means having the things the team will need before the day: a current, tested, offline-capable contact list; the tools and access arranged in advance; and above all clean, offline, tested backups, the single thing most likely to turn ransomware from a disaster into an inconvenience.
Rehearsal turns a written plan into a capability. A tabletop exercise costs an hour and surfaces the stale contact, the unsettled decision, and the backup of unknown status while they are still cheap to fix. Rehearse in peacetime and feed every lesson back into the plan.
Related study: this opens CIS 310, leading into Lesson 02 (Detection and Analysis), Lesson 03 (Containment, Eradication, and Recovery), Lesson 04 (Defensive Playbooks), Lesson 05 (Continuity and Disaster Recovery), Lesson 06 (Threat Intelligence and Knowing the Adversary), Lesson 07 (Evidence, Forensics, and Investigation), Lesson 08 (Crisis Communication, Notification, and Coordination), Lesson 09 (Resilience by Design), and Lesson 10 (After the Incident). It builds on CIS 201 (the CSF functions, the 3-2-1 backup rule, and the member's report that triggers all of this) and CIS 220 (access and identity, "access follows appointment"), and it works closely with HCR 220 (continuity and resilience), SIG 410 (resilient and off-grid communications), SIG 220 (communications discipline), and PME 210 (records and reporting that underpin the recorder's role).

Preparing for the Bad Day