§ Trackr.Live

CP — Contingency Planning

Contingency Planning is the availability family. CP exists for the day the system is down and someone has to bring it back: a ransomware event that encrypts the primary array, a datacenter that loses power and cooling, a region that drops off the network, a migration that corrupts the database. CP governs how you planned for that day, how you trained for it, whether you ever tested the plan, and where the data and the processing capacity live when the primary site is gone. It is not incident response. IR handles the detection, triage, and containment of the event itself; CP handles getting the mission back online after the dust settles. The two touch (CP-2 coordinates with the incident response plan, and the recovery side of an incident is where they hand off), but an SSP that describes CP as “responding to security incidents” has the wrong family, and an assessor who knows the catalog will say so.

Figure: Schematic of a primary processing site failing over to an alternate site, with a backup data stream feeding the alternate while a recovery-time clock counts in the foreground.

CP is a control family from SP 800-53, not a phase of the RMF. The RMF is the SP 800-37 process: Prepare, Categorize, Select, Implement, Assess, Authorize, Monitor. CP controls get pulled in at Select based on your impact level, implemented at Implement, and graded at Assess, with anything that fails landing in the POA&M and riding into continuous monitoring. What makes CP different from most families is that its central artifact, the contingency plan, is a deliverable an assessor reads end to end (the bulk of SP 800-34 Rev 1 exists to tell you how to write it). Most families you assess by sampling configs. CP you assess partly by reading a document and partly by asking the brutal question: has anyone ever actually run it.

What’s in the family

In Rev 5 the CP family spans CP-1 through CP-13. Not all of that range is live. CP-5, Contingency Plan Update, is withdrawn, folded into CP-2, where keeping the plan current now lives as part of the base control. It was already listed as withdrawn in Rev 4, so an SSP that cites CP-5 as a standalone control with its own implementation statement is carrying a legacy artifact someone forgot to retire, the same smell as an SSP still calling AC-11 “Session Lock.” The Rev 4 title for CP-1 (“Contingency Planning Policy and Procedures”) was also shortened in Rev 5 to just “Policy and Procedures,” matching every other -1 control.

The controls that carry the weight:

CP-1, Policy and Procedures. The org-level policy and the procedures that operationalize it. Mandatory, inherited by everything else in the family, and the first thing an assessor pulls.
CP-2, Contingency Plan. The center of gravity. Identifies the essential mission and business functions, sets recovery objectives, assigns roles, and lays out activation, recovery, and reconstitution phases. The enhancements do real work: CP-2(1) coordinate with related plans (IR, COOP, disaster recovery, BCP), CP-2(3) resume essential functions within a defined window, CP-2(8) identify critical assets that support those functions. CP-2(8) is the one people skip, and it is the one that makes the rest of the plan executable instead of aspirational.
CP-3, Contingency Training. Train the people with recovery roles on what those roles are. Ties to AT, but CP-3 is role-specific: the on-call who has never opened the runbook is the failure this control is meant to prevent.
CP-4, Contingency Plan Testing. Test the plan and document the results. CP-4(1) coordinates the test with related-plan tests; CP-4(2) tests recovery at the alternate processing site, which is where tabletops stop being enough (more below).
CP-6, Alternate Storage Site. Off-site storage for backups and recovery material, separated from the primary so the same event can’t take both.
CP-7, Alternate Processing Site. Somewhere to run the workload when the primary is gone. Cold, warm, or hot, and which one you owe depends on your recovery objectives and your impact level.
CP-8, Telecommunications Services. Alternate comms so the alternate site can actually be reached and used. Easy to forget until the failover works and nobody can route to it.
CP-9, System Backup. The backups themselves, with CP-9(1) being the enhancement that says test them for reliability and integrity. CP-9(1) is the difference between a backup and a hope.
CP-10, System Recovery and Reconstitution. Getting the system back to a known, secure state after the disruption, then returning it to normal operations.
CP-11, CP-12, CP-13. Alternate communications protocols, safe mode, and alternative security mechanisms. None of these are in any 800-53B baseline, even at High; they come in only through an overlay or a specialized resilience requirement, so you will see them rarely.

Baselines and the impact level that drives them

The baselines live in SP 800-53B, not in the catalog. FIPS 199 categorizes the system, FIPS 200 sets the floor, and 800-53B turns the resulting impact level into a starting control set. For CP the load-bearing dimension is the availability impact. A High-availability system carries alternate-site and telecom obligations that a Low-availability system simply does not.

In rough terms: CP-1, CP-2, CP-3, CP-4, CP-9, and CP-10 are allocated from Low and apply to almost everything. The expensive controls, CP-6 alternate storage, CP-7 alternate processing, CP-8 telecom, and the heavier CP-2 and CP-4 enhancements, bind at Moderate and tighten at High. And it is not just on/off. The site type escalates with the level: a Moderate system might satisfy CP-7 with a warm site and a recovery window of a day or two, where a High-availability system is looking at a hot site and a window measured in hours, because the 800-53B High parameters and your own RTO leave no room for slower. Tailoring CP-7 down to “we’ll rebuild from backups in the cloud when it happens” is defensible at Low and a finding at High.

Deeper: RTO and RPO, and why the BIA outranks the plan.
Two numbers drive the whole family, and CP-2 is where they’re supposed to be written down. The Recovery Time Objective is how long the function can be down before the impact is unacceptable. The Recovery Point Objective is how much data you can afford to lose, measured backward from the moment of failure, which in practice means how stale your most recent usable backup is allowed to be. A four-hour RPO means nightly backups won’t cut it: a failure late in the day loses most of a day’s data, so you need a backup at least every four hours, and anything tighter pushes you into replication or snapshots. These numbers are not supposed to be invented by whoever wrote the SSP. They come from the Business Impact Analysis, which SP 800-34 puts upstream of the plan and which RA-9 (criticality analysis) and PM-11 (mission/business process definition) feed. Here’s the contestable part: I’d wager most CP-2 plans state looser RTO and RPO values than the BIA actually requires, because the honest numbers would force a CP-7 hot site and a CP-9 backup cadence nobody budgeted for. The plan gets written to match the infrastructure that exists, not the recovery the mission needs. An assessor who pulls the BIA and compares its stated tolerances against the plan’s stated objectives will find that gap, and it is a real finding, not a paperwork nit.
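
The BIA-versus-plan comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Objective` record, the function names, and the idea that worst-case loss equals one full backup interval are hypothetical simplifications, not anything SP 800-34 or SP 800-53 prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Objective:
    """Recovery objectives for one essential function, in hours (hypothetical schema)."""
    function: str
    rto_hours: float
    rpo_hours: float


def worst_case_data_loss_hours(backup_interval_hours: float) -> float:
    """Simplifying assumption: a failure just before the next backup run
    loses up to one full interval of data (backup duration is ignored)."""
    return backup_interval_hours


def meets_rpo(backup_interval_hours: float, rpo_hours: float) -> bool:
    """True when the backup cadence keeps worst-case loss within the RPO."""
    return worst_case_data_loss_hours(backup_interval_hours) <= rpo_hours


def plan_gaps(bia: list[Objective], plan: list[Objective]) -> list[str]:
    """List functions where the plan is looser than the BIA tolerance, or absent."""
    planned = {objective.function: objective for objective in plan}
    findings = []
    for need in bia:
        have = planned.get(need.function)
        if have is None:
            findings.append(f"{need.function}: in BIA but missing from plan")
            continue
        if have.rto_hours > need.rto_hours:
            findings.append(
                f"{need.function}: plan RTO {have.rto_hours:g}h exceeds BIA tolerance {need.rto_hours:g}h"
            )
        if have.rpo_hours > need.rpo_hours:
            findings.append(
                f"{need.function}: plan RPO {have.rpo_hours:g}h exceeds BIA tolerance {need.rpo_hours:g}h"
            )
    return findings
```

With these definitions, `meets_rpo(24, 4)` is false (nightly backups against a four-hour RPO) and `meets_rpo(1, 4)` is true. `plan_gaps` returns one line per function whose planned objective is looser than what the BIA tolerates, which is the gap the paragraph above calls a real finding.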

Control	Typical first live at	What an assessor actually checks
CP-2	Low	The plan exists, names essential functions, and states RTO/RPO that trace back to a BIA. Generic plan with no system-specific functions = finding.
CP-3	Low	Training records for people with recovery roles, dated within the required cycle.
CP-4	Low	A test report from the last cycle, with after-action items and evidence they were closed.
CP-6	Moderate	Backups actually leave the building, to a site far enough that one event can’t hit both.
CP-7	Moderate	The alternate site exists, is contracted or owned, and its capacity matches the workload, not a fraction of it.
CP-8	Moderate	Alternate telecom is provisioned and the SLA matches the RTO.
CP-9	Low	Backups run on schedule, and CP-9(1): they’ve been restore-tested, with the restore documented.
CP-10	Low	A reconstitution procedure that returns the system to a secure, patched state, not just “powered on.”

Treat the “first live at” column as directional. Your overlay (FedRAMP, CNSSI 1253 for national-security systems, DoDI 8510.01) moves things, and the availability impact is the lever that decides how hard CP-6/7/8 bind.
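
As a rough aid, the table's "first live at" column can be turned into a lookup. This is a minimal sketch that encodes only what this page states (CP-1 is taken from the baseline discussion rather than the table). The dictionary and function names are hypothetical, and it carries the same directional caveat: it ignores enhancements, overlays, and tailoring.

```python
# Impact levels in ascending order; position in the tuple is the rank.
LEVELS = ("low", "moderate", "high")

# First baseline at which each base control appears, as described on this page.
# CP-11, CP-12, and CP-13 are omitted because they are in no baseline.
FIRST_LIVE_AT = {
    "CP-1": "low",
    "CP-2": "low",
    "CP-3": "low",
    "CP-4": "low",
    "CP-6": "moderate",
    "CP-7": "moderate",
    "CP-8": "moderate",
    "CP-9": "low",
    "CP-10": "low",
}


def controls_in_scope(level: str) -> list[str]:
    """Return the base CP controls expected at the given impact level.

    Raises ValueError for an unrecognized level name.
    """
    rank = LEVELS.index(level.lower())
    return [
        control
        for control, first_level in FIRST_LIVE_AT.items()
        if LEVELS.index(first_level) <= rank
    ]
```

Calling `controls_in_scope("low")` returns the six Low controls. Calling it with `"moderate"` or `"high"` adds CP-6, CP-7, and CP-8. The difference between Moderate and High lies in enhancements and parameter values, which this sketch deliberately does not model.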

Where it actually goes wrong

The untested backup. This is CP’s signature failure, and it is the easiest finding an assessor will ever write. CP-9 says back up the system. The backup job runs nightly, the dashboard is green, the retention is correct, and everyone assumes that means recovery works. Then the day comes, and the restore fails, because the backup captured the application but not the database transaction logs, or the encryption key for the backup volume lived only on the box that’s now encrypted, or the tape rotation quietly stopped writing eight months ago and nobody watched the verify step. CP-9(1), testing backups for reliability and integrity, exists precisely because a backup you have never restored is not a control, it is a belief. If I had to keep one CP enhancement and throw out the rest, it’d be CP-9(1). An assessor checking CP-9 should ask for the restore log, not the backup log, and the gap between “we back up nightly” and “we last successfully restored on [date]” is where this control lives or dies.
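
The two checks in that paragraph, "did the restored data match" and "when did we last prove it," can be sketched as follows. This is an illustration, not a CP-9(1) procedure: the function names are hypothetical, the 90-day default is an arbitrary placeholder for whatever test frequency the organization has defined, and a hash comparison only covers file-level integrity, not whether an application actually starts on the restored data.

```python
import hashlib
from datetime import date, timedelta
from pathlib import Path
from typing import Optional


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large restores do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(original: Path, restored: Path) -> bool:
    """True when the restored file is byte-for-byte identical to the original."""
    return sha256_of(original) == sha256_of(restored)


def restore_evidence_is_stale(
    last_successful_restore: Optional[date],
    today: date,
    max_age_days: int = 90,  # placeholder; use the organization-defined frequency
) -> bool:
    """True when there is no recorded successful restore, or it is too old."""
    if last_successful_restore is None:
        return True
    return today - last_successful_restore > timedelta(days=max_age_days)
```

A system with green nightly backup jobs and `restore_evidence_is_stale(None, date.today())` returning true is exactly the "belief, not a control" case: the backup log is full and the restore log is empty.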

Tabletop theater under CP-4. A tabletop is a meeting. People talk through the plan, someone takes notes, an after-action report gets filed, and CP-4 is marked satisfied. Tabletops are useful for shaking out who-calls-whom and finding gaps in the runbook, and at Low impact a tabletop may be all the control asks for. But it proves nothing about whether CP-6 and CP-7 actually work, because nobody touched the alternate site. CP-4(2) forces testing recovery at the alternate processing site, and that is the test that matters; it binds at High, where the alternate-site obligations are heaviest. A tabletop tells you the plan is internally consistent; only a real failover tells you the plan is true. The first time anyone fails over to the warm site for real, something is broken (a firewall rule that was never replicated, a DNS TTL that pins traffic to the dead site for an hour, a license that’s host-locked to the primary). Find that during a scheduled test, not during the outage.

Alternate sites that are alternate on paper. CP-7 says have an alternate processing site. The contract exists, the line item is in the budget, and the capacity is a quarter of production because someone sized it for “critical functions only” and then the definition of critical quietly expanded. Or the alternate is in the same flood plain, the same power grid, the same metro fiber path as the primary, so the regional event that takes the primary takes the alternate too. CP-6 has the same problem on the storage side. Separation is the whole point, and “off-site” three miles away on the same substation is not separation.
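
The paper-versus-real test for an alternate site comes down to three questions: is it far enough away, does it share a dependency with the primary, and can it carry the workload. A minimal sketch, assuming a hypothetical `Site` record and caller-supplied thresholds: SP 800-53 does not prescribe a minimum distance, so `min_km` is the organization's own number, and the great-circle distance here is only a crude proxy for real hazard analysis.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0


@dataclass(frozen=True)
class Site:
    """Hypothetical description of a processing or storage site."""
    name: str
    lat: float
    lon: float
    capacity_units: float
    shared_dependencies: frozenset[str] = frozenset()  # e.g. substation, fiber path


def distance_km(a: Site, b: Site) -> float:
    """Great-circle distance between two sites using the haversine formula."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a.lat, a.lon, b.lat, b.lon))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def site_findings(
    primary: Site,
    alternate: Site,
    *,
    min_km: float,  # organization-defined; not a value from the catalog
    min_capacity_ratio: float = 1.0,
) -> list[str]:
    """Return separation and capacity concerns for an alternate site."""
    findings = []
    shared = primary.shared_dependencies & alternate.shared_dependencies
    if shared:
        findings.append("shared dependencies: " + ", ".join(sorted(shared)))
    separation = distance_km(primary, alternate)
    if separation < min_km:
        findings.append(f"only {separation:.1f} km apart (minimum {min_km:g} km)")
    if alternate.capacity_units < min_capacity_ratio * primary.capacity_units:
        findings.append(
            f"capacity {alternate.capacity_units:g} is below "
            f"{min_capacity_ratio:g}x primary ({primary.capacity_units:g})"
        )
    return findings
```

An alternate a few kilometers away, on the same substation, sized at a quarter of production would trip all three checks. An empty list means only that these three coarse checks passed, not that CP-7 is satisfied.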

Cross-family ties

CP doesn’t stand alone, and assessors read it against the families it depends on. CP-9 backups are data at rest and in motion, which puts them in MP (media protection) for handling, SC-28 for protection at rest, and SC-8 for protection in transit. A backup that restores perfectly and was sitting unencrypted on a share is a different finding in a different family, but it’s the same backup. CP-7’s alternate processing site inherits a pile of PE (physical and environmental) controls at the new location, and if the alternate is a cloud region, the AC-20 inheritance boundary question comes right back. CP-2’s essential-functions analysis is fed by RA-9 criticality analysis and the BIA, with PM-11 above both at the program level. CP-4 testing is assessment activity, so it ties to CA-2 and the CA-7 continuous monitoring strategy. CP-3 is the contingency-specific slice of AT. And the seam with IR-4 is where the recovery phase of an incident hands off to the machinery CP built. Map those ties in the SSP, because CP described as a self-contained island reads as dependencies nobody thought through.

Artifacts

CP implementations land in the usual three places, plus one the other families don’t have. The SSP carries the control narratives. The SAR is the assessor’s verdict. The POA&M holds what failed. The extra artifact is the contingency plan itself, a real document with named functions, named people, RTO/RPO, activation criteria, and recovery procedures, written to SP 800-34’s structure. The fastest way to fail a CP assessment is a contingency plan that could belong to any system: a template with the system name swapped in, generic recovery objectives, and a test section that says “annual tabletop” with no report attached. Assessors have read SP 800-34. They want to know what happens to this system when it’s down, and whether anyone has ever proven the answer by bringing it back.

Adjacent material on this site

IR, Incident Response (where the event is handled before recovery hands off to CP)
MP, Media Protection (where CP-9 backups get handled and protected)
SC, System and Communications Protection (SC-28/SC-8 protection of backups at rest and in transit)
RA, Risk Assessment (RA-9 criticality analysis feeding the BIA and CP-2)
PE, Physical and Environmental Protection (the alternate-site controls CP-7 inherits)
CA, Assessment, Authorization, and Monitoring (where CP-4 testing meets continuous monitoring)
RMF control families overview
RMF roadmap