§ CM

Drift detection on image-mode RHEL: what bootc actually changes for defenders

Red Hat shipped image mode as a technology preview with RHEL 9.4, took it to general availability in 9.6 in late 2025, and now positions it as the default story for RHEL 10. In 2026 you are going to see it in production, possibly under your control and possibly because an application team decided that bootc was easier than learning Ansible. The pitch is clean: build your host like a container, ship it like a container, roll it back like a container. The defender consequence is messier. The places persistence lives shift, the artifacts your detection content was built against move or disappear, and the operational shortcuts your sysadmins reach for to get around immutability are the same shortcuts an attacker would reach for first.

This post is about what actually changes when the fleet moves to image mode, where the detection content has to move with it, and what the first few weeks of tuning are going to look like.

What bootc actually gives you

Underneath the marketing, image-mode RHEL is the old rpm-ostree machinery wearing a container-native jacket. The OS is an ostree commit. /usr is a read-only bind mount from that commit. /etc is writable, with a three-way merge applied on every upgrade. /var is persisted across upgrades and is not part of the image at all. Updates come from an OCI registry: bootc switch quay.io/yourorg/rhel-bootc:prod points the host at a different image and stages it, and on next boot you are running the new commit. The previous deployment stays on disk as the rollback target.

The defender-relevant invariants from that:

  • /usr and /boot content is verifiable against the ostree commit hash. Anything on disk under those paths that is not in the commit is by definition drift.
  • /etc is supposed to drift — that is the whole point of it — but every change has provenance through the three-way merge logs in the journal.
  • /var is the unmanaged zone. If you do not have a story for it, it is the soft underbelly.

That third bullet is where most teams get into trouble first. More on that below.
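The three invariants amount to a lookup from path to trust zone, which is worth having as a helper when you route file events to different detections. A minimal sketch; the zone labels are names invented for this example, not bootc terminology:

```python
from pathlib import PurePosixPath

# Zone labels are this sketch's own vocabulary (an assumption), chosen to
# mirror the three invariants: verifiable, expected-to-drift, unmanaged.
ZONES = (
    ("/usr", "image-verified"),
    ("/boot", "image-verified"),
    ("/etc", "merged"),
    ("/var", "unmanaged"),
)


def classify_path(path: str) -> str:
    """Return the drift zone for an absolute path on an image-mode host."""
    p = PurePosixPath(path)
    for root, zone in ZONES:
        r = PurePosixPath(root)
        # Match the root itself or anything beneath it, but not look-alikes
        # such as /usrlocal.
        if p == r or r in p.parents:
            return zone
    return "other"
```

A write event whose path classifies as image-verified is drift by definition; one in merged needs the config diff; one in unmanaged needs whatever /var story you have.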

Persistence moves up the stack

If you have spent any time writing detections for T1543 (Create or Modify System Process) on EL hosts, your muscle memory is to watch /etc/systemd/system/, /usr/lib/systemd/system/, /etc/cron.d/, the usual cron spool, and the SSH authorized_keys files for root and service accounts. Most of that muscle memory is still useful — /etc is still writable and systemd still reads from it — but on an image-mode host, writing to /usr/lib/systemd/system/ either fails or requires an operator to explicitly punch a hole through immutability with rpm-ostree usroverlay. That command is the single most important thing to alert on. It is not subtle. It logs to the journal with the unit rpm-ostreed.service and produces an audit-visible remount of /usr from read-only to read-write.

The persistence techniques that survive the move are the ones that target /etc and /var. User-level systemd units under /etc/systemd/user/ and /var/lib// still work fine. PAM module drops under /etc/pam.d/ work. Anything that hooks the container runtime — and on image-mode hosts there is almost always a container runtime, because that is the whole workload model — works. If your detection library was heavy on /usr/local/bin drops and light on container-layer persistence, image mode is going to hurt.

The other thing that shifts: a determined operator can rebuild the image with their persistence baked in and push it to the registry the host pulls from. Now the persistence is signed, reproducible, and survives bootc rollback. The control surface you actually need to defend is the build pipeline and the registry, not the host. This is a supply chain problem wearing a host-hardening costume (SR family, not just CM).

What the detection actually looks like

Assume Splunk with the Linux auditd TA, because that is what most shops have. The events you want are:

  • execve of /usr/bin/rpm-ostree with argv containing usroverlay, override, or install
  • execve of /usr/bin/bootc with argv containing switch, edit, or usr-overlay, or containing status --json when run from an interactive session
  • path events under /usr/ or /boot/ with a write mode, regardless of process
  • systemd-journald entries from rpm-ostreed.service containing "Created new deployment" that land outside of the maintenance window

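In SPL, a first cut covering the highest-signal verbs from the first two bullets is roughly the following. It assumes your auditd TA exposes exe on the same event as the EXECVE arguments (a raw EXECVE record carries only argc and a0, a1, and so on; exe comes from the paired SYSCALL record), and it joins only the first six arguments, since eval does not accept a wildcard such as a*:

index=linux sourcetype=linux:audit type=EXECVE
| search exe="/usr/bin/rpm-ostree" OR exe="/usr/bin/bootc"
| eval argv=mvjoin(mvappend(a0, a1, a2, a3, a4, a5), " ")
| search argv="*usroverlay*" OR argv="*usr-overlay*" OR argv="*override*" OR argv="*switch*"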

Expected volume on a 500-host fleet, once tuned: low single digits per day. Before tuning: hundreds per day, almost entirely from bootc-fetch-apply-updates.service running its scheduled check (which calls bootc status and sometimes bootc upgrade --check) and from monitoring agents that shell out to bootc status --json to populate inventory. Those two sources are the entire first round of tuning. Exclude them by parent unit (_SYSTEMD_UNIT=bootc-fetch-apply-updates.service) and by parent process for the inventory case, not by argv — argv exclusions will get bypassed the first time someone tweaks the agent.
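If you prototype the tuning outside the SIEM first, the exclusion logic is small. A sketch under stated assumptions: the field names systemd_unit and parent_exe are hypothetical (use whatever your pipeline extracts), and the inventory agent path is a placeholder:

```python
# Units whose bootc invocations are expected background noise.
EXCLUDED_UNITS = {"bootc-fetch-apply-updates.service"}

# Hypothetical path for the inventory agent binary; substitute your own.
EXCLUDED_PARENTS = {"/opt/inventory-agent/bin/agent"}


def is_expected(event: dict) -> bool:
    """True if the event comes from a known-benign unit or parent process.

    Deliberately ignores argv: matching on who launched the command is
    harder to bypass than matching on what the command line looked like.
    """
    if event.get("systemd_unit") in EXCLUDED_UNITS:
        return True
    return event.get("parent_exe") in EXCLUDED_PARENTS


def tune(events: list) -> list:
    """Drop expected events and keep everything else for alerting."""
    return [e for e in events if not is_expected(e)]
```

The same two conditions translate directly into a NOT clause in the saved search once you are happy with what they drop.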

A caveat that bit me on paper and will bite you in practice: auditd’s EXECVE records split argv across multiple a0, a1, a2… fields, and on long argument lists the record gets truncated. If your bootc switch command line includes a long registry path with a digest, the digest can land past the truncation boundary. You will see the verb but not the target. Splunk’s linux:audit sourcetype does not warn you about this; the field is just quietly missing. If you care about the target image — and for this detection you do, because the registry it points at is the actual indicator — pair the auditd event with the corresponding rpm-ostreed.service journal entry, which logs the full target as a single string.
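One way to stop the missing field from being quiet is to compare argc against the aN fields that actually arrived. The sketch below works on a raw EXECVE record string and flags incomplete argument lists. It assumes the usual audit convention that quoted values are literal and bare values are hex-encoded, and its switch_target helper is a simplification that takes the last non-flag argument after the verb:

```python
import re

_ARGC = re.compile(r"\bargc=(\d+)")
_ARG = re.compile(r'\ba(\d+)=(?:"([^"]*)"|(\S+))')


def _decode_hex(value: str) -> str:
    """Decode a bare (unquoted) audit value, falling back to the raw text."""
    try:
        return bytes.fromhex(value).decode("utf-8", errors="replace")
    except ValueError:
        return value


def parse_execve(record: str) -> dict:
    """Rebuild argv from a raw EXECVE record and flag missing arguments."""
    m = _ARGC.search(record)
    argc = int(m.group(1)) if m else None
    args = {}
    for idx, quoted, bare in _ARG.findall(record):
        args[int(idx)] = _decode_hex(bare) if bare else quoted
    missing = [i for i in range(argc) if i not in args] if argc is not None else []
    return {
        "argc": argc,
        "argv": [args[i] for i in sorted(args)],
        "truncated": bool(missing),  # fewer aN fields than argc promised
    }


def switch_target(argv: list):
    """Best-effort image reference for a `bootc switch` command line."""
    if "switch" not in argv:
        return None
    rest = [a for a in argv[argv.index("switch") + 1:] if not a.startswith("-")]
    return rest[-1] if rest else None
```

When truncated is true, treat the event as a prompt to go and fetch the journal entry rather than as a complete record of what was run.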

The three-way merge problem in /etc

When bootc applies an update, files in /etc go through a three-way merge: the previous image’s /etc, the new image’s /etc, and the live /etc are reconciled. Local changes are preserved. This is the right behavior for operability and the wrong behavior for a defender who wants the image to be authoritative.

Concretely: if an attacker writes to /etc/pam.d/sshd on a running host, the next image upgrade does not overwrite it. The merge sees a local change and keeps it. Your CM story is not as strong as the immutability marketing suggests. The mitigation is to detect divergence between the running /etc and the image’s /etc directly — ostree admin config-diff will give you the list — and to run that as a scheduled job whose output goes to the SIEM, not to a log file on the host where the attacker can edit it.
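The scheduled job can be as small as a wrapper that runs the diff and prints one JSON object per changed file for your forwarder to pick up. A minimal sketch, assuming config-diff prints a one-letter status (M, A, or D) followed by a path relative to /etc; check the output format on your own hosts before trusting the parser:

```python
import json
import posixpath
import socket
import subprocess

STATUS = {"M": "modified", "A": "added", "D": "deleted"}


def parse_config_diff(output: str) -> list:
    """Turn config-diff output into a list of {change, path} dicts."""
    events = []
    for line in output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or parts[0] not in STATUS:
            continue  # skip blank or unrecognized lines
        # Assumption: paths are relative to /etc; absolute paths pass through.
        path = posixpath.join("/etc", parts[1].strip())
        events.append({"change": STATUS[parts[0]], "path": path})
    return events


def collect() -> str:
    """Run the diff on this host and return newline-delimited JSON."""
    out = subprocess.run(
        ["ostree", "admin", "config-diff"],
        check=True, capture_output=True, text=True,
    ).stdout
    host = socket.gethostname()
    return "\n".join(json.dumps({"host": host, **e}) for e in parse_config_diff(out))
```

Printing to stdout and letting the forwarder ship it keeps the result off the local disk, which is the point of the exercise.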

This is the kind of control that looks elegant in a slide and tedious in production. The diff is noisy on first deployment because every shop has legitimate local config — kdump tuning, NTP overrides, the one weird sysctl your DBA insisted on in 2022 — and you have to either bake those into the image (correct) or maintain an allowlist (pragmatic). Most teams will start with an allowlist and migrate to baked-in config over the following two quarters. That migration is the work; the detection is the easy part.
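The pragmatic version is a pattern list kept in version control so that shrinking it shows up in review. A sketch with hypothetical allowlist entries, operating on dicts that carry a "path" key for each changed file under /etc:

```python
from fnmatch import fnmatchcase

# Hypothetical entries for illustration; every line here is config that
# should eventually move into the image and be deleted from this list.
ALLOWLIST = [
    "/etc/kdump.conf",
    "/etc/chrony.conf",
    "/etc/sysctl.d/*.conf",
]


def unexpected(changes: list, allowlist: list = ALLOWLIST) -> list:
    """Return the changes whose path matches no allowlist pattern."""
    return [
        c for c in changes
        if not any(fnmatchcase(c["path"], pat) for pat in allowlist)
    ]
```

Note that fnmatch-style wildcards also match across "/" characters, so keep patterns narrow; a broad pattern such as /etc/* would silence the whole detection.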

/var is where the bodies are buried

/var is not managed by the image at all. Container storage lives there. Most application state lives there. Logs live there. If you treat image mode as a sufficient hardening story and ignore /var, you have hardened the part of the host an attacker was least interested in touching anyway.

For /var, the controls are the controls you already have, or should: file integrity monitoring on the directories that matter (/var/lib/containers/storage/overlay/, /var/lib/kubelet/, anything under /var/lib// that holds binaries or scripts), SELinux in enforcing mode with the right contexts, and either a read-only bind mount or noexec on subdirectories that should never hold executables. The last one is the one that breaks things — noexec on /var/tmp is fine, noexec on /var/lib/containers will break Podman in ways the error messages will not explain. Test it on a canary, not in prod.
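If you have no FIM agent on these hosts yet, the core of what one does for a directory is a hash snapshot and a comparison against a stored baseline. A minimal sketch for experimenting on a canary, not a replacement for a real agent; it skips symlinks and special files and does not look at permissions or SELinux contexts:

```python
import hashlib
import os


def snapshot(root: str) -> dict:
    """Map each regular file under root (by relative path) to its SHA-256."""
    digests = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            h = hashlib.sha256()
            with open(full, "rb") as fh:
                # Read in 1 MiB chunks so large layers do not load into memory.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    h.update(chunk)
            digests[os.path.relpath(full, root)] = h.hexdigest()
    return digests


def diff(baseline: dict, current: dict) -> dict:
    """Compare two snapshots and list added, removed, and changed paths."""
    return {
        "added": sorted(set(current) - set(baseline)),
        "removed": sorted(set(baseline) - set(current)),
        "changed": sorted(
            p for p in baseline.keys() & current.keys()
            if baseline[p] != current[p]
        ),
    }
```

Store the baseline somewhere the host cannot write to, for the same reason the config diff goes to the SIEM.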

Mapping to 800-53

For your SSP, the controls that get strengthened by image mode are CM-2 (baseline configuration), CM-6 (configuration settings), CM-7 (least functionality, because the image only contains what you put in it), and SI-7 (software, firmware, and information integrity — ostree commits are content-addressed and signable, which is a genuine SI-7 improvement over RPM signature verification at install time). The controls that get harder, not easier, are CM-3 (configuration change control — because every change is a registry push, and your CCB workflow needs to live in the pipeline), CA-7 (continuous monitoring — your monitoring has to follow the build, not the host), and SR-4 (provenance — you now have a container supply chain to defend, with all the SBOM and signing obligations that entails). AU-2 and AU-12 do not change in concept, but the event sources move: more of your audit-relevant events are now in rpm-ostreed.service and the registry, fewer in dnf history.

If the assessor asks how you detect unauthorized changes to the host baseline, the honest answer for image mode is two-part: ostree commit verification proves /usr and /boot are intact, and a scheduled ostree admin config-diff plus FIM on selected /var paths covers the rest. Anyone who tells you image mode by itself satisfies SI-7 has not read past the data sheet.

What to actually do this quarter

If you have image-mode hosts already, the first detection to write is the rpm-ostree usroverlay alert. It is high-signal, low-volume, and the cases where it fires legitimately (an operator break-glass) are exactly the cases you want a ticket on anyway. The second is a scheduled ostree admin config-diff shipped to the SIEM with an allowlist that you commit to shrinking. The third is FIM on the /var paths that hold executable content for whatever workload runs on the host. Everything else can wait until you have those three working and tuned.

If you do not have image-mode hosts yet but expect to: the work that pays off now is in the build pipeline. Signing, SBOM generation, registry access controls, and a CCB workflow that treats an image push as a change. The host-side detection content is the easier half of this problem. The supply-chain side is the half that determines whether image mode is a net security win or an expensive lateral move.
