CVE-2026-31431 “Copy Fail”: A 732-Byte Path to Root on Every Linux Distro Since 2017
Theori disclosed CVE-2026-31431 (“Copy Fail”) on 2026-04-29. The bug is a logic flaw in the Linux kernel’s algif_aead socket interface that lets an unprivileged local user perform a deterministic, attacker-controlled 4-byte write into the page cache of any readable file. The PoC is 732 bytes of Python 3.10+ standard library, no external dependencies, no offset table, no race window. It runs unchanged on Ubuntu 24.04, RHEL 10.1, SUSE 16, and Amazon Linux 2023, and it returns a root shell in under a second on every one of them. CVSS is 7.8. That number is wrong, and the multi-tenancy implications are the reason.
The bug was found by Theori researcher Taeyang Lee with help from the company’s Xint Code static analysis tool — about an hour of scanning, per the disclosure. That detail is worth sitting with. The vulnerable code has been in tree since August 2017 (commit 72548b093ee3) and survived nine years of human review, distro hardening, syzkaller campaigns, and at least two academic AEAD audits. The mainline fix is a664bf3d603d, which reverts the original optimization.
The defect
authencesn is the kernel template for IPsec’s RFC 4303 ESP-with-Extended-Sequence-Number authenticated encryption. It wraps an AEAD construction (in the PoC, authencesn(hmac(sha256),cbc(aes))) and is responsible for stitching the 64-bit ESN into the HMAC computation. To do that, it has to rearrange the AAD bytes: the high 32 bits of the sequence number must be moved into a specific position before the HMAC chain runs. The template performs that rearrangement using the caller’s destination buffer as scratch space, writing 4 bytes at offset assoclen + cryptlen via scatterwalk_map_and_copy(). That write lands just past the legitimate output region.
Under the original out-of-place AEAD model, the destination buffer was a kernel-allocated page or a userspace RX buffer. A scratch write past the output boundary landed in slack the caller already owned. Harmless.
In 2017, algif_aead got an in-place optimization. To skip copying large ciphertexts twice, the code chained the request’s source and destination scatterlists so the AEAD transform could decrypt directly over the input pages. Critically, when the user submitted ciphertext via splice(), those input pages were page-cache pages — kernel-cached copies of arbitrary readable files on disk. The chained scatterlist now had page-cache pages sitting in the writable destination region. The authencesn scratch write, which had been a harmless 4-byte spill into the caller’s own buffer, was suddenly a 4-byte write into the host page cache.
The page is never marked dirty. The on-disk file is unchanged. AIDE, Tripwire, dm-verity at rest, and ordinary file checksums miss it entirely. Every subsequent read or execve() of the file pulls from the corrupted page cache until the page is evicted.
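Because the cached copy and the on-disk copy diverge, one way to look for this kind of tampering is to compare an ordinary buffered read against a read that bypasses the page cache. The following is a minimal sketch, not a vetted integrity tool. It assumes that an O_DIRECT read returns the on-disk bytes even while a modified clean page sits in the cache, and that the filesystem accepts O_DIRECT at all (some, such as tmpfs on older kernels, do not). The helper names read_buffered, read_direct, and mismatched_pages are illustrative.

```python
import mmap
import os

PAGE = mmap.PAGESIZE
CHUNK = 256 * PAGE  # O_DIRECT wants aligned lengths; a multiple of the page size is safe


def read_buffered(path: str) -> bytes:
    """Ordinary read: served from the page cache when the file is cached."""
    with open(path, "rb") as f:
        return f.read()


def read_direct(path: str) -> bytes:
    """Read with O_DIRECT into a page-aligned buffer (assumed to bypass the cache)."""
    fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    buf = mmap.mmap(-1, CHUNK)  # anonymous mapping, therefore page-aligned
    out = bytearray()
    try:
        while True:
            n = os.readv(fd, [buf])
            out += buf[:n]
            if n < CHUNK:  # a short read on a regular file means end of file
                break
    finally:
        buf.close()
        os.close(fd)
    return bytes(out)


def mismatched_pages(cached: bytes, on_disk: bytes, page_size: int = PAGE) -> list:
    """Return the indexes of pages whose cached and on-disk bytes differ."""
    length = max(len(cached), len(on_disk))
    return [
        offset // page_size
        for offset in range(0, length, page_size)
        if cached[offset:offset + page_size] != on_disk[offset:offset + page_size]
    ]
```

A check for one binary would then be mismatched_pages(read_buffered("/usr/bin/su"), read_direct("/usr/bin/su")). An empty list means the two views agree at the moment of the check; it says nothing about earlier executions.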
The exploit primitive
The PoC chains the following sequence. Setup runs once; steps 4–6 repeat per 4-byte write at an attacker-chosen page offset of an attacker-chosen file:
1. socket(AF_ALG, SOCK_SEQPACKET, 0) and bind() to authencesn(hmac(sha256),cbc(aes)).
2. setsockopt(ALG_SET_KEY, ...) with any key; correctness of the crypto operation does not matter.
3. accept() returns a request socket.
4. sendmsg() with MSG_MORE carries an 8-byte AAD, ALG_SET_OP=DECRYPT, ALG_SET_IV, and ALG_SET_AEAD_ASSOCLEN=8. Bytes 4–7 of the AAD are the four bytes you want written.
5. os.splice() (Python 3.10+) feeds 32 bytes from the target file’s open fd into the AF_ALG socket. The kernel grabs page-cache references: no copy, no read syscall against the file as the unprivileged user.
6. recv() triggers the decrypt. The HMAC verification fails and recv() returns EBADMSG. The scratch write happens anyway, before the integrity check unwinds.
That is the whole primitive: a 4-byte write at assoclen + cryptlen into the spliced page-cache page. The PoC walks the target binary in 4-byte steps, patching whatever shellcode or instruction sequence the operator wants resident in the cached /usr/bin/su (or /etc/passwd in the variant that flips a UID field).
It is straight-line code. There is no race, no spray, no SLUB grooming, no kASLR leak. It works the same on every kernel between 72548b093ee3 (August 2017) and a664bf3d603d (April 2026), and it works the same regardless of the AEAD instance the kernel selects, because the bug is in the template wrapper, not the cipher.
Why the same script runs everywhere
LPE chains usually break across distros because they depend on slab layout, struct offsets, KASLR slide, or symbol availability. Copy Fail depends on none of those. The exploit only touches:
- A standard Linux syscall ABI (AF_ALG has been stable since 3.6).
- The authencesn template name string, identical across kernels.
- The page-cache page underlying any setuid binary the attacker can open() for read.
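Since the first two dependencies are just an address family and a template name, a defender can test whether a given workload context can reach them at all. The sketch below only calls socket() and bind(); it sets no key and processes no data. The function name authencesn_reachable and the three status strings are illustrative, and the errno-to-status mapping is an assumption about how a blocked or missing interface typically fails.

```python
import errno
import socket

TEMPLATE = "authencesn(hmac(sha256),cbc(aes))"


def authencesn_reachable(template: str = TEMPLATE) -> str:
    """Classify whether this process can bind an AF_ALG AEAD socket to the template.

    Returns "reachable", "blocked", or "unsupported".
    """
    af_alg = getattr(socket, "AF_ALG", None)
    if af_alg is None:
        return "unsupported"  # Python build or platform without AF_ALG
    try:
        with socket.socket(af_alg, socket.SOCK_SEQPACKET, 0) as sock:
            sock.bind(("aead", template))
        return "reachable"
    except PermissionError:
        return "blocked"  # EPERM/EACCES, for example from seccomp or an LSM
    except OSError as exc:
        missing = (errno.EAFNOSUPPORT, errno.EPROTONOSUPPORT,
                   errno.ESOCKTNOSUPPORT, errno.ENOENT)
        if exc.errno in missing:
            return "unsupported"  # family, socket type, or algorithm unavailable
        return "blocked"  # any other failure is treated as a denial here
```

Note that "reachable" is an exposure signal, not a vulnerability verdict: a patched kernel still exposes the template. Running the check from inside a workload pod is mainly useful for confirming that a seccomp profile or module blocklist is actually in effect.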
The setuid binary it edits (usually su, optionally sudo, passwd, chsh, or whatever the operator points it at) varies by distro, but every mainstream distro ships at least one mode-04755 ELF the unprivileged user can read. Once the cached page contains attacker shellcode, the attacker execve()s the binary and the kernel runs the patched code with euid=0. Evicting the page restores the clean on-disk contents, but a root shell that is already open keeps running until the process exits or the host reboots.
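To see which files on a host fit that description, an inventory of readable setuid files is enough. This is a rough sketch: the default directory list and the owner_uid parameter are choices made for illustration, and a real audit would also cover other mount points.

```python
import os
import stat
from typing import Iterable, List, Optional

DEFAULT_ROOTS = ("/usr/bin", "/usr/sbin", "/bin", "/sbin", "/usr/local/bin")


def readable_setuid_files(roots: Iterable[str] = DEFAULT_ROOTS,
                          owner_uid: Optional[int] = 0) -> List[str]:
    """List setuid regular files under roots that the calling user can read.

    owner_uid=0 keeps only root-owned files; pass None to skip the owner filter.
    """
    found = set()
    for root in roots:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                try:
                    st = os.lstat(path)  # lstat so symlinks are not counted twice
                except OSError:
                    continue  # entry vanished or is not accessible
                if not stat.S_ISREG(st.st_mode) or not st.st_mode & stat.S_ISUID:
                    continue
                if owner_uid is not None and st.st_uid != owner_uid:
                    continue
                if os.access(path, os.R_OK):
                    # realpath collapses /bin and /usr/bin on merged-usr systems
                    found.add(os.path.realpath(path))
    return sorted(found)
```

Calling readable_setuid_files() as the unprivileged workload user gives the list of candidate targets on that host. Removing read permission from those binaries is not a fix on its own, but the list shows how wide the surface is.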
The result: one 732-byte file, SHA256 a567d09b15f6e4440e70c9f2aa8edec8ed59f53301952df05c719aa3911687f9, drops on Ubuntu 24.04, RHEL 10.1, SUSE 16, and Amazon Linux 2023 with no edits. C and Go ports are already on GitHub.
Where the CVSS is wrong
7.8 reflects “local AV, low complexity, low privs, full CIA at the host.” That scoring vector silently assumes “local” means a user already inside your trust boundary. In 2026 that assumption is broken in three places:
- Shared-kernel containers. The page cache is host-wide. A write from a container affects the host’s cached copy of the host’s /usr/bin/su, and therefore every other container backed by the same node. runc, containerd, vanilla EKS/GKE/AKS node pools, and self-managed Kubernetes are all in scope.
- CI runners. Any system that executes untrusted PR code on a real Linux kernel (self-hosted GitHub runners, GitLab shared runners, Jenkins agents, Buildkite workers) gives an attacker the “local user” the CVSS vector requires.
- Multi-tenant SaaS. Anything that runs customer-supplied code on a shared kernel: notebook hosts, function-as-a-service platforms that don’t use microVMs, dev sandbox products, “compute” tiers of analytics platforms.
What is not affected, because the kernel is not shared with untrusted code: AWS Lambda and Fargate (Firecracker microVMs per tenant), Cloudflare Workers (V8 isolates), and gVisor sandboxes (userspace kernel, no algif_aead). Anything else with a shared host kernel and untrusted local execution should be treated as critical, not high.
What to do this week
Patching is the only durable fix.
- Update kernels to a build that includes a664bf3d603d. Fixed versions: 7.0 mainline, 6.19.12, 6.18.22, plus stable backports. Major distros (RHEL, Ubuntu, SUSE, Amazon, Alma, Rocky) are shipping fixed packages now.
- If you cannot reboot today, blocklist the module: echo "install algif_aead /bin/false" > /etc/modprobe.d/disable-algif.conf and rmmod algif_aead. Verify nothing on the host actually uses userspace AEAD (most do not: kernel IPsec, dm-crypt, fscrypt, and TLS offload do not go through AF_ALG).
- For container hosts, push a seccomp profile that denies socket(AF_ALG, ...) for workload pods. Docker’s default profile already blocks it; verify your Kubernetes runtime class is honoring that and not running with seccompProfile: Unconfined. Set seccompProfile: RuntimeDefault as a baseline if you have not already.
- For long-lived hosts where the kernel cannot be patched immediately (appliances, embedded, vendor-locked systems), set CONFIG_CRYPTO_USER_API_AEAD=n at next rebuild and disable the module in the meantime.
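For the seccomp mitigation, the rule itself is small. The sketch below generates it in the JSON layout used by Docker and OCI-style seccomp profiles. The allow-by-default wrapper in minimal_profile is for demonstration only; in practice the rule would more likely be merged into the runtime’s existing default profile. The function names are illustrative, and the constant 38 is the Linux value of AF_ALG.

```python
import json

AF_ALG = 38  # Linux address-family number for AF_ALG


def deny_af_alg_rule() -> dict:
    """One seccomp rule: fail socket() when its first argument equals AF_ALG."""
    return {
        "names": ["socket"],
        "action": "SCMP_ACT_ERRNO",
        "args": [{"index": 0, "value": AF_ALG, "op": "SCMP_CMP_EQ"}],
    }


def minimal_profile() -> dict:
    """Allow-by-default profile carrying only the AF_ALG deny rule (demo only)."""
    return {"defaultAction": "SCMP_ACT_ALLOW", "syscalls": [deny_af_alg_rule()]}


if __name__ == "__main__":
    # Print the profile so it can be saved and referenced by the container runtime.
    print(json.dumps(minimal_profile(), indent=2))
```

One caveat on the design: this filters the socket syscall only. On 32-bit x86, socket creation can also go through the multiplexed socketcall syscall, which an argument filter on socket does not see, so such hosts need separate handling.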
The control mapping is unflattering. CM-6 configuration settings should already have disabled algif_aead on hosts with no legitimate userspace crypto consumer; Docker did it years ago for a reason. SI-2 flaw remediation timelines should already have you under a 14-day SLA for actively exploited LPEs in the kernel; if your SLA is “next quarterly patch window,” this is the bug that proves the SLA is wrong. SC-39 process isolation and SC-2 application partitioning are the controls that quietly fail under shared-kernel multi-tenancy, because the page cache is the shared resource the threat model forgets. AC-6(2) least privilege for non-security functions does not help here, because the attacker starts as the unprivileged user the control assumes; the relevant compensating control is SC-7(21) boundary protection between tenants, which on a shared-kernel host effectively requires microVMs or a userspace kernel.
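A configuration baseline of this kind is easy to check mechanically. The sketch below reports whether algif_aead is currently loaded and whether an install rule like the one described above is present. The function names and the returned dictionary keys are illustrative. A kernel with the interface built in rather than built as a module will not show up in /proc/modules, so "not loaded" is not proof that the interface is absent.

```python
import glob
import re

# Matches lines such as: install algif_aead /bin/false
_INSTALL_RULE = re.compile(
    r"^\s*install\s+algif[_-]aead\s+/bin/(?:false|true)\s*$", re.MULTILINE
)


def module_loaded(proc_modules_text: str, name: str = "algif_aead") -> bool:
    """True if name is the first field of any line in /proc/modules content."""
    return any(
        line.split()[0] == name
        for line in proc_modules_text.splitlines()
        if line.strip()
    )


def install_rule_present(conf_text: str) -> bool:
    """True if modprobe configuration text neutralises algif_aead loading."""
    return _INSTALL_RULE.search(conf_text) is not None


def host_status() -> dict:
    """Best-effort status for the local host; unreadable files are skipped."""
    try:
        with open("/proc/modules") as f:
            loaded = module_loaded(f.read())
    except OSError:
        loaded = None  # could not determine
    rule = False
    for path in glob.glob("/etc/modprobe.d/*.conf"):
        try:
            with open(path) as f:
                rule = rule or install_rule_present(f.read())
        except OSError:
            continue
    return {"loaded": loaded, "install_rule_present": rule}
```

A host that reports loaded as True with no install rule, and that has no known userspace AEAD consumer, is the case the paragraph above describes.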
Detection
The exploit is not subtle if you are looking. Falco/Sysdig and Elastic both shipped rules within 48 hours; Florian Roth’s signature-base carries a YARA rule for the public PoC strings (authencesn(hmac(sha256),cbc(aes)), the status messages [+] /etc/passwd page cache mutated, and the g.open("/usr/bin/su",0) fragment).
Behavioral signals worth wiring into your AU-6 audit review and analysis pipeline:
- Any process outside the known AF_ALG consumers (disk-encryption tooling such as cryptsetup and systemd-cryptsetup, wireless daemons such as iwd and wpa_supplicant) creating an AF_ALG SOCK_SEQPACKET socket. On most production hosts this is essentially zero traffic.
- sendmsg() to an AF_ALG socket immediately followed by splice() from a setuid binary’s fd into the same socket.
- execve() of a setuid binary within seconds of the above sequence by the same uid.
- UID transitions to 0 from a process whose parent never invoked a SUID-aware path.
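The sendmsg, splice, execve sequence in that list can be expressed as a small state machine over syscall events. The sketch below assumes a hypothetical, already-normalized event format (dictionaries with keys such as ts, pid, uid, syscall, fd_family, out_family, in_setuid, setuid); real sensors expose different field names, so treat the schema and the class name CopyFailSequenceDetector as placeholders. It checks ordering and a time window, not strict adjacency of the calls.

```python
from dataclasses import dataclass, field


@dataclass
class CopyFailSequenceDetector:
    """Correlate sendmsg -> splice -> execve using a hypothetical event schema."""

    window: float = 5.0  # seconds allowed between the splice and the execve
    _armed: set = field(default_factory=set)     # (pid, fd) AF_ALG sockets that saw sendmsg
    _staged: dict = field(default_factory=dict)  # uid -> (ts, path of spliced setuid file)

    def feed(self, ev: dict):
        """Consume one event; return an alert dict when the sequence completes."""
        call = ev["syscall"]
        if call == "sendmsg" and ev.get("fd_family") == "AF_ALG":
            self._armed.add((ev["pid"], ev["fd"]))
        elif call == "splice" and ev.get("out_family") == "AF_ALG":
            # Only count splices from a setuid file into a socket that saw sendmsg.
            if (ev["pid"], ev["fd_out"]) in self._armed and ev.get("in_setuid"):
                self._staged[ev["uid"]] = (ev["ts"], ev["in_path"])
        elif call == "execve" and ev.get("setuid"):
            staged = self._staged.get(ev["uid"])
            if staged is not None and 0 <= ev["ts"] - staged[0] <= self.window:
                return {
                    "uid": ev["uid"],
                    "spliced": staged[1],
                    "executed": ev["path"],
                    "delay": ev["ts"] - staged[0],
                }
        return None
```

Feeding events in timestamp order and alerting on any non-None return value is the intended use. The window is a tunable guess, and a patient attacker can wait it out, so this complements rather than replaces the plain "unexpected AF_ALG socket" signal.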
YARA on disk will not help — the payload is short and easy to retype, and the page-cache write leaves no on-disk artifact. Runtime telemetry on syscall sequences is the detection that actually pays.
What this tells us about the next year
Two things are worth noticing beyond the CVE itself.
First, the bug existed for nine years and was found by an AI-assisted code scanner in roughly an hour. The same pattern showed up in CVE-2026-3854 (GitHub X-Stat, IDA MCP), and it is going to keep showing up. Long-tail logic bugs in mature C code — the ones that don’t trip sanitizers and don’t have obvious crash signatures — are now economically findable by small teams. Defenders should assume the same scanners are pointed at their own code.
Second, “shared-kernel multi-tenancy” remains the weakest layer in modern cloud architecture, and the industry keeps pretending namespaces are an isolation boundary. They are not, and Copy Fail is the cleanest demonstration of that since Dirty Pipe. If your threat model puts untrusted code on a shared kernel, your isolation primitive is the kernel: every kernel LPE is a tenant escape, and the cadence of kernel LPEs is not slowing down. Plan the migration to per-tenant kernels (Firecracker, Kata, gVisor, Cloud Hypervisor) on the assumption that the next Copy Fail is already in tree, just not yet found.