The HTTP/2 Bomb: one wire byte, gigabytes of RAM, and the composition an AI spotted before we did

A single client on a 100 Mbps residential line can hold 32 GB of an Envoy or Apache process hostage in roughly the time it takes to read this paragraph. No botnet. No reflection. No giant pipe. One TCP connection, valid HTTP/2 frames the whole way through, and a server that allocates memory it will never get a chance to free. That asymmetry is the entire attack: one byte of ingress that costs the server kilobytes of live heap, repeated tens of thousands of times and then pinned in place. It is tracked as CVE-2026-49975, and the researchers at Calif (calif.io) named it the HTTP/2 Bomb. The part that should bother you more than the amplification ratio is that the default configurations of the five most common things that terminate HTTP/2 on the public internet were all vulnerable on disclosure day.

It is a denial-of-service bug. Availability only. It does not run code, it does not read your data, and it should not be filed next to the separate Apache HTTP/2 flaw that reporting tied to RCE. The damage here is an outage and the cost of riding it out. That is the whole threat model, and it is bad enough.

What it actually is

Nothing in this attack is new. That is the uncomfortable part.

The first half is the HPACK indexed-reference bomb. HTTP/2 compresses headers with HPACK (RFC 7541), which keeps a per-connection dynamic table: you insert a header once, then refer back to it with a compact index. Each indexed reference is one byte on the wire. The server, though, has to materialize that header — allocate fresh copies of the name and value — every single time you reference it. Seed the table with one entry, fire thousands of single-byte references back at it in a single request, and the server’s allocator does all the work while you barely touch your uplink. Cory Benfield coined “HPACK Bomb” back in 2016 as CVE-2016-6581, and Gal Bar Nahum hit roughly 4,000x against Apache httpd again last year as CVE-2025-53020. This is well-trodden ground.

The twist Calif describes is where the amplification comes from. The classic bomb stuffs a large value into the table, so implementations learned to cap the total decoded header size and called it handled. This variant goes the other direction: the header is nearly empty. The cost isn’t the bytes you decode — it’s the server’s per-entry bookkeeping, the pool blocks and struct overhead it allocates around each reference. The decoded-size limit never fires because there’s almost nothing to decode. Per Calif’s own nginx writeup, a nearly-empty dynamic-table entry referenced thousands of times costs on the order of 70 bytes of pool memory per reference once you count the pool-block overhead, which lands the measured ratio around 70:1. Same primitive, different blind spot.
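To make the accounting concrete, here is a toy decoder-side cost model in Python. It is a sketch, not any server’s real decoder: it handles only one-byte indexed fields, and `per_field_overhead` is an assumed stand-in for pool-block and struct bookkeeping, set to 69 so that each reference totals the roughly 70 bytes quoted above. The names `DecodeCost` and `cost_of_indexed_block` are hypothetical.

```python
from dataclasses import dataclass

STATIC_TABLE_ENTRIES = 61  # size of the HPACK static table (RFC 7541, Appendix A)


@dataclass
class DecodeCost:
    wire_bytes: int = 0
    heap_bytes: int = 0
    fields: int = 0


def cost_of_indexed_block(block, dynamic_table, per_field_overhead):
    """Tally wire bytes versus heap bytes for a block of one-byte indexed fields.

    dynamic_table is a list of (name, value) byte strings, newest first.
    per_field_overhead is an ASSUMED bookkeeping cost per decoded field.
    """
    cost = DecodeCost()
    for byte in block:
        index = byte & 0x7F
        if not byte & 0x80 or index == 0x7F:
            raise ValueError("toy model: only one-byte indexed fields are handled")
        slot = index - STATIC_TABLE_ENTRIES - 1  # dynamic entries start at index 62
        if not 0 <= slot < len(dynamic_table):
            raise ValueError("toy model: index must point into the dynamic table")
        name, value = dynamic_table[slot]
        cost.wire_bytes += 1  # one byte on the wire per reference
        cost.heap_bytes += len(name) + len(value) + per_field_overhead
        cost.fields += 1
    return cost


table = [(b"x", b"")]               # one nearly empty dynamic-table entry
block = bytes([0x80 | 62]) * 1000   # 1,000 references to that entry
cost = cost_of_indexed_block(block, table, per_field_overhead=69)
print(cost.wire_bytes, cost.heap_bytes)  # 1000 70000
```

The decoded names and values here add up to only 1,000 bytes, which is why a cap on decoded header size never notices the 70,000 bytes of bookkeeping around them.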

The second half is the HTTP/2 window stall, which is just Slowloris wearing a new protocol. In HTTP/2 the client controls the flow-control window for the server’s responses. Advertise a zero-byte initial window (SETTINGS_INITIAL_WINDOW_SIZE = 0) and the server can never send its response DATA frames, so it can never finish the request and free what it allocated. Then drip in a WINDOW_UPDATE with a 1-byte increment every so often to reset the send timeout. The connection stays alive, the allocations stay live, and you hold them for as long as the server’s idle timeout lets you, which on a default config is a while. The lineage here is CVE-2016-8740 and CVE-2016-1546, also a decade old.

Each of those was bounded on its own. A 70:1 amplifier is harmless if the memory gets freed when the request completes. The window stall on its own just ties up a connection slot. Chain them and the stall pins the amplified allocations in place until the box runs out of RAM. As Calif put it, ratio is only half the equation; the other half is how long you get to hold it.
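The “ratio times hold time” point can be put as back-of-envelope arithmetic. The Python sketch below assumes a steady attack rate and that every allocation is freed a fixed `hold_seconds` after it is made; the function name and the 1 Mbps, 0.05 s, and 600 s figures are illustrative assumptions, not measurements from the disclosure.

```python
def pinned_heap_bytes(ingress_bits_per_s, ratio, hold_seconds, elapsed_seconds):
    """Estimate live heap when each allocation is freed hold_seconds after it is made."""
    alloc_bytes_per_s = (ingress_bits_per_s / 8) * ratio
    # Only allocations made within the last hold_seconds are still live.
    return alloc_bytes_per_s * min(elapsed_seconds, hold_seconds)


# Assumed scenario: 1 Mbps of attack ingress at 70:1, observed after ten minutes.
freed_quickly = pinned_heap_bytes(1_000_000, 70, hold_seconds=0.05, elapsed_seconds=600)
stalled = pinned_heap_bytes(1_000_000, 70, hold_seconds=600, elapsed_seconds=600)
print(f"{freed_quickly:,.0f} bytes live vs {stalled:,.0f} bytes live")
```

Under these assumptions the amplifier alone keeps well under a megabyte live, while the same traffic held by a stall pins about 5.25 GB. The ratio did not change between the two calls; only the hold time did.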

The cookie-crumb bypass that makes it land

Here’s the detail that turns a known-bad pattern into a working bypass. RFC 9113 §8.2.3 explicitly lets you split a Cookie header into individual fields — “crumbs.” The servers that did enforce a header-field-count limit (Apache, Envoy) weren’t counting cookie crumbs against it. So you send a pile of empty cookie crumbs, multiply your per-entry allocations, and walk straight under the size-based caps.

The per-server behavior is where the ratios diverge. Envoy appends each crumb into a growing buffer, so allocator overhead compounds and Calif measured roughly 5,700:1 on a single stream. Apache rebuilds the full merged cookie string on every crumb and leaves the older copies live in memory until stream cleanup, which gets to ~4,000:1 even with an empty cookie value. nginx and IIS sit in the ~70:1 class because they don’t do that merge dance.

Blast radius

These are the researchers’ measured demo figures, per server and version. Treat them as demo conditions, not a guarantee for your build.

Server (tested version)               Wire→RAM amp    Demo impact
Envoy 1.37.2                          ~5,700:1        ~32 GB in ~10 s
Apache httpd 2.4.67                   ~4,000:1        ~32 GB in ~18 s
nginx 1.29.7                          ~70:1           ~32 GB in ~45 s
Microsoft IIS (Windows Server 2025)   ~68:1           ~64 GB in ~45 s

Cloudflare’s Pingora is confirmed affected and lands in the lower-amplification tier alongside nginx and IIS. The disclosure didn’t publish a precise Pingora ratio, so I won’t invent one — call it the ~70:1 class, still remotely exhaustible.

Don’t let the spread fool you into thinking the low-amplification servers are safe. 70:1 from a home connection still takes the box down; the high-amp proxies just fall faster. This is not a “you need a big pipe” attack. It’s one cheap connection against a default config.
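The home-connection claim is easy to sanity-check. This Python sketch computes a floor only: it ignores framing, TLS, and server processing time, so real runs will be slower. The function name is hypothetical, and the ratio and uplink values are the figures quoted above.

```python
def ingress_floor(target_heap_bytes, ratio, uplink_bits_per_s):
    """Return (wire_bytes, seconds) needed at line rate, ignoring all overhead."""
    wire_bytes = target_heap_bytes / ratio
    seconds = wire_bytes / (uplink_bits_per_s / 8)
    return wire_bytes, seconds


wire, secs = ingress_floor(32e9, ratio=70, uplink_bits_per_s=100e6)
print(f"{wire / 1e6:.0f} MB on the wire, at least {secs:.0f} s at 100 Mbps")
```

At 70:1, 32 GB of heap needs about 457 MB of ingress, which is roughly 37 seconds at 100 Mbps line rate. That is the same order as the ~45 s demo figure for nginx, and it is well within reach of a residential uplink.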

And proxies being on the list is the genuinely nasty bit. Envoy and Pingora front other services. Take down the edge proxy and you’ve taken down everything behind it, regardless of how healthy the origins are. Per Calif’s Shodan sweep, 880,000+ public-facing servers running default configs of these five and supporting HTTP/2 were exposed, though, as they note, a good chunk sit behind a CDN that’s much harder to knock over. Inventory gaps are the norm at that scale; the embedded Envoy sidecar nobody remembers deploying is exactly the kind of thing that doesn’t show up in your asset list. That is an RA-5 problem: find your real HTTP/2-terminating surface, including the proxies you forgot are proxies.

How an AI agent found a ten-year-old composition

The discovery story is the on-brand hook, and it’s accurate without overclaiming. An OpenAI Codex agent, under the direction of Calif researchers (Quang Luong, with Jun Rong and Duc Phan validating against the other platforms), read multiple server codebases, recognized that the compression bomb and the Slowloris hold compose, and assembled the chained attack. Both halves were public for a decade. As far as anyone can tell, no human had put them together against these specific servers.

That’s the interesting framing. Not “AI replaces vuln researchers,” which is nonsense, but “an agent did the boring cross-codebase pattern-matching that humans don’t, because reading five HTTP/2 stacks at once to check whether two old bugs still compose is exactly the kind of unglamorous work nobody schedules.” Calif also notes that the spec (RFC 7541 §7.3) warned about memory exhaustion, and five independent implementations still shipped the same class of bug. That points at the spec, not the implementers. When everyone reads the same security-considerations section and ships the same hole, the defect is upstream.

The flip side: the fix commits for nginx and Apache are public and disclose the vectors directly. Calif is explicit that any capable model can turn those diffs into a working exploit — that’s literally how they confirmed IIS, Envoy, and Pingora were vulnerable too. A PoC is reportedly out. The commit-to-exploit window is short now. Plan accordingly.

Severity, honestly

The NVD record for CVE-2026-49975 is reserved as of this writing, so there’s no official CVSS to quote and I’m not going to manufacture one. Reason about it instead: remote, unauthenticated, low-complexity, no user interaction, repeatable, cheap, and effective against default configs. Impact is availability only. For any internet-facing HTTP/2 endpoint that’s a high-severity DoS, and the proxy-takes-everything-down property pushes it up further for edge tiers. It does not breach data and it does not run code. Both things are true at once.

The patch matrix

  • nginx — fixed in 1.29.8, which imports the max_headers directive (default ceiling 1,000 headers per request, borrowed from freenginx). Upgrade to 1.29.8+. If you can’t, http2 off; is the stopgap.
  • Apache httpd — fixed in mod_http2 v2.0.41, which finally counts Cookie headers against LimitRequestFields. Stefan Eissing committed it the same day it was disclosed to Apache. As of disclosure it shipped as the standalone mod_http2 module and in trunk, not yet folded into a stable 2.4.x. So don’t sit waiting on a distro 2.4.x bump — pull the standalone module, or set Protocols http/1.1 to drop HTTP/2 in the meantime. Note: LimitRequestFields alone on an older build doesn’t save you, because that’s the limit the crumb bypass walks past.
  • Microsoft IIS — no fix at disclosure. Microsoft notified.
  • Envoy — no fix at disclosure; Calif’s June 3 update says Envoy released patches that appear to mitigate it, with validation ongoing. Confirm the current Envoy advisory before you trust that — “appears to mitigate” is not “confirmed clean.”
  • Cloudflare Pingora — no fix at disclosure; Cloudflare notified. If you’re on Cloudflare’s edge it’s likely mitigated centrally; the exposure is a self-hosted Pingora build you operate yourself.

Three of five named targets had no fix on disclosure day. That, not the 5,700:1 number, is the part that should set your week.

What to actually do

The durable fix is two limits enforced independently, because this bug exists in the gap between them. A maximum decoded header size and a maximum header count. Size alone is what the nearly-empty-header variant strolls past. The count has to include cookie crumbs, independent of total size — that’s precisely what the nginx and Apache fixes do.
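As a minimal sketch of the two-limit idea, assuming headers arrive as (name, value) byte pairs: the Python below counts every field, cookie crumbs included, and tracks decoded size separately. The names are hypothetical, the 1,000-field default mirrors the nginx ceiling mentioned above, and the 64 KiB size cap is an arbitrary placeholder rather than any server’s default.

```python
class HeaderLimitExceeded(Exception):
    """Raised when a request's header list breaks either limit."""


def enforce_header_limits(fields, max_count=1000, max_decoded_bytes=64 * 1024):
    """Check (name, value) byte pairs against a count cap and a size cap.

    Every field counts toward max_count, including each individual cookie crumb.
    """
    count = 0
    decoded = 0
    for name, value in fields:
        count += 1
        decoded += len(name) + len(value)
        if count > max_count:
            raise HeaderLimitExceeded(f"more than {max_count} header fields")
        if decoded > max_decoded_bytes:
            raise HeaderLimitExceeded(f"more than {max_decoded_bytes} decoded bytes")
    return count, decoded


crumbs = [(b"cookie", b"")] * 1001  # empty crumbs: tiny decoded size, large count
try:
    enforce_header_limits(crumbs)
    rejected = False
except HeaderLimitExceeded:
    rejected = True
print(rejected)  # True
```

Those 1,001 empty crumbs decode to about 6 KB, far below the size cap, so only the count limit stops them. In a real decoder the check would need to run as each field is decoded, before any per-field structures are allocated; accepting an iterable here is meant to suggest that shape.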

Then kill the other half: bound the lifetime of a stalled stream regardless of WINDOW_UPDATE activity. A stream making no forward progress should time out even if the client keeps nudging the window. That’s what de-fangs the “pin it in memory” trick, and it’s the half most stopgaps forget.
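One way to express that bound, sketched in Python under stated assumptions: require a minimum average send rate after a grace period, and treat WINDOW_UPDATE as irrelevant to the deadline. A timer that resets on any byte sent would not be enough, because a 1-byte increment lets the server send one byte. The class name, the 5-second grace period, and the 100 bytes/s floor are all hypothetical.

```python
import time


class StalledStreamGuard:
    """Flags a response stream whose average send rate falls below a floor."""

    def __init__(self, grace_seconds, min_bytes_per_second, now=time.monotonic):
        self._grace = grace_seconds
        self._min_rate = min_bytes_per_second
        self._now = now
        self._started = now()
        self._sent = 0

    def on_data_sent(self, nbytes):
        self._sent += nbytes

    def on_window_update(self, increment):
        # Deliberately a no-op: a window nudge is not progress and buys no time.
        pass

    def should_reset(self):
        elapsed = self._now() - self._started
        if elapsed <= self._grace:
            return False
        # Past the grace period, the stream must have kept up with the floor rate.
        return self._sent < self._min_rate * (elapsed - self._grace)


clock = [0.0]  # fake clock so the example runs instantly
guard = StalledStreamGuard(grace_seconds=5, min_bytes_per_second=100, now=lambda: clock[0])
for second in range(1, 31):   # client drips a 1-byte increment every second
    clock[0] = float(second)
    guard.on_window_update(1)
    guard.on_data_sent(1)     # the server can send exactly one byte each time
print(guard.should_reset())   # True
```

The trade-off is that a rate floor can also cut off genuinely slow clients and long-lived streams with nothing to send, so the floor and grace period need tuning per workload.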

For blast-radius containment while you sort patches, cap per-worker memory hard — cgroups, a container memory limit, ulimit -v. A worker process rarely needs gigabytes. Letting the kernel OOM-kill and respawn a bombed worker early is a far better failure mode than letting one connection drag the whole host into swap at 95%. This is containment (SC-5(2) resource quotas), not a fix; the attacker can still degrade you, they just can’t take the machine.

Stopgaps while unpatched: rate-limit and connection-cap at a WAF or L7 load balancer, restrict HTTP/2 concurrency, and as a genuine last resort disable HTTP/2 on exposed endpoints that don’t need it. That last one isn’t free — you lose multiplexing and eat the HTTP/1.1 head-of-line and latency cost — so don’t reach for it on a high-fanout API gateway without understanding what you’re trading.

Now the detection reality check, because this is where lab confidence dies. The attack traffic is entirely valid HPACK-encoded headers, and there is very little of it: the attacker’s ingress bytes barely move, so a volume-based flood threshold has nothing to fire on. Your flood detection will not see it. The signal is amplification, not volume: tiny ingress bytes producing huge RSS. Watch for per-connection or per-worker memory climbing while response progress sits near zero, HTTP/2 connections holding open with almost no DATA frames going out, and OOM-killer events on the web tier. The honest caveat: a memory-exhaustion DoS looks exactly like a memory leak or a traffic spike right up until someone correlates the RSS curve against near-zero egress. Your first instinct at 0300 will be “bad deploy, restart it,” and the restart buys you a few minutes before it fills again. Expect the on-call to chase a phantom leak for an hour before the shape of it clicks.
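The correlation described above can be sketched as a simple check over per-worker samples. This Python is a starting point under loud assumptions: the sample fields, the function name, and all three thresholds are hypothetical placeholders that would need tuning against your own baseline, and it presumes you can already collect RSS growth, ingress, and egress per worker over the same interval.

```python
from dataclasses import dataclass

MIB = 1024 ** 2


@dataclass(frozen=True)
class WorkerSample:
    rss_growth_bytes: int   # RSS increase over the sampling interval
    ingress_bytes: int      # bytes received over the same interval
    egress_bytes: int       # bytes sent over the same interval


def looks_like_pinned_amplification(sample, min_growth=256 * MIB,
                                    min_ratio=20.0, max_egress_fraction=0.01):
    """Flag memory growth that far outpaces ingress while egress stays near zero."""
    if sample.rss_growth_bytes < min_growth:
        return False
    ratio = sample.rss_growth_bytes / max(sample.ingress_bytes, 1)
    egress_is_tiny = sample.egress_bytes <= max_egress_fraction * sample.rss_growth_bytes
    return ratio >= min_ratio and egress_is_tiny


bomb = WorkerSample(rss_growth_bytes=2048 * MIB, ingress_bytes=30 * MIB, egress_bytes=10 * 1024)
spike = WorkerSample(rss_growth_bytes=2048 * MIB, ingress_bytes=4096 * MIB, egress_bytes=8192 * MIB)
print(looks_like_pinned_amplification(bomb), looks_like_pinned_amplification(spike))  # True False
```

A traffic spike fails the ratio test because ingress grows with memory, and a leak under normal load fails the egress test because responses are still flowing. Neither test is conclusive on its own, which matches the caveat above.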

Map it however your assessor wants — SC-5 is the spine, SI-2 for the patch cadence, CM-6/CM-7 because the defaults were the vulnerability, SI-10 for counting crumbs as protocol-layer input validation, CP-2/CP-10 for edge-tier failover. The control story is clean. The operational story is that you have a remotely triggerable outage on infrastructure you probably can’t fully patch this week, found because an AI agent did the cross-reading nobody had bothered to do in ten years.

Patch nginx and Apache today. Cap worker memory on everything else and watch the RSS curves.

Sources