Token replay forensics: reconstructing a post-MFA compromise in Entra ID

By AutoCypher · 03 Jun 2026

The incident that ruins your week in 2026 doesn’t start with a brute-force spike or a flood of failed MFA prompts. It starts with one clean, successful sign-in that satisfied every conditional access policy you have, from an IP you’ve never seen, replaying a token your IdP issued an hour earlier to a legitimate user who did everything right. MFA held. The phish still worked. The attacker didn’t beat your second factor — they stole the artifact your IdP hands out after the second factor succeeds, and they replayed it.

That’s the shape of an AiTM-derived session hijack, and it’s the post-authentication compromise pattern that DFIR teams keep landing on when the timeline gets reconstructed. The forensic problem isn’t “who failed to authenticate.” It’s “this session was authenticated, by the real user, and then it kept being authenticated from somewhere the real user never was.” Your detection logic that’s tuned for credential stuffing is blind to it by design.

What the attacker actually has

Strip away the phishing-kit branding and the mechanism is mundane. The victim hits a reverse-proxy page that relays their traffic to the real Microsoft login. They authenticate for real. They satisfy MFA for real.

Entra issues browser session material and downstream tokens after MFA succeeds, and the proxy captures the bearer material exposed through that browser flow. On joined devices, PRT behavior is a separate device-bound path and should not be treated as the normal AiTM capture artifact.

From that point the attacker holds bearer material that’s already past the MFA gate. They don’t need the password again. They don’t need to re-prompt. They import the cookie and they’re you.

The TTL is the whole game. An access token typically lives for about an hour, but it is renewable against the refresh token, and a refresh token defaults to a 90-day lifetime in most flows. Conditional access is evaluated when tokens are issued or refreshed, not on every request, unless you’ve got Continuous Access Evaluation actually doing something. So the practical window between “token stolen” and “token dies on its own” is long enough to enumerate the mailbox, set an inbox rule, register a new authenticator, and pivot. All of those generate their own audit trail, which is the good news for the investigator.

The bad news is that none of it looks like an attack in the sign-in log. It looks like the user.

The artifact that gives it away

When you pull the timeline, the fields that do the work are the session identifier and the device it’s bound to. In SigninLogs you’re looking at the correlation between SessionId, the DeviceDetail block (deviceId, trustType, operatingSystem), and UniqueTokenIdentifier, which is a top-level column rather than part of AuthenticationDetails. A replayed session shows the same SessionId lighting up from a device fingerprint and IP that don’t match the originating authentication.

A rough Sentinel/KQL starting point — and I mean starting point, you will tune this hard:

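// Successful sign-ins where one SessionId fans out across networks and devices.
// Token refreshes also land in AADNonInteractiveUserSignInLogs; extend there once this is tuned.
SigninLogs
| where TimeGenerated > ago(7d)
| where ResultType == "0"            // ResultType is a string column
| where isnotempty(SessionId)        // an empty SessionId would lump unrelated sign-ins together
| extend dev = tostring(DeviceDetail.deviceId)
| summarize
    ips = dcount(IPAddress),
    asns = dcount(AutonomousSystemNumber),
    devs = dcount(dev),
    cities = make_set(tostring(LocationDetails.city), 10)
    by UserPrincipalName, SessionId
| where ips > 1 and devs > 1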

What you’re hunting is a single SessionId that fans out across more than one device fingerprint and more than one network. A real user’s session moving from corporate wifi to LTE will change IP — that’s normal, that’s the first wave of false positives — but the deviceId stays constant because it’s the same laptop. When the device identifier changes mid-session, something replayed that token onto hardware that wasn’t there at issuance.
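If you want to prototype the same fan-out logic offline against an export before committing it to a scheduled rule, a minimal Python sketch might look like this. It assumes each record is a dict with the keys user, session_id, ip, and device_id; those names are illustrative and are not the SigninLogs schema. An empty device_id is counted as its own value here, which mirrors what a distinct count over a stringified field would do.

```python
from collections import defaultdict

def find_session_fanout(records):
    """Return (user, session_id) pairs seen from more than one IP AND more than one device."""
    ips = defaultdict(set)
    devices = defaultdict(set)
    for r in records:
        if not r.get("session_id"):
            continue  # nothing to correlate on without a session identifier
        key = (r["user"], r["session_id"])
        ips[key].add(r["ip"])
        # A missing device ID is kept as "" so it counts as a distinct value.
        devices[key].add(r.get("device_id") or "")
    return sorted(k for k in ips if len(ips[k]) > 1 and len(devices[k]) > 1)

# Hypothetical sample data using documentation IP ranges.
sample = [
    {"user": "alice@example.com", "session_id": "s1", "ip": "198.51.100.10", "device_id": "dev-a"},
    {"user": "alice@example.com", "session_id": "s1", "ip": "203.0.113.7", "device_id": ""},
    {"user": "bob@example.com", "session_id": "s2", "ip": "198.51.100.20", "device_id": "dev-b"},
    {"user": "bob@example.com", "session_id": "s2", "ip": "198.51.100.99", "device_id": "dev-b"},
]

print(find_session_fanout(sample))  # [('alice@example.com', 's1')]
```

Bob changes IP on the same device and is ignored; Alice’s session appears from a second IP on a different (here, unidentified) device and is returned.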

The other tell is AuthenticationProtocol and the absence of a fresh interactive MFA event on the second leg. The replay rides in on the existing session, so you’ll see the session token honored without a corresponding MfaDetail entry showing a new challenge. CAE-enabled tenants surface this faster because the token gets re-evaluated against IP location changes, but CAE coverage is uneven across application types and I wouldn’t assume it caught everything.
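As a rough illustration of the “second leg without a fresh challenge” check, the sketch below walks the events of a single session in time order and returns the ones that arrive from a different IP and a different device than the originating sign-in without a new MFA challenge. The fresh_mfa flag is an assumption: you would have to derive it yourself from whatever your export exposes about the MFA step, and the other key names are equally hypothetical.

```python
def legs_without_fresh_mfa(events):
    """events: dicts for ONE session with time (ISO 8601 UTC string), ip, device_id, fresh_mfa."""
    ordered = sorted(events, key=lambda e: e["time"])  # same-format ISO strings sort chronologically
    if not ordered:
        return []
    origin = ordered[0]  # treated as the originating authentication
    flagged = []
    for e in ordered[1:]:
        moved = e["ip"] != origin["ip"] and e["device_id"] != origin["device_id"]
        if moved and not e["fresh_mfa"]:
            flagged.append(e)
    return flagged

# Hypothetical session: interactive sign-in, a normal reuse, then a replay from elsewhere.
session = [
    {"time": "2026-06-01T09:00:00Z", "ip": "198.51.100.10", "device_id": "dev-a", "fresh_mfa": True},
    {"time": "2026-06-01T09:20:00Z", "ip": "198.51.100.10", "device_id": "dev-a", "fresh_mfa": False},
    {"time": "2026-06-01T09:45:00Z", "ip": "203.0.113.7", "device_id": "", "fresh_mfa": False},
]

print([e["ip"] for e in legs_without_fresh_mfa(session)])  # ['203.0.113.7']
```

This only works if the first event you hold for the session really is the originating authentication, which is one more reason retention matters.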

Where this floods the SOC

Here’s the part the vendor blog skips. Run that query against a tenant of any size and the first thing you get back is noise, and most of it has nothing to do with token theft.

CGNAT and carrier IP churn. Mobile users behind carrier-grade NAT change public IP constantly, sometimes mid-session, sometimes across two ASNs if they roam between carrier pops. dcount(IPAddress) > 1 per session will fire on a chunk of your mobile fleet every single day. The deviceId constraint is what saves you — keep it, don’t drop it to “simplify” the rule.

VPN egress and split tunnel. Anyone whose traffic exits through a corporate VPN concentrator and then drops to local breakout for O365 (because someone read the Microsoft “don’t tunnel O365” guidance and half-implemented it) will show two ASNs per session routinely. That’s an architecture artifact, not an intrusion.

Shared service accounts and kiosk logins. A UserPrincipalName that’s actually a shared mailbox or a kiosk identity will look like it’s being replayed across the planet because it is being used across the building. Carve these out by group membership before you alert, not after.

In a mid-size tenant, expect this rule to throw somewhere in the low hundreds of hits a day before tuning and drop to single digits once you’ve excluded carrier ASN transitions within the same deviceId and whitelisted your VPN egress ranges. The threshold that matters isn’t a number you set once — it’s the join condition. ips > 1 alone is garbage. ips > 1 AND devs > 1 AND asn-change-not-explained-by-known-egress is a real signal.
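One way to express that join condition outside the SIEM is sketched below. It assumes per-session events carrying time, ip, asn, and device_id (hypothetical key names) and a hand-maintained list of VPN egress ranges; the 192.0.2.0/24 range is a documentation placeholder, not a real concentrator. An event is only returned when the device differs from the originating one, the IP is not known egress, and the ASN was never seen from the originating device.

```python
import ipaddress

# Hypothetical VPN egress ranges; replace with your own concentrator addresses.
KNOWN_EGRESS = [ipaddress.ip_network("192.0.2.0/24")]

def is_known_egress(ip):
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in KNOWN_EGRESS)

def unexplained_legs(session_events):
    """Return events of one session that tuning does not explain away."""
    ordered = sorted(session_events, key=lambda e: e["time"])
    if not ordered:
        return []
    origin_dev = ordered[0]["device_id"]
    # ASNs the originating device was seen on: carrier churn on that device is expected.
    origin_asns = {e["asn"] for e in ordered if e["device_id"] == origin_dev}
    return [
        e for e in ordered
        if e["device_id"] != origin_dev
        and not is_known_egress(e["ip"])
        and e["asn"] not in origin_asns
    ]

carrier_churn = [
    {"time": "2026-06-01T09:00:00Z", "ip": "198.51.100.10", "asn": 64496, "device_id": "dev-a"},
    {"time": "2026-06-01T09:30:00Z", "ip": "198.51.100.77", "asn": 64497, "device_id": "dev-a"},
]
hijack = [
    {"time": "2026-06-01T09:00:00Z", "ip": "198.51.100.10", "asn": 64496, "device_id": "dev-a"},
    {"time": "2026-06-01T10:05:00Z", "ip": "203.0.113.7", "asn": 64511, "device_id": ""},
]

print(len(unexplained_legs(carrier_churn)), len(unexplained_legs(hijack)))  # 0 1
```

Note the trade-off baked into the allowlist: a replay that happens to exit through a listed egress range is suppressed too, so keep that list as tight as you can.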

If you’re on Splunk instead of Sentinel, you’re ingesting the same data via the Microsoft 365 add-on pulling the Graph activity feed, and the gotcha is the nested JSON. The DeviceDetail and AuthenticationDetails objects come through as serialized fields that the add-on doesn’t always flatten cleanly, so your spath extractions break silently when Microsoft tweaks the schema (which they do, without telling anyone). Check that deviceId is actually populated in your index before you trust a rule that depends on it — on a non-trivial fraction of sign-ins it’s empty, and an empty-vs-empty comparison reads as “same device” and eats your detection.
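Before trusting a device-based rule, it helps to measure how often the field is actually there and to make the comparison refuse to answer when it isn’t. A small sketch, assuming records exported as dicts with a device_id key (an illustrative name, not the add-on’s field name):

```python
def device_id_fill_rate(records):
    """Fraction of records with a non-empty device_id."""
    if not records:
        return 0.0
    populated = sum(1 for r in records if r.get("device_id"))
    return populated / len(records)

def same_device(a, b):
    """True or False when both IDs are known; None when either side is missing."""
    if not a or not b:
        return None  # unknown, NOT "same device"
    return a == b

# Hypothetical export: half of the records carry no device identifier.
records = [
    {"device_id": "dev-a"},
    {"device_id": ""},
    {"device_id": "dev-b"},
    {},
]

print(device_id_fill_rate(records))  # 0.5
print(same_device("", ""))           # None
print(same_device("dev-a", "dev-a")) # True
```

Returning None instead of True for the empty-vs-empty case forces the calling rule to decide explicitly what to do with unknowns rather than silently treating them as a match.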

Retention is the forensic constraint nobody budgets for

The token outlives the log if you’re not careful. Entra keeps sign-in logs in the portal for 7 days on the free tier and 30 days with P1 or P2; anything longer lives in whatever you’ve exported to Sentinel or Splunk. Refresh tokens default to a 90-day lifetime, and the slow-burn intrusions where the attacker sits quiet before acting can push the interesting part of the timeline right up against your hot retention boundary.

If your SigninLogs are in 90-day hot storage and the initial phish was 95 days ago, the originating authentication (the one event that proves the session was hijacked rather than just used oddly) is in cold storage or gone. You can still reconstruct the post-compromise activity from AuditLogs (the authenticator registration, the consent grant) and from the Exchange audit trail (the inbox rule, which lands in OfficeActivity if you’re on Sentinel), but you’ve lost the anchor. Budget hot retention for sign-in and audit logs around the longest plausible refresh token lifetime plus your mean detection lag, not around whatever the default tier costs. This is squarely an AU-11 retention decision, and it’s one where the compliance-minimum number and the forensically-useful number are not the same number.
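The sizing arithmetic is simple enough to write down. In this sketch the 90-day token lifetime and the 21-day detection lag are placeholder inputs, not measurements; plug in your own tenant’s values.

```python
def hot_retention_days(refresh_lifetime_days, detection_lag_days, margin_days=0):
    """Hot retention needed so the originating sign-in is still searchable at detection time."""
    return refresh_lifetime_days + detection_lag_days + margin_days

def anchor_still_hot(days_since_phish, retention_days):
    """True if the originating authentication is still inside hot retention."""
    return days_since_phish <= retention_days

needed = hot_retention_days(refresh_lifetime_days=90, detection_lag_days=21)
print(needed)                        # 111
print(anchor_still_hot(95, 90))      # False: the 95-day-old phish has aged out
print(anchor_still_hot(95, needed))  # True
```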

Remediation that actually kills the session

Password reset alone is not a complete containment action, and this trips people up constantly. The refresh token is independent of the password; rotating the credential closes the front door while the attacker is already inside holding a valid session. You have to explicitly revoke the sessions (Revoke-MgUserSignInSession against the user, or the portal equivalent), terminate application sessions where possible, and then force re-authentication. Access tokens that were already issued stay valid until they expire unless the resource enforces CAE, so treat the first hour after revocation as still exposed. Then check what the session was used to establish, because a competent operator’s first move with a hijacked session is to plant persistence that survives session revocation: a registered authenticator (new IA-2 factor under attacker control), an OAuth app consent grant, an inbox forwarding rule.

The sequence that matters: revoke sessions, then audit AuditLogs for Add registered owner to device, Consent to application, and User registered security info events in the compromise window, and check the Exchange audit trail for new inbox rules. Revoking the session without pulling the attacker-registered authenticator just means they re-authenticate cleanly the next day, MFA and all, and you’re back where you started wondering why MFA didn’t help. Again.
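The audit step can be sketched as a filter over exported audit events. The dict keys (user, activity, time) are hypothetical, and the activity names in the default set should be checked against what your tenant’s AuditLogs actually emit; the authenticator-registration name in particular is an assumption on my part.

```python
from datetime import datetime

# Activity names to hunt for; verify them against your own audit export.
PERSISTENCE_ACTIVITIES = frozenset({
    "Add registered owner to device",
    "Consent to application",
    "User registered security info",  # assumed name for authenticator registration
})

def persistence_events(audit_events, user, start, end, activities=PERSISTENCE_ACTIVITIES):
    """Return the user's persistence-related audit events inside [start, end], oldest first."""
    hits = [
        e for e in audit_events
        if e["user"] == user
        and e["activity"] in activities
        and start <= e["time"] <= end
    ]
    return sorted(hits, key=lambda e: e["time"])

# Hypothetical audit export.
events = [
    {"user": "alice@example.com", "activity": "User registered security info", "time": datetime(2026, 6, 1, 11, 0)},
    {"user": "alice@example.com", "activity": "Consent to application", "time": datetime(2026, 6, 1, 10, 15)},
    {"user": "alice@example.com", "activity": "Update user", "time": datetime(2026, 6, 1, 10, 20)},
    {"user": "alice@example.com", "activity": "Consent to application", "time": datetime(2026, 5, 20, 8, 0)},
    {"user": "bob@example.com", "activity": "Consent to application", "time": datetime(2026, 6, 1, 12, 0)},
]

window_start = datetime(2026, 6, 1, 9, 0)   # originating sign-in
window_end = datetime(2026, 6, 2, 9, 0)     # session revocation
found = persistence_events(events, "alice@example.com", window_start, window_end)
print([e["activity"] for e in found])
# ['Consent to application', 'User registered security info']
```

Anything this returns needs to be removed or reviewed after the sessions are revoked, otherwise the revocation only buys you a day.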

Control mapping

What you’re doing	800-53 control
Session token binding, revocation on anomaly	AC-12, SC-23
Sign-in / audit log capture and field integrity	AU-2, AU-3, AU-12
Retention sized to token lifetime, not compliance floor	AU-11
Session-replay detection and correlation	SI-4, AU-6
Phishing-resistant authenticators (the actual fix)	IA-2(1), IA-2(2), IA-2(8)
Incident timeline reconstruction and eradication	IR-4, IR-5

The SI-4 detection buys you time and evidence. It does not fix the problem. The structural fix is phishing-resistant auth — FIDO2 / passkeys bound to the device, where the credential can’t be relayed through a reverse proxy because the origin check fails at the authenticator. Where you’ve deployed that, the AiTM proxy breaks at the authentication step and there’s no session to steal. Where you’re still on push-notification MFA or TOTP, the detection above is your compensating control, and it’s a lagging one.

The honest read: detection here is for the population you haven’t migrated to phishing-resistant factors yet, and for the long tail of legacy app auth flows that can’t do FIDO2. Build the SI-4 rule, tune it against your carrier and VPN reality, set retention to the token’s clock and not the auditor’s. But the migration to credentials that can’t be relayed is the work that actually retires the threat. Everything else is reconstructing the timeline after the fact.