Token theft DFIR in Entra ID after the device-bound credential rollout

By AutoCypher · 24 May 2026

By mid-2026 most shops running Entra ID have either enabled Token Protection for sign-in sessions, started piloting Device Bound Session Credentials in Edge and Chrome, or both. The marketing has been confident: stolen refresh tokens become useless on a different device, browser cookies get cryptographically pinned, the AitM phishing kits stop working. The reality on the IR side is more uneven. Token theft incidents have not stopped — they have shifted, and the telemetry you need to investigate them has shifted with them. If your runbook still reads like 2023 (“check unfamiliar sign-in, revoke sessions, reset password, done”) you are going to miss the long tail.

This is a defender-side analysis of where token theft DFIR actually sits now: what survives device binding, what the detection looks like in Sentinel or Splunk with the M365 add-on, and where the first round of tuning will eat your week.

What device binding actually changed, and what it didn’t

Token Protection (the Conditional Access control, not to be confused with the broader CAE story) binds the refresh token and the session ticket to a cryptographic key held in the TPM of the device that did the original interactive sign-in. If an attacker pulls a refresh token out of a token cache or an OAuth flow and replays it from their own infrastructure, the token service rejects it because the proof-of-possession signature can’t be produced. Good. That kills a real class of attack — the Storm-style infostealer pipeline where a single token dump from one box yields a working session somewhere else.

What it does not kill:

On-host token use. If the attacker has code execution on the user’s device — which, frankly, is the case in most of the incidents that get escalated past tier 1 — the TPM signs whatever the malicious process asks it to sign. Token Protection is a network-perimeter control dressed up as a cryptographic one. The TPM is happy to cooperate with malware running as the user.
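Sessions that predate the policy. Tokens issued before Token Protection was enforced are not retroactively bound. Refresh tokens have a 90-day inactivity window by default, and every use slides that window forward, so an unbound token stolen before enforcement can stay redeemable for months unless you revoke it.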
Service principals and workload identities. None of the device-binding work applies. Client secret and certificate theft from a build agent or a developer’s .azure directory is the same problem it was in 2022, except attackers got better at finding them.
Cross-tenant access where the resource tenant doesn’t enforce the same policy. Guest accounts, B2B, multi-tenant apps. The policy is only as strong as the weakest tenant in the auth chain.
Federated sign-in paths where a downstream IdP issues the assertion. SAML token forgery (the Golden SAML pattern, still alive and well) bypasses the whole conversation.

So the 2026 token theft incident is increasingly one of three shapes: an endpoint-resident stealer that signs from the victim’s TPM, a service principal credential compromise, or a cross-tenant abuse pattern. The detections for those three look very different and the runbooks are not interchangeable.
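The pre-policy sessions point above is easy to put a date on. A minimal sketch, assuming the default 90-day inactivity window and assuming you can export issue and last-use times for each session; the function names and the enforcement date in the usage note are hypothetical:

```python
from datetime import datetime, timedelta, timezone

# Assumption: default refresh-token inactivity window; tenants can change it.
REFRESH_INACTIVITY_WINDOW = timedelta(days=90)


def unbound_exposure_end(last_used: datetime,
                         window: timedelta = REFRESH_INACTIVITY_WINDOW) -> datetime:
    """Return when an unbound refresh token lapses if it is never used again.

    Every successful refresh slides the window forward, so treat this as a
    lower bound on exposure, not a guaranteed expiry.
    """
    return last_used + window


def still_exposed(issued: datetime, enforced: datetime,
                  last_used: datetime, now: datetime) -> bool:
    """True if a token issued before enforcement could still be redeemable."""
    if issued >= enforced:
        return False  # issued under the policy, so it is bound
    return now < unbound_exposure_end(last_used)
```

For a policy enforced on 1 March 2026, a token last used on 25 February stays in scope until 26 May unless it is revoked first, which is the argument for a bulk revocation at enforcement time rather than waiting for expiry.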

What the detection actually looks like

The single highest-signal field for on-host token abuse after Token Protection is AuthenticationProcessingDetails in SigninLogs and AADNonInteractiveUserSignInLogs. Inside it, the key-value pair you want is Is Token Binding Used (or, depending on which schema version your tenant got the update on, Token Protection Status). When that field is Passed but the session also shows an IPAddress that is geographically far from the device’s normal egress, or a UserAgent mismatch against the registered device’s typical client string, you have a credible signal of on-host abuse — the TPM is signing, but something else is driving the requests.

A rough Sentinel-flavored shape (illustrative, not drop-in):

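AADNonInteractiveUserSignInLogs
| where TimeGenerated > ago(24h)
// AuthenticationProcessingDetails is an array of {key, value} pairs,
// so pull the binding entry out rather than indexing by name.
| mv-apply d = parse_json(AuthenticationProcessingDetails) on (
    where tostring(d.key) == "Token Protection Status"
    | project tpStatus = tostring(d.value)
  )
| where tpStatus == "Passed"
// Collect every IP, user agent, and network location seen per session.
| summarize ips=make_set(IPAddress), uas=make_set(UserAgent),
    asns=make_set(NetworkLocationDetails) by UserPrincipalName, SessionId
// Keep only sessions that moved between addresses or networks.
| where array_length(ips) > 1 or array_length(asns) > 1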
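If you triage from exported rows rather than in the query engine, the same field needs unpacking there too. A minimal Python sketch, assuming the export carries AuthenticationProcessingDetails as a JSON array of key/value objects (sometimes still serialized as a string); `binding_status` is a hypothetical helper, and the two key names are the ones mentioned above, not a verified list:

```python
import json
from typing import Any, Optional

# Key names taken from the text above; which one appears depends on schema version.
BINDING_KEYS = ("Token Protection Status", "Is Token Binding Used")


def binding_status(raw_details: Any) -> Optional[str]:
    """Extract the token-binding value from AuthenticationProcessingDetails.

    Accepts either the parsed list or its JSON string form. Returns None when
    the field is missing, unparseable, or carries neither key.
    """
    details = raw_details
    if isinstance(raw_details, str):
        try:
            details = json.loads(raw_details)
        except ValueError:
            return None
    for entry in details or []:
        if isinstance(entry, dict) and entry.get("key") in BINDING_KEYS:
            return str(entry.get("value"))
    return None
```

Returning None instead of raising keeps a single malformed row from aborting a bulk pass over a day of sign-ins.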

Volume reality check: in a tenant of around 5,000 users you should expect this to return somewhere in the low hundreds of rows per day before tuning. Most of those are not incidents. They are:

VPN flips during the workday (split-tunnel exits via different egress IPs).
Mobile clients on cellular handing off to wifi mid-session.
The Outlook mobile app, which has a long and storied history of presenting user-agent strings that look nothing like the same app from yesterday.
Conditional Access break-glass paths where someone hit a compliant-device check, failed, re-enrolled, and re-authed from a second IP within the session window.

First-round tuning has to do two things. One, exclude your corporate egress ASNs and any sanctioned VPN provider ASN — not by CIDR, which will drift, but by autonomous system number from the NetworkLocationDetails blob. Two, suppress sessions where both IPs resolve to the same country and the time delta between them is over six hours; those are almost always the laptop-to-phone handoff, not an attacker. After that you should be down to a few dozen rows a day in a mid-size tenant, and the ones that remain are worth a human eye.
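The two suppressions can be sketched as a post-filter over one session's sign-ins. This is an illustration under assumptions: the `SignIn` record, the `needs_review` name, and the ASN values (taken from the documentation-reserved range) are all hypothetical, and the six-hour threshold is the one given above:

```python
from datetime import datetime, timedelta
from typing import Iterable, NamedTuple, Set


class SignIn(NamedTuple):
    time: datetime
    ip: str
    asn: int
    country: str


# Hypothetical values: replace with your corporate and sanctioned-VPN ASNs.
ALLOWED_ASNS: Set[int] = {64500, 64501}
HANDOFF_GAP = timedelta(hours=6)


def needs_review(session: Iterable[SignIn],
                 allowed_asns: Set[int] = ALLOWED_ASNS) -> bool:
    """Apply the two first-round suppressions to one session's sign-ins."""
    # Rule one: drop sign-ins from corporate egress or sanctioned VPN ASNs.
    events = sorted((e for e in session if e.asn not in allowed_asns),
                    key=lambda e: e.time)
    if len({e.ip for e in events}) < 2:
        return False  # nothing left that disagrees
    # Rule two: a same-country IP change after a long gap reads as a handoff.
    for prev, cur in zip(events, events[1:]):
        if prev.ip == cur.ip:
            continue
        same_country = prev.country == cur.country
        if not (same_country and cur.time - prev.time > HANDOFF_GAP):
            return True
    return False
```

Filtering by ASN before comparing IPs matters: a session that bounces between the office and one outside address is left with a single address after the filter and drops out without ever reaching the time check.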

The field that gets ignored and shouldn’t is OriginalRequestId. When a stealer replays a token from on-host, the request ID chain often diverges from the device’s normal pattern in a way that’s invisible in the GUI but obvious if you correlate OriginalRequestId to the issuing CorrelationId of the parent interactive sign-in. That’s tedious to query and Microsoft does not document it well — the field semantics changed between the 2024 and current schema revisions and the published docs still describe the old behavior in at least one place. Worth verifying against your own tenant before building a detection on it.

Service principal compromise has its own telemetry, and most teams underweight it

For workload identity abuse, the table you want is AADServicePrincipalSignInLogs, not SigninLogs. The detection that earns its keep is any successful sign-in for a service principal from an IP that is not in your known automation egress set, combined with a ResourceDisplayName the SP has not touched in the last 30 days. That second clause is what makes it usable; without it you drown in CI noise.

The failure mode here is operational: a lot of shops never built an inventory of which SPs are supposed to call which resources, so the “has not touched in 30 days” filter has nothing to anchor against. If that’s you, the prerequisite work is a 30-day baseline of AADServicePrincipalSignInLogs grouped by ServicePrincipalId and ResourceDisplayName, dumped to a lookup table. Boring. Unavoidable.

One side note that bites people: the SP sign-in logs do not always populate IPAddress for managed identities calling Azure-internal resources. The field will be empty or 127.0.0.1. That is expected, not an indicator. If your detection alerts on “empty IP for service principal,” you have a noisy rule and you should kill it.
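The baseline-then-compare logic, including the managed identity exception, fits in a few lines. A sketch, assuming rows exported from AADServicePrincipalSignInLogs as flat dictionaries and assuming a ResultType of "0" marks a successful sign-in; `build_baseline` and `is_suspicious` are hypothetical names:

```python
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]  # (ServicePrincipalId, ResourceDisplayName)

# IP values described above as normal for managed identities on internal paths.
NON_INDICATOR_IPS = {"", "127.0.0.1"}


def build_baseline(rows: Iterable[Dict[str, str]]) -> Set[Pair]:
    """Collect every SP-to-resource pair seen in the 30-day export."""
    return {(r["ServicePrincipalId"], r["ResourceDisplayName"]) for r in rows}


def is_suspicious(row: Dict[str, str], baseline: Set[Pair],
                  automation_ips: Set[str]) -> bool:
    """Flag a successful sign-in from an unknown IP to an unbaselined resource."""
    ip = row.get("IPAddress") or ""
    if ip in NON_INDICATOR_IPS:
        return False  # expected for managed identities, not a signal
    if row.get("ResultType") != "0":
        return False  # assumption: "0" marks success in the exported rows
    pair = (row["ServicePrincipalId"], row["ResourceDisplayName"])
    return ip not in automation_ips and pair not in baseline
```

Requiring both conditions is the point: an unknown IP alone catches every new CI runner, and a new resource alone catches every deployment. The conjunction is what keeps the rule quiet enough to read.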

Remediation — what actually works

The instinct on token theft is to hit the “revoke sessions” button in the Entra portal and call it done. That revokes refresh tokens. It does not revoke the access tokens already in flight, which remain valid for their full lifetime (60 to 90 minutes by default) unless Continuous Access Evaluation is enabled and both the resource provider and the client support it. In 2026 that covers Exchange Online, SharePoint, Teams, and Graph reasonably well. It does not uniformly cover third-party SaaS apps federated through Entra. So for a Graph-scoped compromise, revoke-plus-CAE is fast. For a compromise touching, say, a federated HR system, you have a window measured in tens of minutes where the attacker still has a valid access token even after you pushed the button.
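The size of that window is simple arithmetic, and worth writing into the incident timeline. A rough sketch, assuming you know (or can bound) when the attacker's access token was issued; `residual_window` is a hypothetical helper, and treating CAE as immediate is a simplification, since real enforcement can lag by some minutes:

```python
from datetime import datetime, timedelta


def residual_window(token_issued: datetime, revoked_at: datetime,
                    lifetime: timedelta, cae_enforced: bool) -> timedelta:
    """Estimate how long an already-issued access token outlives revocation.

    With CAE honored by both resource and client, the sketch treats access
    as ending at revocation. Otherwise the token runs to its natural expiry.
    """
    if cae_enforced:
        return timedelta(0)
    remaining = token_issued + lifetime - revoked_at
    return max(remaining, timedelta(0))
```

A token issued at 10:00 with a 60-minute lifetime and revoked at 10:25 leaves 35 minutes of attacker access on any resource that does not honor CAE, so log review for that resource has to extend past the revocation timestamp.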

The correct sequence, in order:

1. Revoke refresh tokens for the user (Revoke-MgUserSignInSession or the portal equivalent).
2. If a service principal is involved, rotate the secret or certificate and revoke existing tokens via the SP object. Then confirm the rotation actually propagated to whatever vault the consuming workload reads from, because the workload will keep using the cached old credential until it’s evicted.
3. Force a password reset only if you have evidence the password itself was captured. Resetting the password without revoking tokens does nothing to stop an attacker holding a live refresh token; this is the single most common runbook error.
4. If Token Protection was bypassed via on-host abuse, the device is compromised. Isolate it. Re-imaging is the only honest remediation; TPM-bound credentials on a still-infected host will keep working from the attacker’s perspective the moment the user signs back in.
5. Hunt laterally on AuditLogs for any app consent grants, mailbox rule additions, or Conditional Access policy modifications made by the affected identity in the incident window. Pay particular attention to mailbox rules: the “forward to external, then delete” pattern is still depressingly common and survives session revocation because the rule itself persists server-side.

Step 4 is where the disagreement happens. Some teams will argue that since Token Protection bound the tokens to the TPM and the TPM is intact, you can rotate credentials and leave the device. I don’t buy it. If the attacker had code execution sufficient to drive token requests through the TPM, they had code execution sufficient to install whatever else they wanted. Re-imaging is not optional.
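The lateral hunt that closes the sequence above is a filter on identity, operation, and time window. A sketch over exported audit rows; the flat `Actor` field and every operation name below are assumptions to be checked against your own audit data, and `persistence_events` is a hypothetical helper:

```python
from datetime import datetime
from typing import Dict, Iterable, List

# Assumed operation names: confirm the exact strings in your own audit data.
PERSISTENCE_OPERATIONS = {
    "Consent to application",
    "New-InboxRule",
    "Set-InboxRule",
    "Update conditional access policy",
}


def persistence_events(rows: Iterable[Dict[str, object]], actor: str,
                       start: datetime, end: datetime) -> List[Dict[str, object]]:
    """Return the affected identity's persistence-style changes in the window."""
    hits = [
        r for r in rows
        if r.get("Actor") == actor
        and r.get("OperationName") in PERSISTENCE_OPERATIONS
        and start <= r["TimeGenerated"] <= end
    ]
    # Oldest first, so the output reads as a timeline.
    return sorted(hits, key=lambda r: r["TimeGenerated"])
```

Set the window start at the earliest suspicious sign-in, not at the detection time. Anything this returns has to be reversed by hand, because none of it is undone by revoking sessions.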

Environment assumptions that change the answer

A flat single-tenant commercial environment with everyone on managed Windows 11 endpoints and Edge as the default browser is the easy case. The detections above work, and Token Protection coverage is close to complete.

It gets messier fast:

Hybrid join with on-prem AD as the source of truth. PRTs issued via the cloud AP plugin behave differently than pure-cloud PRTs, and the sign-in log fields that indicate device binding can show Not Applicable for legitimate hybrid scenarios. Don’t alert on that as a binding failure.
GCC High and DoD tenants. Schema lag is real. Some of the fields described above landed in commercial in 2025 and have not fully populated in GCC High as of this writing. Verify in your own tenant before building rules.
BYOD with conditional access app control reverse-proxying sessions. The proxy terminates and re-issues sessions, which mangles the device-binding signal in ways the docs don’t fully explain. Treat detections in that path as advisory, not authoritative.
Tenants with heavy B2B guest usage. Guest sign-ins originate from the home tenant’s IdP. Your detections see the resulting token issuance but not the upstream authentication path. You are blind to half the kill chain by design.
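For a rule that alerts on binding failures, the first three caveats above amount to weighting the status by where the session came from. A minimal sketch; the `Verdict` levels and the `triage_binding_failure` name are hypothetical, "Passed" and "Not Applicable" are the values mentioned in this article, and anything else is treated as a failure by assumption:

```python
from enum import Enum


class Verdict(Enum):
    IGNORE = "ignore"
    ADVISORY = "advisory"
    INVESTIGATE = "investigate"


def triage_binding_failure(status: str, hybrid_joined: bool,
                           proxied: bool) -> Verdict:
    """Weight a token-binding status by the environment it came from."""
    if status == "Not Applicable" and hybrid_joined:
        return Verdict.IGNORE  # legitimate for hybrid PRTs, not a failure
    if proxied:
        return Verdict.ADVISORY  # a reverse proxy distorts the binding signal
    if status == "Passed":
        return Verdict.IGNORE  # no binding failure to report on its own
    return Verdict.INVESTIGATE
```

This covers only the binding-failure rule. A "Passed" status still feeds the on-host abuse detection described earlier, where the signal is the mismatch between a valid binding and an unexpected network path.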

Where this lands in NIST SP 800-53

The relevant control families, without belaboring them: IA-2 and IA-5 for the authenticator and credential handling pieces, AC-12 for session termination (which is where the revoke-versus-CAE distinction actually lives in policy terms), AU-2 and AU-6 for the sign-in log capture and review, SI-4 for the monitoring side of the detection rules, and IR-4 for the response workflow. SR-3 deserves a mention for the service principal case, because workload identity credentials embedded in third-party integrations are a supply chain exposure whether your SR program acknowledges it or not.

Closing

Token Protection and DBSC are real improvements. They are not the end of token theft DFIR — they are a reshaping of it. The incident is now more likely to be on-host, more likely to involve a workload identity, and harder to investigate without correlating fields that Microsoft documents inconsistently. The teams that handle the next round of these incidents well are the ones who have already done the unglamorous baselining work on service principal behavior and who have written down, explicitly, the difference between revoking a session and ending one. The teams that haven’t will spend the first six hours of their next incident learning it.