Device Code Phishing Lives in the Log Table You Don’t Ingest

By AutoCypher · 12 Jun 2026

The phishing lure points at microsoft.com/devicelogin. The sign-in finishes on the real Microsoft login page. MFA fires, the user approves it, and the token that comes out the other end carries amr: ["mfa"] exactly like a clean interactive logon. Nothing about the authentication looks wrong, because in a narrow technical sense nothing about it was wrong. The user really did authenticate. Really did satisfy MFA. Really did it on Microsoft’s own infrastructure. The only problem is whose machine the resulting token landed on.

That’s device code phishing, and it is one of the few identity attacks that genuinely sidesteps the AiTM-proxy detection logic most shops spent 2024 building. There’s no evilginx-style reverse proxy to fingerprint, no lookalike domain to block, no TLS cert to flag. The verification URL is Microsoft’s. The defenders who wired up detections around proxied login pages and novel sign-in domains find this one walks straight past them.

The flow it abuses, briefly

RFC 8628, the OAuth 2.0 Device Authorization Grant, exists so input-constrained gear can authenticate without a browser. Think a smart TV, a Teams Room, an az login on a headless box. The device gets a short user code from the authorization server, tells you to go enter it at a verification URL on a real keyboard, and meanwhile polls the token endpoint until you’ve finished. When you finish, the token goes back to whatever started the flow. In the abuse case that poller is the attacker’s infrastructure, which is exactly why the polling IP matters later.

The abuse is just this: the attacker starts the flow from infrastructure they control, then convinces you to enter their code on the legitimate Microsoft page. You authenticate, and the attacker’s poller collects the access and refresh tokens. No password ever touches their gear, and the MFA claim rides along in the token, so any control that treats “MFA satisfied” as proof of intent gets fooled. That’s the whole trick. The mfa claim reflects the end user’s authentication, not any assessment of device trust or session risk, and that is the gap the technique drives a truck through.
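To make the "token goes back to whatever started the flow" point concrete, here is a toy in-memory model of the grant. It is a sketch, not Entra's implementation: ToyAuthServer and its method names are invented for illustration, and the only piece borrowed from RFC 8628 is the authorization_pending error the poller sees while it waits.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class ToyAuthServer:
    """In-memory stand-in for an authorization server (hypothetical, illustration only)."""
    pending: dict = field(default_factory=dict)   # device_code -> user_code
    approved: dict = field(default_factory=dict)  # device_code -> user who approved

    def start_flow(self) -> dict:
        # Step 1: whoever calls this receives both codes and keeps the device_code.
        device_code = secrets.token_urlsafe(16)
        user_code = secrets.token_hex(4).upper()
        self.pending[device_code] = user_code
        return {"device_code": device_code, "user_code": user_code}

    def user_enters_code(self, user_code: str, user: str) -> bool:
        # Step 2: the user authenticates (password plus MFA) and types the code.
        for device_code, code in self.pending.items():
            if code == user_code:
                self.approved[device_code] = user
                return True
        return False

    def poll(self, device_code: str) -> dict:
        # Step 3: the initiator polls with the device_code it kept.
        if device_code in self.approved:
            return {"access_token": f"token-for-{self.approved[device_code]}",
                    "amr": ["mfa"]}
        return {"error": "authorization_pending"}


server = ToyAuthServer()
attacker_flow = server.start_flow()                  # attacker starts the flow
before = server.poll(attacker_flow["device_code"])   # nothing to collect yet
server.user_enters_code(attacker_flow["user_code"], "alice@example.com")
after = server.poll(attacker_flow["device_code"])    # attacker's poll now returns the token
print(before)
print(after)
```

The user only ever handles the user_code. The device_code never leaves the initiator, so the token is delivered to the party that started the flow regardless of who satisfied MFA.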

Storm-2372 (a suspected Russia-nexus cluster, active since at least August 2024 per Microsoft’s February 2025 writeup) made this technique a household name. It didn’t stay APT-only. Proofpoint reported a sharp increase starting September 2025 spanning both state-aligned and criminal actors, driven by automated frameworks (SquarePhish2 and Graphish among them) that took it from bespoke red-team gimmick to commodity. The Cloud Security Alliance’s March 2026 research note put the EvilTokens campaign at 340+ Microsoft 365 organizations touched. So this is not a curiosity anymore.

Where the detection actually lives

Here’s the part most teams get wrong on the first pass: they query SigninLogs for AuthenticationProtocol == "deviceCode", see a handful of events, tune it, and call it covered. The interactive sign-in does land in SigninLogs. But a lot of the device-code telemetry, including the token redemption and the polling that betrays the attacker’s IP, shows up in AADNonInteractiveUserSignInLogs, which is a different beast entirely.

The field you actually want is OriginalTransferMethod == "deviceCodeFlow", and you want it across both tables:

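// Device-code sign-ins across the interactive and non-interactive tables
union SigninLogs, AADNonInteractiveUserSignInLogs
| where OriginalTransferMethod == "deviceCodeFlow"
// Scope filter: can hide custom or headless tooling, consider dropping it while hunting
| where ClientAppUsed == "Mobile Apps and Desktop clients"
// Keep every field needed for later correlation
| project TimeGenerated, UserPrincipalName, AppDisplayName, ResourceDisplayName,
          IPAddress, Location, UserAgent, CorrelationId, SessionId, ResultType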

OriginalTransferMethod is a better discriminator than AuthenticationProtocol here because it survives the split between the interactive auth and the non-interactive token activity, and it’s what Microsoft’s own tooling keys on. The exact value can vary by tenant and schema version, so validate it by running a benign device-code sign-in and inspecting which tables and fields populate before you trust it in a rule. The ClientAppUsed == "Mobile Apps and Desktop clients" filter also narrows scope and can drop events from custom or headless tooling that presents differently. Confirm it against your own benign device-code sign-ins, and consider running without it during hunting.

Pull AppDisplayName, ResourceDisplayName, IPAddress, UserAgent, CorrelationId, and SessionId every time. You will need all of them the moment you start correlating, and adding them after the fact means re-running queries against cold data.

No single join key is perfect here. Prefer SessionId to stitch the device-code event chain when it’s populated, then fall back to CorrelationId (which Microsoft notes is client-supplied and not guaranteed accurate), and failing that to a time-bound join on user plus app/resource, kept to a tight 5–10 minute window to avoid false joins. SessionId isn’t consistently populated across both SigninLogs and AADNonInteractiveUserSignInLogs, so expect to lean on those fallbacks more often than not.
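The fallback order for join keys is easier to reason about written out. A minimal sketch, assuming events have already been exported as dicts keyed by the column names projected above and that TimeGenerated has been parsed into a datetime. The same_chain helper and the 10-minute WINDOW are illustrative choices, not anything Microsoft ships.

```python
from datetime import datetime, timedelta
from typing import Optional

WINDOW = timedelta(minutes=10)  # upper end of the 5-10 minute window
SUBJECT_KEYS = ("UserPrincipalName", "AppDisplayName", "ResourceDisplayName")


def same_chain(a: dict, b: dict) -> Optional[str]:
    """Return the name of the key that links two events, or None if nothing links them."""
    if a.get("SessionId") and a.get("SessionId") == b.get("SessionId"):
        return "SessionId"
    if a.get("CorrelationId") and a.get("CorrelationId") == b.get("CorrelationId"):
        return "CorrelationId"  # client-supplied, so treat as weaker evidence
    same_subject = all(a.get(k) and a.get(k) == b.get(k) for k in SUBJECT_KEYS)
    if same_subject and abs(a["TimeGenerated"] - b["TimeGenerated"]) <= WINDOW:
        return "time-window"    # weakest link: user plus app/resource within the window
    return None


t0 = datetime(2026, 6, 1, 12, 0)
interactive = {"SessionId": "", "CorrelationId": "c-1", "UserPrincipalName": "alice@example.com",
               "AppDisplayName": "Microsoft Azure CLI", "ResourceDisplayName": "Microsoft Graph",
               "TimeGenerated": t0}
redemption = dict(interactive, CorrelationId="c-2", TimeGenerated=t0 + timedelta(minutes=4))
print(same_chain(interactive, redemption))
```

Returning the name of the key that matched, rather than a bare boolean, lets downstream scoring weight a SessionId match above a time-window match.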

Now the operational catch. AADNonInteractiveUserSignInLogs is, in most tenants, the single largest Entra table by event volume; it dwarfs interactive sign-ins. A good chunk of that volume is ordinary non-interactive user activity (refresh-token redemption, cached sessions, background desktop and mobile client traffic), not device code, so don’t assume high counts mean a device-code problem. Service-principal and managed-identity sign-ins live in their own tables, AADServicePrincipalSignInLogs and AADManagedIdentitySignInLogs. Plenty of Sentinel deployments quietly drop the non-interactive table from the ingest pipeline to keep the workspace bill down, and nobody writes that decision down anywhere the SOC will find it. It is also a separate export category in your diagnostic settings: if NonInteractiveUserSignInLogs isn’t explicitly selected in the Log Analytics export, the table simply isn’t there regardless of your retention budget. If your detection assumes that table is in the index and it isn’t, you have a detection that compiles, passes its unit test against a sample event, and never fires in production. Check both your diagnostic-settings categories and what you’re actually ingesting before you trust the rule. (This is the kind of gap that only surfaces during the incident, which is the worst possible time to learn it.)
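The diagnostic-settings half of that check is cheap to automate. A rough sketch, assuming you have already exported the Entra diagnostic setting as JSON and that it follows the usual properties.logs[] shape with category and enabled fields. Treat that shape, and the REQUIRED set, as assumptions to confirm against your own export.

```python
# Export categories this detection depends on. SignInLogs and AuditLogs are
# included because the correlation logic joins across all three.
REQUIRED = {"SignInLogs", "NonInteractiveUserSignInLogs", "AuditLogs"}


def missing_categories(diagnostic_setting: dict) -> list:
    """Return the required log categories that are absent or disabled, sorted."""
    logs = diagnostic_setting.get("properties", {}).get("logs", [])
    enabled = {entry.get("category") for entry in logs if entry.get("enabled")}
    return sorted(REQUIRED - enabled)


# Hypothetical export where the non-interactive category was switched off for cost.
exported = {
    "properties": {
        "logs": [
            {"category": "SignInLogs", "enabled": True},
            {"category": "AuditLogs", "enabled": True},
            {"category": "NonInteractiveUserSignInLogs", "enabled": False},
        ]
    }
}
print(missing_categories(exported))
```

This only proves the export is configured. It says nothing about whether rows are actually arriving, so pair it with a row-count check against the table itself.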

The threshold depends entirely on your shop

There is no universal volume number for this one. Baseline your own tenant’s legitimate AppDisplayName distribution before you pick a threshold, because a dev-heavy tenant and a locked-down corporate one are not the same animal.

In a standard Windows corporate environment (managed laptops, browser SSO, no real developer tooling) legitimate device code flow is close to nonexistent. In that world you can alert on essentially any occurrence and live with it. The volume is single digits per month, most of it explainable, and the cost of investigating a false positive is a two-minute Slack message to the user.

In a tenant full of engineers, the picture inverts. az login --use-device-code, GitHub Codespaces, WSL sessions, CI runners, anything headless: these generate device-code sign-ins by the dozen or hundred daily, and they are completely benign. Alert on raw occurrence there and you’ve built an alert cannon pointed at your own SOC. The first tuning round is always the same exercise. Baseline the legitimate AppDisplayName and client IDs (Azure CLI 04b07795-8ddb-461a-bbee-02f9e1bf7b46, Azure PowerShell 1950a258-227b-4e31-a9cf-717495945fc2, Visual Studio, your Teams Rooms fleet — verify which your tenant actually sees), allowlist them, and stop alerting on volume.
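That first tuning round can be sketched in a few lines. This assumes device-code sign-ins have been exported as dicts carrying AppDisplayName and AppId. The helper names are made up for illustration, and the allowlist holds only the two client IDs named above, so it is a starting point rather than a recommended list.

```python
from collections import Counter

# Client IDs named in the text; extend only after checking your own baseline.
ALLOWLISTED_CLIENT_IDS = {
    "04b07795-8ddb-461a-bbee-02f9e1bf7b46",  # Azure CLI
    "1950a258-227b-4e31-a9cf-717495945fc2",  # Azure PowerShell
}


def baseline_by_app(events: list) -> Counter:
    """Count device-code sign-ins per (AppDisplayName, AppId) pair."""
    return Counter((e["AppDisplayName"], e["AppId"]) for e in events)


def needs_review(events: list) -> list:
    """Keep only sign-ins whose client ID is not on the allowlist."""
    return [e for e in events if e["AppId"] not in ALLOWLISTED_CLIENT_IDS]


sample = [
    {"AppDisplayName": "Microsoft Azure CLI", "AppId": "04b07795-8ddb-461a-bbee-02f9e1bf7b46"},
    {"AppDisplayName": "Microsoft Azure CLI", "AppId": "04b07795-8ddb-461a-bbee-02f9e1bf7b46"},
    {"AppDisplayName": "Microsoft Authentication Broker", "AppId": "29d9ed98-a469-4536-ade2-f981bc1d605e"},
]
print(baseline_by_app(sample).most_common())
print(needs_review(sample))
```

The allowlist only suppresses volume alerting. An allowlisted client ID tells you nothing about who is holding the token, so the behavioral signals still have to run against those events.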

Once you can’t alert on volume, you pivot to behavior. The signals that hold up:

A device code sign-in followed within minutes by a device registration from a different IP or country than the sign-in itself. The legitimate user’s IP shows on the auth; the attacker’s shows on the registration. Correlate them on CorrelationId (or a tight time window on user plus resource). The device-registration event lives in AuditLogs, not in the sign-in tables, so you’ll be joining across both; time-bound the join (e.g. where TimeGenerated > ago(1h)) to keep it from timing out or blowing up cost. Don’t hard-code Category == "DeviceRegistration"; key on the activity names (“Register device”, “Add device”, “Add registered owner to device”, “Add Windows Hello for Business credential”) and confirm the exact schema in your tenant. That geographic split across one correlated chain is hard to explain innocently, though VPN and NAT-multiplexed users can produce the same split, so score it rather than firing a binary alert.
The flow targeting the Microsoft Authentication Broker client ID (29d9ed98-a469-4536-ade2-f981bc1d605e) against the device registration / Intune enrollment resource. That combination is the documented path to registering an attacker device and then pulling a Primary Refresh Token, which is the difference between “stole a session” and “owns a persistent identity in your tenant.” Treat broker-client device code flow from an unmanaged context as high severity, not informational.
A Windows Hello for Business credential getting provisioned on a freshly registered device shortly after a device-code event. That’s the persistence payoff, and it satisfies your high-assurance Conditional Access policies going forward.
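The three signals above combine naturally into a score rather than a binary alert. A minimal sketch under loud assumptions: the Event shape is a hypothetical normalized record (AuditLogs does not expose IP and country as flat columns, so you would extract them first), and the weights and the 15-minute WINDOW are placeholders to tune, not recommended values.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

BROKER_CLIENT_ID = "29d9ed98-a469-4536-ade2-f981bc1d605e"  # Microsoft Authentication Broker
REGISTRATION_ACTIVITIES = {"Register device", "Add device", "Add registered owner to device"}
WHFB_ACTIVITY = "Add Windows Hello for Business credential"
WINDOW = timedelta(minutes=15)  # hypothetical; tune to your tenant


@dataclass
class Event:
    time: datetime
    user: str
    ip: str
    country: str
    app_id: str = ""    # set on sign-in events
    activity: str = ""  # set on audit events


def score_chain(signin: Event, audit_events: list) -> int:
    """Score one device-code sign-in against the audit events that follow it."""
    followups = [e for e in audit_events
                 if e.user == signin.user
                 and timedelta(0) <= e.time - signin.time <= WINDOW]
    registrations = [e for e in followups if e.activity in REGISTRATION_ACTIVITIES]
    score = 0
    if signin.app_id == BROKER_CLIENT_ID:
        score += 3  # broker client is the documented path to a PRT
    if registrations:
        score += 2  # device registered shortly after the sign-in
    if any(e.ip != signin.ip or e.country != signin.country for e in registrations):
        score += 3  # network or geographic split across the chain
    if any(e.activity == WHFB_ACTIVITY for e in followups):
        score += 3  # persistence credential provisioned
    return score


t0 = datetime(2026, 6, 1, 12, 0)
signin = Event(t0, "alice@example.com", "203.0.113.10", "US", app_id=BROKER_CLIENT_ID)
audits = [
    Event(t0 + timedelta(minutes=5), "alice@example.com", "198.51.100.7", "NL",
          activity="Register device"),
    Event(t0 + timedelta(minutes=9), "alice@example.com", "198.51.100.7", "NL",
          activity=WHFB_ACTIVITY),
]
print(score_chain(signin, audits))
```

Each signal is counted once per chain, so a burst of duplicate audit rows cannot inflate the score. A VPN user who trips only the IP split lands well below a chain that also touches the broker client and Windows Hello provisioning.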

Don’t lean on the python-requests/2.25.1 user agent that floated around the early Storm-2372 reporting. It was a real indicator, but it’s trivially changed: current tooling spoofs Chrome/Edge-on-Windows strings through headless browsers just as easily. Treat the user agent as a pivoting signal for correlation, never a load-bearing detection condition, or you’re detecting last year’s tooling defaults instead of the technique.

Where the false positives come from, and the other gaps

Beyond developer tooling, the recurring false-positive sources are conference-room and shared devices (Teams Rooms, Surface Hubs, Android-based room systems) that legitimately use device code flow during provisioning, and the occasional admin doing a one-off device login from a jump host. None of those should produce a device registration from a foreign IP, which is why the correlation approach beats the flat allowlist. These benign cases also tend to originate from known corporate IP ranges, which you can fold into the detection logic.

Two coverage realities worth stating plainly. Default Entra sign-in log retention in the portal is 30 days on P1/P2 (7 days on the free tier), not 90, so retroactive hunts past that window depend entirely on whether you’ve shipped the logs to Log Analytics or a SIEM with longer retention, and your Log Analytics retention is whatever your budget bought. And Conditional Access, your primary preventive control here, requires Entra ID P1 or P2. A tenant on the base license can detect this all day but has no scoped way to block it. Security Defaults is the only free-tier baseline, and it is coarse and all-or-nothing; its documented controls center on MFA enforcement and blocking legacy authentication, so confirm against current Microsoft documentation whether it restricts device code flow at all before you count on it as a prevention lever. Conditional Access is what buys you a targeted block with scoped exceptions.

Closing the door, and the part that gets skipped

The Conditional Access “authentication flows” policy can block device code flow outright, and that is the most direct control you have. Microsoft’s own recommendation is to block it wherever it isn’t needed. The caveat: a global block breaks every Teams Room and every developer’s az login --use-device-code simultaneously, so you scope it. Scope by users/groups, and by named locations where supported, leaving an exception only for the populations that genuinely need the flow. In most tenants that exception list is short and worth the afternoon it takes to build.
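For a sense of what that scoped policy looks like as data, here is a sketch of a request body for the Microsoft Graph conditionalAccessPolicy resource, built as a Python dict. The field names reflect my reading of the Graph schema (in particular the authenticationFlows condition with transferMethods set to deviceCodeFlow), so verify them against the current reference before use. EXCEPTION_GROUP_ID is a placeholder, and the policy starts in report-only mode so you can see what it would break first.

```python
import json

# Placeholder: object ID of the group that genuinely needs device code flow.
EXCEPTION_GROUP_ID = "00000000-0000-0000-0000-000000000000"

policy = {
    "displayName": "Block device code flow (report-only)",
    # Report-only first: logs what would be blocked without enforcing it.
    "state": "enabledForReportingButNotEnforced",
    "conditions": {
        "users": {"includeUsers": ["All"], "excludeGroups": [EXCEPTION_GROUP_ID]},
        "applications": {"includeApplications": ["All"]},
        "clientAppTypes": ["all"],
        # Assumed schema for the "authentication flows" condition.
        "authenticationFlows": {"transferMethods": "deviceCodeFlow"},
    },
    "grantControls": {"operator": "OR", "builtInControls": ["block"]},
}

print(json.dumps(policy, indent=2))
```

Nothing here calls Graph. The dict is meant to be reviewed, compared against what the portal produces for the same policy, and only then submitted through whatever tooling you already trust.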

When you confirm a compromise, revoking the user’s refresh tokens via revokeSignInSessions is necessary but not sufficient, and this is the step that gets botched. Revocation invalidates refresh tokens, but it does not retroactively kill access tokens the attacker already holds. Those remain valid until they expire (typically up to ~60–90 minutes) unless Continuous Access Evaluation is enabled for the resource, which can revoke them closer to real time. Factor that window into your containment and force a sign-out where you need an immediate cutoff.

More importantly, revocation does not undo the device registration or any OAuth consent grant the attacker created along the way, and those survive a password reset cleanly. They sit in the tenant as durable persistence, outside the account-review workflows that catch disabled users and stale guests. Note the distinction on device trust: a bare device registration (Entra-registered or Entra-joined) is not the same as a device “marked as compliant,” which requires MDM/Intune enrollment. But if your Conditional Access grants trust to merely registered devices, the attacker’s registration may quietly hand them a path back in.

After any device-code incident, the closeout has to include enumerating and removing unrecognized device registrations (Entra admin center > Devices, or Graph GET /devices) and app consents (Enterprise Applications, or Graph GET /oauth2PermissionGrants), not just rotating credentials and closing the ticket.
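The enumeration step lends itself to a small triage script. A sketch that assumes you have already pulled the GET /devices and GET /oauth2PermissionGrants responses into lists of dicts, and that devices carry registrationDateTime while grants carry consentType and principalId as in the Graph v1.0 resources. The function names, the incident timestamp, and the sample records are hypothetical.

```python
from datetime import datetime, timezone


def devices_registered_since(devices: list, incident_start: datetime) -> list:
    """Return devices whose registrationDateTime falls on or after incident_start."""
    flagged = []
    for device in devices:
        stamp = device.get("registrationDateTime")
        if not stamp:
            continue  # no timestamp: review these by hand instead of guessing
        registered = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
        if registered >= incident_start:
            flagged.append(device)
    return flagged


def grants_for_user(grants: list, user_object_id: str) -> list:
    """Return delegated consent grants made on behalf of one user."""
    return [g for g in grants
            if g.get("consentType") == "Principal" and g.get("principalId") == user_object_id]


incident_start = datetime(2026, 6, 1, 12, 0, tzinfo=timezone.utc)
devices = [
    {"displayName": "LAPTOP-ALICE", "registrationDateTime": "2025-11-03T08:15:00Z"},
    {"displayName": "DESKTOP-X7", "registrationDateTime": "2026-06-01T12:06:00Z"},
]
grants = [
    {"clientId": "app-1", "consentType": "Principal", "principalId": "user-alice", "scope": "Mail.Read"},
    {"clientId": "app-2", "consentType": "AllPrincipals", "principalId": None, "scope": "User.Read"},
]
print([d["displayName"] for d in devices_registered_since(devices, incident_start)])
print([g["clientId"] for g in grants_for_user(grants, "user-alice")])
```

The device filter is tenant-wide by design, so in a large tenant you would narrow it further by registered owner. The output is a review list for a human, not a deletion list.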

Control mapping

Activity	NIST SP 800-53 controls
Restricting device code flow via Conditional Access	CM-6, CM-7
Token / session revocation on compromise	AC-12
Authentication strength, MFA-intent gap, broker-client abuse	IA-2, IA-5
Sign-in log monitoring and correlation	AU-6, AU-12, SI-4
Continuous evaluation of token/access conditions	CA-7
User awareness against the social-engineering front end	AT-2

For SOC and detection-engineering readers, this maps cleanly to MITRE ATT&CK as phishing (T1566) feeding application access token theft (T1528).

The honest summary: this attack defeats the controls that assume a successful, MFA-satisfied authentication is the same thing as an authorized one. Your detection has to question that assumption, your logging has to include the table you were tempted to drop for cost, and your incident closeout has to look past the account at the device and consent artifacts the attacker left wired into the tenant.