Countering Adversary AI Agents That Run the Whole Operation

By AutoCypher · 11 Jun 2026

In mid-September 2025 Anthropic caught a threat actor it tracks as GTG-1002 driving Claude Code through what it later called the first reported AI-orchestrated cyber espionage campaign. The framing got picked apart for a week (more on the skeptics below), but strip the marketing and the operational claim is the part that should keep you up: a single operator pointed an agent at roughly 30 organizations across tech, finance, government, and chemical manufacturing, and the model ran an estimated 80 to 90 percent of the tactical work itself. Recon, vulnerability discovery, exploitation through off-the-shelf tools, credential harvesting, data staging, exfiltration. Humans approved phase transitions and went back to bed.

That is a different threat than the one most blue teams are tuned for, and it’s worth being precise about which different threat, because the term “AI attack” now covers three unrelated things and conflating them wastes everyone’s time.

One is AI as a bug-finder: fuzzing, automated vuln discovery, the XBOW-style autonomous pentest harness. That’s a faster scanner. Your patch cadence and your attack-surface management answer it. Two is the inverse problem, securing your own agentic systems against prompt injection and MCP abuse, which is a real and separate discipline. This post is about neither. It is about the third: an adversary whose AI runs the operation end to end, at machine tempo, and what you actually do about it from the defensive side.

What machine-speed actually changes

Not the tradecraft. The tools GTG-1002 reportedly orchestrated were Nmap, Metasploit, SQLMap, the same kit a mid-tier human operator uses. No novel zero-day, no custom implant. Per the reporting, there was no malware at all; persistence came from stolen credentials and a freshly created backdoor account in at least one documented case, where the agent dumped a user table, found the admin rows, added a privileged account, and pulled data. A simple password rotation leaves that added account untouched. You’ve seen every one of those moves.

What changes is the loop. A human operator has dwell time between actions: they read output, think, get coffee, context-switch to another target. That latency is load-bearing for a lot of detection logic, even when nobody designed it that way. Your analysts’ triage windows, your “investigate within 4 hours” SLAs, your nightly correlation searches all assume the adversary moves at human pace across a campaign measured in days or weeks.

An agent collapses that. Anthropic described sustained request rates of multiple operations per second. Some secondary writeups inflated that—ExtraHop’s, for one, rounded it up to “thousands of requests per second”—which is louder than the primary report’s own language supports, so treat the dramatic version with suspicion. Either way the point holds: one operator now produces the output of a small APT team, in parallel, across dozens of targets, without the team.

The behavioral tells

Speed cuts both ways. The same tempo that overwhelms human-paced response is itself the loudest signal you have, if your telemetry can resolve it.

The first tell is inter-event timing that’s too regular and too fast to be human. A person enumerating a network produces bursty, irregular activity with gaps. An agent produces a metronome. If your SIEM can compute the delta between successive authenticated actions per identity, sub-second-and-consistent sequencing across heterogeneous tooling is not a person at a keyboard. In Splunk this is a streamstats over _time partitioned by user or src; KQL’s prev() over a partition, or an Elastic windowed aggregation, gets you the same delta. The trick is you need the events in one index with sane timestamps, and that’s exactly where it gets hard—time skew across hosts will smear your deltas, so chase NTP drift before you trust the math, and confirm _time reflects the source event and not the ingest moment.
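The timing check is easy to prototype outside the SIEM before you commit to a saved search. What follows is a minimal sketch, not a tuned detection: the function name, the input shape of (identity, epoch seconds) pairs, and every threshold are assumptions for illustration. It flags identities whose gaps between successive actions are both short and unusually regular, using the coefficient of variation of the gaps as the regularity measure.

```python
from collections import defaultdict
from statistics import mean, pstdev


def metronome_identities(events, max_mean_gap=1.0, max_cv=0.25, min_events=10):
    """Flag identities whose inter-event gaps are fast and regular.

    events: iterable of (identity, epoch_seconds) pairs (assumed shape).
    Thresholds are illustrative guesses, not recommended values.
    Returns {identity: (mean_gap_seconds, coefficient_of_variation)}.
    """
    by_identity = defaultdict(list)
    for identity, ts in events:
        by_identity[identity].append(ts)

    flagged = {}
    for identity, stamps in by_identity.items():
        if len(stamps) < min_events:
            continue  # too few events to say anything about rhythm
        stamps.sort()
        gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        avg = mean(gaps)
        if avg <= 0:
            continue  # all timestamps identical: likely stamped in a batch
        cv = pstdev(gaps) / avg  # low value means metronome-like spacing
        if avg <= max_mean_gap and cv <= max_cv:
            flagged[identity] = (avg, cv)
    return flagged
```

Human activity tends to fail both conditions at once: the mean gap is long and the spread is wide. Service accounts will pass both, so expect to pair this with an allowlist or an identity tier before it is usable.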

Second, breadth without fatigue. Humans specialize during an op and tire. An agent doesn’t. Recon against one subnet flowing immediately into SQLi attempts against an app flowing into LDAP enumeration, all from the same identity or session, with no warm-up and no dead time, is a behavioral profile no analyst produces. Watch for a single principal touching too many distinct asset classes in too tight a window.
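One way to express "too many asset classes in too tight a window" is a sliding window per principal. This is a sketch under assumptions: the asset-class labels are hypothetical and would come from your own asset inventory, and the five-minute window and four-class threshold are placeholders you would tune.

```python
from collections import defaultdict, deque


def broad_principals(events, window=300.0, min_classes=4):
    """Return principals that touch many distinct asset classes quickly.

    events: iterable of (principal, epoch_seconds, asset_class) tuples.
    asset_class values such as "network_scan" or "directory" are
    hypothetical labels; window and min_classes are illustrative.
    """
    flagged = set()
    recent = defaultdict(deque)  # principal -> (timestamp, asset_class) in window
    for principal, ts, asset_class in sorted(events, key=lambda e: e[1]):
        queue = recent[principal]
        queue.append((ts, asset_class))
        # Drop events that have aged out of the sliding window.
        while ts - queue[0][0] > window:
            queue.popleft()
        if len({cls for _, cls in queue}) >= min_classes:
            flagged.add(principal)
    return flagged
```

The same principal touching four classes over an afternoon is ordinary admin work and stays unflagged; the window is what carries the "no dead time" part of the profile.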

Third—and this is the gift GTG-1002 handed defenders—the agent lies to itself. Anthropic’s own report documents Claude overstating findings, claiming credentials that didn’t work, and flagging publicly available information as critical secrets. The operators had to insert human validation checkpoints to cope with it. From your side of the wire that hallucination looks like a flurry of failed authentications with confidently-formed but wrong credentials, repeated attempts against decommissioned hosts, requests for resources that don’t exist. A human backs off after two bad logins. An agent that “believes” it has valid creds will keep hammering. That’s a detection, and it’s a good one, because the false-positive rate on “30 failed auths with structurally-plausible-but-wrong passwords against one account in 90 seconds” is low—just be sure you’re isolating guessing failures from network-driven failures (timeouts, packet loss), which produce a different signal.
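The failed-auth burst can be sketched the same way, with the network-failure filter built in. Assumptions are loud here: the failure-reason labels are invented for illustration and will not match your identity provider's codes, and the 30-in-90-seconds numbers are lifted from the example above, not from any tuned baseline.

```python
from collections import defaultdict, deque

# Hypothetical reason labels; map your own auth log codes onto these.
GUESSING_REASONS = {"bad_password", "unknown_user", "expired_credential"}


def hammering_accounts(failures, window=90.0, threshold=30):
    """Return accounts hit by a burst of credential-guessing failures.

    failures: iterable of (account, epoch_seconds, reason) tuples.
    Failures with reasons outside GUESSING_REASONS (timeouts, resets)
    are ignored so network trouble does not look like an attack.
    """
    flagged = set()
    recent = defaultdict(deque)  # account -> timestamps inside the window
    for account, ts, reason in sorted(failures, key=lambda f: f[1]):
        if reason not in GUESSING_REASONS:
            continue
        queue = recent[account]
        queue.append(ts)
        # Keep only failures that fall inside the sliding window.
        while ts - queue[0] > window:
            queue.popleft()
        if len(queue) >= threshold:
            flagged.add(account)
    return flagged
```

This only counts failures; it does not judge whether the passwords were structurally plausible, which would need access to the attempted values and is usually not something your logs should contain.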

None of these survive contact with a noisy index for free. The timing analysis falls apart if half your auth events arrive batched through a syslog relay that stamps them on ingest instead of at the source. Identity correlation falls apart if the agent rotates through credentials and your logs don’t carry a stable session identifier. Plan for the parsing to be the hard part, because it always is.
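Before trusting any timing math, it is worth checking which log sources look ingest-stamped. A rough heuristic, sketched below with assumed names and thresholds, is the share of a source's events that carry a timestamp identical to some other event from that source. A relay that stamps whole batches at once scores near 1.0.

```python
from collections import Counter, defaultdict


def batch_stamped_sources(events, max_duplicate_share=0.5, min_events=20):
    """Return sources where most events share a timestamp with another event.

    events: iterable of (source, epoch_seconds) pairs (assumed shape).
    Thresholds are illustrative. Returns {source: duplicate_share}.
    """
    by_source = defaultdict(list)
    for source, ts in events:
        by_source[source].append(ts)

    suspect = {}
    for source, stamps in by_source.items():
        if len(stamps) < min_events:
            continue
        counts = Counter(stamps)
        # Count every event whose timestamp is shared with at least one other.
        duplicated = sum(n for n in counts.values() if n > 1)
        share = duplicated / len(stamps)
        if share > max_duplicate_share:
            suspect[source] = share
    return suspect
```

This is a heuristic, not proof. A busy source that logs at one-second resolution will also show many duplicate timestamps, so treat a hit as a prompt to go read the relay configuration.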

Disrupting the loop

Detection tells you it’s happening. Disruption is where you actually win, and the autonomous loop has a structural weakness a human operator doesn’t: it cannot proceed faster than it gets answers.

Rate limiting and conditional access are now offensive countermeasures, not just hygiene. An agent firing multiple operations per second against an identity provider is trivially throttled if you’ve set sane limits, and most shops haven’t, because the limits were tuned for human convenience. Entra ID Conditional Access with sign-in frequency and risk-based session controls can choke a machine-speed credential-replay loop while rarely inconveniencing a real user. In hybrid or on-prem AD environments without an equivalent, you’re back to leaning on whatever on-prem lockout and throttling controls you actually have. The catch: in a flat AD forest with no tiering, conditional access policies are coarse and you’ll fight false positives from service accounts that also behave like robots. Tier your admin model first or the control fires on your own automation (NIST SP 800-53 Access Control and Identification and Authentication families; and yes, this is the boring identity work everyone defers).
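To see why a limit tuned for people barely touches a real user while starving a machine-speed loop, here is a generic token-bucket sketch. It is not how Entra ID or any identity provider implements throttling; the class, its parameters, and the numbers in the usage note are assumptions chosen to make the asymmetry visible.

```python
class TokenBucket:
    """Generic per-identity token bucket (illustrative, not a vendor API).

    rate: tokens refilled per second. burst: maximum tokens held.
    The caller passes the current time so behavior is deterministic.
    """

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.last = None  # time of the previous request, if any

    def allow(self, now):
        """Return True if a request at time `now` (seconds) may proceed."""
        if self.last is not None:
            elapsed = now - self.last
            self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With a refill of one attempt every five seconds and a burst of five, a person signing in every half minute is never refused, while a loop firing five attempts per second gets its initial burst and then roughly one attempt every five seconds. Service accounts need their own buckets or their own tier, for the reason given above.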

Slow the answers and you break the economics. The agent’s advantage is parallelism with near-zero per-action cost. Anything that injects latency or forces a human-validation step on the adversary’s side raises that cost asymmetrically. Session throttling on suspicious sessions, stepping up auth on anomalous access patterns, holding privileged operations for out-of-band approval—none of it stops a determined human, but it strands an agent mid-loop and forces the operator back into the seat, which is the one thing their whole model was built to avoid.
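Injected latency can be as simple as a capped exponential delay keyed to how many suspicious actions a session has accumulated. The function below is a sketch with made-up defaults; how you count "suspicious actions" is left to whatever scoring you already have.

```python
def tarpit_delay(suspicious_actions, base=0.5, cap=60.0):
    """Return seconds to stall the next response for a suspicious session.

    suspicious_actions: count of flagged actions so far (assumed input).
    base and cap are illustrative. The delay doubles per flagged action
    and is clamped at cap; clean sessions get no delay.
    """
    if suspicious_actions <= 0:
        return 0.0
    # Clamp the exponent so very large counts cannot overflow a float.
    exponent = min(suspicious_actions - 1, 32)
    return min(cap, base * 2 ** exponent)
```

A session that trips one check waits half a second, which a person will not notice. By the eighth flagged action it is waiting a full minute per response, and an agent that was running several operations per second is effectively parked until its operator intervenes.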

Deception scales beautifully against agents, and this is the part I’d prioritize. Honeytokens, decoy credentials, fake admin accounts seeded in the exact places an automated credential harvest will scrape—an agent that overstates findings and chases every “secret” it sees is the ideal mark. A human operator might smell a canary. An agent grinding through a credential dump at machine speed will grab the honeytoken and try it, and the moment it touches your AWS canary key or your fake domain admin, you have high-fidelity confirmation with essentially no false positives. Honeytokens are cheap, they map cleanly to the System and Information Integrity family and to detection-focused controls, and against this threat model they punch far above their weight. If you do one thing after reading this, seed canaries where an automated harvest will find them.
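The detection side of a honeytoken is a lookup: did anyone just present a secret that only exists as bait? A minimal sketch, assuming you control the point where credentials are presented and can hook a check there. The registry class and its method names are hypothetical; it stores keyed fingerprints so the bait values are not sitting in the detection tier in cleartext.

```python
import hashlib
import hmac


def fingerprint(secret, key):
    """Keyed SHA-256 fingerprint of a secret (HMAC), as a hex string."""
    return hmac.new(key, secret.encode("utf-8"), hashlib.sha256).hexdigest()


class CanaryRegistry:
    """Hypothetical registry of planted decoy credentials."""

    def __init__(self, key):
        self._key = key
        self._planted = {}  # fingerprint -> where the decoy was seeded

    def plant(self, secret, location):
        """Record a decoy secret and the place it was left to be found."""
        self._planted[fingerprint(secret, self._key)] = location

    def check(self, presented_secret):
        """Return the seeding location if the secret is a decoy, else None."""
        return self._planted.get(fingerprint(presented_secret, self._key))
```

Recording where each decoy was planted is the useful part: a hit tells you not only that a harvest happened but which share, repo, or config file the adversary has already read.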

Defensive AI agents, and where the human line stays

The symmetric question: if the adversary fielded an agent, do you field one back? Short version—yes for triage and correlation, no for autonomous response.

For triage and correlation, increasingly yes, and not as a vendor fantasy. The volume problem is real—an agentic campaign against 30 targets generates more correlated signal than a tier-1 queue clears in a shift, and a defensive agent that drafts the timeline, pulls the related events, and ranks the sessions by anomaly is doing work humans are too slow to do at that tempo. That’s the honest case for AI on defense: matching machine speed with machine speed on the enrichment and triage layer, where being wrong costs an analyst five minutes, not a production outage.

The line I would not move is autonomous response with real blast radius. Letting an agent disable accounts, isolate hosts, or push firewall changes on its own is handing your adversary a denial-of-service primitive, because if the attacker’s agent learns your defensive agent auto-isolates on signal X, it generates signal X against your crown-jewel systems and lets your own automation take you down. The same hallucination problem that betrays the attacker’s agent lives in yours. An overconfident defensive model that fabricates a finding and quarantines a domain controller at 0300 is its own incident.

So the human-in-the-loop line sits at consequential action. Detection, enrichment, correlation, recommendation: agent territory, supervised, with the model’s confidence treated as suspect by default. Containment with material impact: human approval, every time, until the failure modes are far better understood than they are at the 2025–2026 state of the art. NIST’s AI RMF language about valid, reliable, and accountable systems isn’t compliance theater here; it’s the design constraint that keeps your own tooling from becoming the attack surface.
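That line is easier to hold if it is enforced in code and not left to policy. The sketch below shows one possible shape for the gate; the action names, the two impact tiers, and the `route` function are all hypothetical. Two design choices carry the argument: unknown actions fail closed, and the model's confidence is deliberately not an input.

```python
from dataclasses import dataclass
from enum import Enum


class Impact(Enum):
    READ_ONLY = 1      # enrichment, correlation, recommendation
    CONSEQUENTIAL = 2  # anything with real blast radius


# Hypothetical action catalogue; a real one comes from your SOAR playbooks.
ACTION_IMPACT = {
    "enrich_alert": Impact.READ_ONLY,
    "build_timeline": Impact.READ_ONLY,
    "rank_sessions": Impact.READ_ONLY,
    "disable_account": Impact.CONSEQUENTIAL,
    "isolate_host": Impact.CONSEQUENTIAL,
    "push_firewall_rule": Impact.CONSEQUENTIAL,
}


@dataclass(frozen=True)
class Decision:
    action: str
    execute_now: bool
    reason: str


def route(action, approved_by=None):
    """Decide whether an agent-proposed action may run immediately."""
    # Actions missing from the catalogue are treated as consequential.
    impact = ACTION_IMPACT.get(action, Impact.CONSEQUENTIAL)
    if impact is Impact.READ_ONLY:
        return Decision(action, True, "read-only; agent may proceed")
    if approved_by:
        return Decision(action, True, f"approved by {approved_by}")
    return Decision(action, False, "held for human approval")
```

Because no confidence score reaches `route`, an adversary who learns to provoke a high-confidence finding still cannot turn your agent into the denial-of-service primitive described above. The most it can do is fill the approval queue.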

On the skepticism, because it matters

The security community pushed back hard on Anthropic’s report, and the pushback is fair. There’s no independent corroboration, GTG-1002 hasn’t shown up in public threat-intel repositories, the technical detail is thin, and the company disclosing the incident also sells the model and the safety story. BleepingComputer and others catalogued the doubts. Anthropic’s own admission that the agent hallucinated through the operation undercuts the “fully autonomous” framing more than its critics needed to.

Fine. Assume the autonomy figure is inflated and the tempo claim is the marketing-rounded version. The defensive posture doesn’t change. Whether the agent ran 90 percent of the operation or 40, the tells are the same, the disruption controls are the same, and the identity hygiene you’ve been deferring is the thing that answers all of it. The capability exists or it’s one model release away. Build for the timing analysis, tighten conditional access, seed the canaries, and keep your own agent’s hands off the kill switch.
