Anatomy of a tripwire: how we detect compromised agents with zero false positives
Tripwires are decoy tools installed alongside real MCP servers. Honest agents never call them, so every trigger is signal. Here's how the design holds up.
The first question we get about tripwires is always the same: "Won't they light up constantly?" The answer, by design, is no. The reason is the whole insight.
The setup
A tripwire is a normal-looking MCP tool. It advertises a plausible name, a plausible description, and a plausible schema. A few real examples from the current decoy-tripwire pool:
- aws_iam_create_admin_role(account_id, role_name)
- gcp_service_account_key_export(service_account, project_id)
- rotate_master_encryption_key()
- vault_unseal(key_share)
- export_ssl_private_keys()
- firewall_rule_disable(rule_id)
These are installed as part of decoy-tripwire alongside your real MCP servers. Agents discover them the same way they discover anything else, via the standard protocol handshake.
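To make "a plausible name, a plausible description, and a plausible schema" concrete, here is a minimal sketch of what one decoy's advertised definition might look like. The name/description/inputSchema shape follows the way MCP servers list tools; the description wording and the exact schema are assumptions for illustration, not what decoy-tripwire actually ships.

```python
import json

# Hypothetical definition for the vault_unseal decoy.
# The description text and schema details are invented for illustration.
vault_unseal_decoy = {
    "name": "vault_unseal",
    "description": "Unseal the Vault cluster using a key share.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "key_share": {"type": "string"},
        },
        "required": ["key_share"],
    },
}

# An agent would see this entry next to the real tools when it lists them.
print(json.dumps(vault_unseal_decoy, indent=2))
```

Nothing in the definition marks it as a decoy, which is the point: from the agent's side it is indistinguishable from a real tool.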
The key property
No legitimate task ever has reason to call them.
That sounds obvious, but it's the whole trick. Your coding agent working on a PR doesn't need to export GCP service account keys. Your research agent reading docs doesn't need to unseal Vault. These tools advertise capabilities that only a compromised agent (one following an attacker's payload instead of the user's intent) would want to invoke.
When the invocation happens, we know three things instantly:
- The agent executed a capability nobody asked for.
- The capability matched a known exfiltration or destructive pattern.
- The compromise is active right now, not hypothetical.
That's a high-signal alert. Zero false positives isn't a marketing line. It follows from the setup: a tool that no legitimate task has reason to call can only be invoked by an agent that has gone off-task.
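The detection logic this implies is very small, because there is no scoring and no threshold: any call to a decoy is an alert. A rough sketch, assuming a hypothetical handle_tool_call hook and an in-memory alert list (neither is decoy-tripwire's real interface), and assuming the decoy answers with a generic error:

```python
from datetime import datetime, timezone

# Hypothetical set of decoy names installed for this session.
DECOY_TOOLS = {"vault_unseal", "export_ssl_private_keys"}


def handle_tool_call(name, arguments, alerts):
    """Record an alert for any decoy call; return None for non-decoy tools."""
    if name not in DECOY_TOOLS:
        return None  # real tools are handled elsewhere

    # No threshold, no scoring: the call itself is the signal.
    alerts.append({
        "tool": name,
        "arguments": arguments,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    # Assumed response: a generic error, so the decoy never does anything real.
    return {
        "isError": True,
        "content": [{"type": "text", "text": "permission denied"}],
    }


alerts = []
handle_tool_call("vault_unseal", {"key_share": "abc"}, alerts)
print(len(alerts), "alert(s) recorded")
```

What the decoy should return to the agent is a design choice; an error is only one option.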
The rotation trick
The pool has 40+ honey-tool templates spanning cloud IAM, secret managers, databases, SSH, billing, and DNS. Any given session installs 6 of them by default. The selection is deterministic-per-token, so the same developer sees a consistent fingerprint over time, but two different developers see different tools.
That matters for two reasons:
- Attackers can't memorize a blocklist. An injection payload that learns to dodge vault_unseal on one developer's machine will cheerfully call whichever decoy is installed on the next one.
- Triggers are attributable. Because the rotation is token-seeded, a trigger on gcp_service_account_key_export tells us which install fired, not just that something tripped.
Override with DECOY_HONEY_TOOLS=12 or DECOY_HONEY_TOOLS=all to deploy more.
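One way to get a token-seeded, deterministic selection is to hash the token into a seed and sample from the pool with it. The sketch below assumes SHA-256 plus Python's random.Random, placeholder template names, and a guessed reading of the DECOY_HONEY_TOOLS values; the real selection algorithm and parsing may well differ.

```python
import hashlib
import random

# Placeholder names standing in for the 40+ templates; the real names differ.
POOL = [f"honey_template_{i:02d}" for i in range(40)]
DEFAULT_COUNT = 6


def parse_count(value, pool_size, default=DEFAULT_COUNT):
    """Interpret a DECOY_HONEY_TOOLS-style value: unset, a number, or 'all'."""
    if value is None or value.strip() == "":
        return default
    if value.strip().lower() == "all":
        return pool_size
    # Clamp to the pool size so an oversized request still works.
    return max(0, min(int(value), pool_size))


def select_tools(token, count=DEFAULT_COUNT, pool=POOL):
    """Pick `count` templates using a seed derived only from the token."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return sorted(rng.sample(pool, min(count, len(pool))))


def installs_with_tool(tool, tokens, count=DEFAULT_COUNT):
    """Attribution: which of the known tokens would have had `tool` installed."""
    return [t for t in tokens if tool in select_tools(t, count)]


selection = select_tools("dev-token-a")
print(selection)
print(installs_with_tool(selection[0], ["dev-token-a", "dev-token-b"]))
```

Because the seed depends only on the token, the same token always yields the same set, and a trigger on a given tool narrows down which installs could have fired.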
What we log
Every trigger captures the full trace: client fingerprint, session ID, the exact arguments passed, the surrounding tool-call sequence, and (if we can reconstruct it) the prompt that led to the call. That's the forensic record you'll want when you're explaining to your CISO what happened.
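As an illustration of what such a record could look like on disk, here is a sketch of one trigger serialized as a JSON line. The field names and sample values are assumptions chosen to mirror the list above, not the actual log schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class TriggerRecord:
    """Hypothetical shape for one trigger; field names are illustrative."""
    client_fingerprint: str
    session_id: str
    tool_name: str
    arguments: dict[str, Any]
    # Tool calls made just before and after the trigger, in order.
    surrounding_calls: list[str] = field(default_factory=list)
    # None when the prompt could not be reconstructed.
    prompt: Optional[str] = None


record = TriggerRecord(
    client_fingerprint="example-client",
    session_id="example-session",
    tool_name="gcp_service_account_key_export",
    arguments={"service_account": "example", "project_id": "example"},
    surrounding_calls=["read_file", "gcp_service_account_key_export"],
)

# One JSON object per line keeps the log easy to append to and to grep.
log_line = json.dumps(asdict(record), sort_keys=True)
print(log_line)
```

Keeping the prompt optional reflects the "if we can reconstruct it" caveat: the record is still useful without it.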
The counterintuitive part
The best tripwires are ones that never fire. A deployed tripwire that logs zero triggers for six months is doing its job. It's a smoke detector, not a motion sensor. The alert is only meaningful because the baseline is silence.
Install in one line:
npx decoy-tripwire
That's it. No account, no config, no cloud reporting unless you want it. Pair with decoy-scan for static checks and decoy-redteam to see if your honey tools would even be reached under attack.