Product updates

A full rundown of the latest releases, improvements, and fixes in Decoy. Updates ship continuously; milestones land here.

New

AI-verified findings: drop false positives, see reasoning

Scanning catches a lot. Most of it's worth fixing — some of it isn't. Telling those apart used to be on you. Now there's a second pass that does it for you.

decoy-scan --verify runs every finding through a two-stage AI pipeline. Haiku 4.5 triages each one as P0/P1/P2 in a single batched call, then Sonnet 4.6 revalidates the P0/P1 set with full context — server config, tool definitions, surrounding findings — and decides whether to keep each one or drop it as a false positive. You get back a ranked list with confidence scores and short reasoning per finding. False positives collapse into an expandable group with the reason they were dropped.

How to try it

npx decoy-scan --verify

A free Decoy account is required (we need an email — the verify pipeline calls Anthropic and we'd rather not pay anonymously into the void). On first run the CLI prints a one-time claim URL — app.decoy.run/d/<installId> — that links the local install to an account in two clicks. After that, --verify just works.

The CLI prints raw and verified counts side by side. On the dashboard, the scan detail view shows a blurred verification panel right next to the findings list with a single "Verify all" button — one click reveals the verified set with reasoning expanded inline.

What's included

  • Free: 5 AI verifications per month, per account. Enough to vet the worst findings on most repos.
  • Team: 30 verifications per seat per day. Full reasoning per finding, fingerprinting, and false-positive correlation across your install base.
  • Business: 60 verifications per seat per day, plus the rest of Team's features.

Caps are fair-use guards (most users never hit them) sized to keep the unit economics honest — verify isn't unlimited because Anthropic API costs aren't.

Verification reuses what's already there: Haiku for cheap triage, Sonnet for the heavy lift, your existing token for auth. Total round-trip is typically 5–10 seconds for a scan with a dozen findings.

Why this exists

Regex-based scanners over-flag by design: they have to, because the cost of a missed real finding is much higher than the cost of a false positive. AI verification flips that ratio at the moment you're staring at the results, so the actionable list is shorter and more trustworthy.

Improvement

Anonymized telemetry, by default — and how to read what we collect

The new CLI releases (decoy-scan 0.7.0, decoy-redteam 0.3.0, decoy-tripwire 0.13.0) send anonymized usage telemetry by default. This is how we learn what's working in the wild — which patterns produce false positives, which agent behaviors trip tripwires, which hosts are most common. Without it we're shipping into the dark.

It also has guardrails worth being explicit about.

What we collect

Every event carries:

  • A random UUID from ~/.decoy/install_id (created on first run; never derived from anything identifying)
  • Tool name and version
  • Node version, OS platform, architecture, locale, and a CI flag (detected from standard environment variables)
  • MCP host you're using (Claude Desktop / Cursor / Windsurf / VS Code / Claude Code / Zed / Cline / generic CLI)
  • Run-specific counts: severities, OWASP categories, finding sources, decision distributions

What we don't collect

  • No email, no hostname, no IP, no working directory, no file paths
  • No tool argument values. The tripwire proxy redacts every argument to <type:length> shape before transmission. We can see that a tool was called and what shape its arguments had; we can't see the values. The redactor is open source in decoy-tripwire/server/redact.mjs.
  • No scan source code, no tool descriptions, no full server names beyond what discoverConfigs returns
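To make the <type:length> idea concrete, here is a minimal Python sketch of shape-only redaction. It is not the code in decoy-tripwire/server/redact.mjs; the type labels, the way length is measured for numbers and containers, and the function names are all assumptions made for illustration.

```python
from typing import Any, Dict


def shape_of(value: Any) -> str:
    """Describe a value by type and length only, never by content."""
    if value is None:
        return "<null:0>"
    if isinstance(value, bool):  # checked before int, since bool is a subclass of int
        return "<boolean:1>"
    if isinstance(value, (int, float)):
        return f"<number:{len(str(value))}>"  # assumed: length of the printed number
    if isinstance(value, str):
        return f"<string:{len(value)}>"
    if isinstance(value, (list, tuple)):
        return f"<array:{len(value)}>"  # assumed: item count
    if isinstance(value, dict):
        return f"<object:{len(value)}>"  # assumed: key count
    return "<unknown:0>"


def redact_arguments(arguments: Dict[str, Any]) -> Dict[str, str]:
    """Keep argument names, replace every value with its shape placeholder."""
    return {name: shape_of(value) for name, value in arguments.items()}
```

With this sketch, redact_arguments({"path": "/etc/passwd", "limit": 10}) yields {"path": "<string:11>", "limit": "<number:2>"}: the call is visible, the values are not.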

For confirmed-malicious blocked decisions (critical/high severity), the tripwire attaches a SHA-256 prefix of the original arguments. That lets us correlate the same exploit payload appearing across many installs without ever storing the payload itself.
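A hash prefix of this kind could be computed roughly as follows. This is a hedged sketch: the canonical JSON serialization and the 16-character prefix length are assumptions, since the note only says "a SHA-256 prefix of the original arguments".

```python
import hashlib
import json
from typing import Any, Dict


def payload_fingerprint(arguments: Dict[str, Any], prefix_len: int = 16) -> str:
    """Return a short SHA-256 prefix that identifies a payload without storing it."""
    # Assumption: arguments are serialized canonically (sorted keys, no extra
    # whitespace) so the same payload hashes identically on every install.
    canonical = json.dumps(arguments, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return digest[:prefix_len]
```

Two installs that see the same arguments get the same prefix, which is what allows correlation; the prefix alone cannot be turned back into the payload.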

How to opt out

DECOY_TELEMETRY=0 npx decoy-scan
# or
npx decoy-scan --no-telemetry

Both routes silently no-op the network call. No queueing, no notice, nothing leaves your machine.

Durability

Telemetry posts retry once with exponential backoff. On final failure, events queue to ~/.decoy/telemetry-queue.jsonl (capped at 1000 events, FIFO), and the next CLI run drains the queue as a single batched POST before doing anything else. Tripwire decisions aggregate in memory and flush as a single session summary on threshold, timer, or process exit, so a long-running session sends one summary, not N raw events.
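The queue-then-drain behavior can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the CLI's code: the function names are hypothetical, the queue path is passed in rather than hard-coded, and "FIFO" is read as "evict the oldest events once the cap is reached".

```python
import json
from pathlib import Path
from typing import Dict, List

MAX_QUEUED_EVENTS = 1000  # cap stated in the release note


def enqueue(queue_path: Path, event: Dict, cap: int = MAX_QUEUED_EVENTS) -> None:
    """Append one event as a JSON line, keeping at most `cap` lines."""
    lines = queue_path.read_text().splitlines() if queue_path.exists() else []
    lines.append(json.dumps(event))
    # Assumption: when the queue is full, the oldest events are dropped first.
    queue_path.write_text("\n".join(lines[-cap:]) + "\n")


def drain(queue_path: Path) -> List[Dict]:
    """Return all queued events in order and clear the queue file."""
    if not queue_path.exists():
        return []
    events = [
        json.loads(line)
        for line in queue_path.read_text().splitlines()
        if line.strip()
    ]
    queue_path.unlink()
    return events
```

A caller would send the list returned by drain() as one batched request before starting its normal work.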

The full privacy posture is at decoy.run/privacy.

New

AI-adaptive red team is live

decoy-redteam learned to think. The --team flag now runs an AI-adaptive attack engine on top of the existing 53 deterministic patterns. Sonnet 4.6 reads your servers' tool schemas and source code, generates novel attack payloads tailored to each tool's actual surface, iterates on what it sees, and reports back which ones found something.

This is the half of red teaming the rule-based scanners don't do: novelty per release. Every time we ship a new model or our prompt library improves, your assessment finds things it couldn't find last month — without you doing anything.

How to run it

npx decoy-redteam --team --live --token=YOUR_TOKEN

You'll see two layers in the output: the 53 deterministic patterns that every plan gets, plus the AI-adaptive layer with novel payloads per tool and cross-server attack chains. The CLI prints which novel attacks succeeded and which were caught, with full request/response detail for each.

Plan limits

  • Team: 50 AI-adaptive assessments per seat per month
  • Business: 200 AI-adaptive assessments per seat per month
  • Free: deterministic patterns only (53 baseline)

A fair-use LLM budget sits behind the assessment count to protect against runaway usage on very large repositories. Most users never encounter it. If you do, the CLI gives a clear message and the deterministic red team keeps running.

Read the launch writeup at decoy.run/blog/ai-adaptive-red-team.

Improvement, Fix

Red Team 0.1.6: fewer false positives, cleaner output

decoy-redteam 0.1.6 is an accuracy + UX pass. Same attack catalog, but the targeting regexes no longer confuse get_user with an HTTP client, execute_command with a SQL tool, or save_user with a file writer. The CLI output got a parallel pass.

Fewer false positives

Every attack targeter (SQL, command, file-read, file-write, HTTP) now requires a domain-specific marker in both the tool name and a param name. The old regexes treated get, post, run, save, put, and execute as strong signals. They're not. A tool named post_comment is not an HTTP client; a tool named save_user is not a filesystem writer.

The comment blocks above each target in lib/attacks.mjs document what was removed and why, so future regressions are easy to spot.
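The "marker in both the tool name and a param name" rule can be sketched like this. The regexes below are hypothetical stand-ins, not the ones in lib/attacks.mjs; they only illustrate why requiring both signals removes the post_comment and get_user false positives.

```python
import re
from typing import Iterable

# Hypothetical domain markers for an HTTP-client targeter (assumptions).
HTTP_NAME_MARKER = re.compile(r"http|fetch|url|request|webhook", re.IGNORECASE)
HTTP_PARAM_MARKER = re.compile(r"url|uri|endpoint", re.IGNORECASE)


def is_http_target(tool_name: str, param_names: Iterable[str]) -> bool:
    """Target a tool only if its name AND at least one parameter look HTTP-related."""
    name_matches = bool(HTTP_NAME_MARKER.search(tool_name))
    param_matches = any(HTTP_PARAM_MARKER.search(p) for p in param_names)
    return name_matches and param_matches
```

Generic verbs such as get, post, or save never appear in the marker lists, so a tool has to say something domain-specific twice before it is attacked as an HTTP client.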

Browser-automation tools skipped in safe mode

If you had Playwright configured, running decoy-redteam --live used to flash real browser windows open for every SSRF URL payload it sent. Entertaining, not useful. Tools matching browser_*, navigate, goto, open_url, open_browser, open_tab, open_page, open_window, take_screenshot, screenshot, or screencapture are now skipped by default. Opt back in with --live --full; the warning banner tells you this up front.

The dry-run plan and live banner now disclose the skipped count so you know what got excluded.
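A rough Python sketch of the skip rule, using the tool-name list from the note. Whether the real matcher uses globs, exact names, or regexes is not stated, so the glob matching here is an assumption, as are the function names.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

# Names taken from the release note; treating them as glob patterns is an assumption.
BROWSER_TOOL_PATTERNS = [
    "browser_*", "navigate", "goto", "open_url", "open_browser", "open_tab",
    "open_page", "open_window", "take_screenshot", "screenshot", "screencapture",
]


def should_skip(tool_name: str, full: bool = False) -> bool:
    """Skip browser-automation tools unless the run opted in with --full."""
    if full:
        return False
    return any(fnmatchcase(tool_name, pattern) for pattern in BROWSER_TOOL_PATTERNS)


def skipped_tools(tool_names: Iterable[str], full: bool = False) -> List[str]:
    """List the tools a run would exclude, e.g. to report the skipped count."""
    return [name for name in tool_names if should_skip(name, full)]
```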

Output polish

  • Progress line no longer flickers. A race between the spinner interval and the per-attack progress callback left ghost text like Attacking…ma boundary. Now one writer owns the line.
  • Low-severity findings tallied, not blobbed. Instead of an 800-char comma-separated list of 81 findings, you get top types with counts: Protocol ×17, Prompt injection ×14, Path traversal ×6, ….
  • Next-steps block after the summary, modeled on decoy-scan. Patch pointer, re-run command, tripwire install, SARIF export.
  • Help text pruned and aligned. Two-column rows that were hand-padded around ANSI color codes are now actually aligned. The warning emoji renders as an emoji, not the narrow text-style glyph.
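The low-severity tally is a standard count-and-truncate. A minimal sketch, assuming findings arrive as a flat list of type names and that three types are shown before the ellipsis (both assumptions):

```python
from collections import Counter
from typing import List


def tally(finding_types: List[str], top: int = 3) -> str:
    """Summarize findings as 'Type ×count' for the most common types."""
    counts = Counter(finding_types)
    parts = [f"{name} ×{count}" for name, count in counts.most_common(top)]
    if len(counts) > top:
        parts.append("…")  # signal that less common types were omitted
    return ", ".join(parts)
```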

Tests

86 passing, up from 71. Added false-positive guards for every narrowed regex and positive regression cases for the compound (verb).?file matchers used by the file-read and file-write targets.

See the full CHANGELOG.

New, Improvement

Scanner 0.5.0: CLI polish and a new `explain` command

decoy-scan 0.5.0 is a significant polish pass on the CLI people actually see. First-run matters — this is how most users meet the product — so we rebuilt the output with a clear visual hierarchy and added a way to ask why something was flagged.

What's new

decoy-scan explain <target> — a new subcommand that explains anything the scanner can say about your servers. Works across four kinds of targets:

  • Severity tiers: explain critical / high / medium / low
  • Finding categories: explain tool-description, env-config, typosquat, …
  • Poisoning types: explain prompt-override, coercive-execution, …
  • Tool names: explain evaluate_script tells you which rule matched

Explanations are sourced from the same RISK_PATTERNS and POISONING_PATTERNS the scanner uses, so they can't drift. --json is supported on every path for agent consumption — explain is designed to be called by Claude Code, Cursor, and anything else that wants to resolve a decoy-scan finding.

Output polish

The pretty TTY output (what you see when running npx decoy-scan) got a full rework. JSON, SARIF, and --brief outputs are unchanged.

  • Progress lines at the top of every run so you know what happened: ▸ Discovering MCP servers… 6 found / ▸ Running 52 checks…
  • Severity legend shown once — no more repeating "critical means code exec" under every server.
  • Per-server badge replaces the hosts string. Each server leads with ✗ name 2 critical or ! name poisoned tool (magenta) or ? name probe failed instead of buried severity.
  • High-risk renders in orange — previously red, visually identical to critical. Orange gives you a real "bad but not worst" tier.
  • Low tier collapses to a count. No more wall of safe tool names.
  • Long lists wrap with a proper hanging indent so tool names and error messages don't lose their left margin mid-line.
  • Summary reads clearly: 4 issues found 2 critical, 2 high · 48 checks passed · 1s followed by one line of actionable guidance instead of the old opaque "issues blocked" wording.

Fixes

  • Servers that failed to probe used to misleadingly show as "passed" because they had no findings attached. They now get ? probe failed with the underlying error wrapped cleanly.
  • Decoy tripwire servers configured across multiple hosts no longer duplicate in the output.
  • --sarif and --json output could be truncated when piped to another command (e.g. decoy-scan --sarif | jq '.runs'). Root cause was Node's process.exit() killing the process before stdout drained; the CLI now waits for the pipe to flush before exiting. This was silently breaking CI pipelines that consume SARIF output.

Install

npx decoy-scan@latest

Or pin the version in CI: npx decoy-scan@0.5.0.

New, Improvement

Tripwire 0.11.0: auto-block when a tripwire fires

Detection without response is just a louder alarm. decoy-tripwire 0.11.0 closes the loop: when a tripwire fires, the compromised agent is paused automatically, and every wrapped MCP server denies subsequent calls in sub-ms. No dashboard hop, no polling, no waiting for the user to notice the notification.

What's new

init wraps your existing MCP servers through a local proxy by default. One command signs you up, installs the proxy, drops in the tripwires, and rewrites each MCP host's config so upstream servers run under the proxy. Zero manual config — pass --no-wrap if you want to opt out.

Auto-pause on tripwire hit. A compromised agent gets a 10-minute pause (configurable via ~/.decoy/config.json) written to a shared registry at ~/.decoy/pause.json. Every proxy instance reads this file on its hot path — one tripwire hit blocks every wrapped upstream, same process, sub-ms. Auto-expires so false positives recover without intervention.
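As an illustration of a file-backed pause registry with expiry, here is a small Python sketch. The JSON layout (agent id mapped to an expires_at timestamp) and the function names are assumptions; only the 10-minute default and the shared-file idea come from the note. The registry path is a parameter so the sketch never touches a real ~/.decoy directory.

```python
import json
import time
from pathlib import Path
from typing import Optional

DEFAULT_PAUSE_SECONDS = 600  # the 10-minute default from the release note


def _load(registry: Path) -> dict:
    return json.loads(registry.read_text()) if registry.exists() else {}


def pause_agent(registry: Path, agent_id: str,
                ttl: float = DEFAULT_PAUSE_SECONDS,
                now: Optional[float] = None) -> None:
    """Record a pause for one agent that expires after `ttl` seconds."""
    now = time.time() if now is None else now
    pauses = _load(registry)
    pauses[agent_id] = {"expires_at": now + ttl}  # assumed record layout
    registry.write_text(json.dumps(pauses))


def resume_agent(registry: Path, agent_id: str) -> None:
    """Clear a pause immediately."""
    pauses = _load(registry)
    pauses.pop(agent_id, None)
    registry.write_text(json.dumps(pauses))


def is_paused(registry: Path, agent_id: str, now: Optional[float] = None) -> bool:
    """Hot-path check: a pause counts only until its expiry time."""
    now = time.time() if now is None else now
    entry = _load(registry).get(agent_id)
    return entry is not None and entry["expires_at"] > now
```

Because expiry is evaluated at read time, nothing has to run to lift a pause: an expired entry simply stops matching.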

Desktop notifications. Native macOS / Linux / Windows notifications surface the tool name, the paused agent, and the TTL. You see it the moment it happens, not when you check email.

New CLI commands for when a tripwire fires:

  • decoy-tripwire resume <agent-id> — clear an auto-pause immediately
  • decoy-tripwire resume --all — clear every pause
  • decoy-tripwire lock <agent-id> — upgrade a pause into a permanent block
  • decoy-tripwire lockdown on — any tripwire hit pauses every agent (not just the one that tripped). For shared dev machines where blast containment matters more than convenience.

Why local-first

Hosted enforcement would add a network round-trip and a polling lag between the hit and the block. For a real attack, that's long enough to exfiltrate. Local detection + local enforcement means:

  • Sub-ms block. File-backed registry, O(small read from page cache).
  • Works offline. Dashboard sync is fire-and-forget; detection and blocking don't depend on the network.
  • One upgrade unlocks everything. Every MCP server behind the proxy benefits from a trip on any other server.

Dashboard, alerts, webhooks, Slack — all still work, just in the fire-and-forget path that doesn't block the hot path.

Upgrading

npx decoy-tripwire init

Re-running init wraps any newly added upstream servers. Existing installations get backed up to *.bak.<timestamp> next to each config file, so a wrap gone wrong is a one-line restore.

See the full changelog on GitHub.

Improvement

Threat intel is dashboard-first

We retired the Monday threat-intel email. Signal was arriving a week late, and for half of you it was duplicating what the dashboard and Slack trigger alerts already show.

What changed

  • Weekly digest email removed across all plans — no more Monday inbox hit
  • Intel lives in two faster places now: the live feed on the Guard dashboard, and the /api/feed JSON endpoint for your own polling
  • Trigger events still fire Slack and webhook alerts as they happen
  • Pricing page and upgrade flow updated to reflect the new delivery

If you relied on the digest as a "once a week, no login" format — let us know. An RSS feed of the intel stream is a shorter hop from here than rebuilding the email.

Improvement

Per-user pricing: Team $29/user/mo, Business $99/user/mo

We rebuilt pricing around per-user billing. Team is now $29 per user per month, Business is $99 per user per month, and the old fixed-bucket tiers are gone. Yearly billing saves 20% on both.

Why the change

The old plans assumed teams of a fixed size and made the wrong tier expensive for small teams. A two-person team trying out Decoy shouldn't pay the same as a twelve-person security org. Per-user is the model every SaaS in this space — Linear, Vercel, Stripe — has converged on for a reason: it scales with the value you get.

What the numbers look like

Team size    Team plan / mo    Business plan / mo
1 (solo)     $29               $99
5            $145              $495
25           $725              $2,475

Yearly billing knocks 20% off: Team $23/user/mo billed annually, Business $79/user/mo billed annually.
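The table and the yearly rates reduce to one multiplication. A small Python sketch using only the per-user prices quoted here (the function name is illustrative, and proration is not modeled):

```python
# Per-user monthly prices in USD, as listed in this update.
MONTHLY_RATE = {"team": 29, "business": 99}
# Per-user monthly price when billed annually (20% off, rounded as listed).
ANNUAL_RATE = {"team": 23, "business": 79}


def monthly_cost(plan: str, seats: int, billed_annually: bool = False) -> int:
    """Return the monthly bill in USD for a plan and seat count."""
    rates = ANNUAL_RATE if billed_annually else MONTHLY_RATE
    return rates[plan] * seats
```

For example, five Team seats cost 5 × $29 = $145 per month, or 5 × $23 = $115 per month when billed annually.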

How seat changes work

Adding or removing teammates prorates immediately. Upgrades take effect the moment you confirm. Downgrades wait until the next billing cycle so you don't lose features you've already paid for.

What's included

Same features as before — decoy-scan, tripwires, threat intel, AI red team, the dashboard. The full breakdown is on the pricing page.

Fix, Improvement

Scanner 0.4.6 and a security pass across the stack

End-of-week polish release. We spent the week running Decoy against itself and closing everything the scanner found.

Fixes

  • decoy-scan 0.4.6 — internal audit fixes across the CLI, including stricter handling of malformed tool descriptions that previously could cause a classifier to throw
  • Dashboard security fixes across the worker — scoping and validation hardening for every /api/ endpoint, caught by our own red-team suite
  • Config probe now distinguishes could not start from clean result so broken servers don't silently look healthy in the summary

Improvements

  • CSP compliance: replaced all remaining inline onclick handlers in the dashboard with event-delegated listeners. Next step: stricter CSP header coming with next week's release.
  • Scan detail header now reflects findingsBySeverity consistently, so the number at the top of a scan page matches the counts on the list view

New, Improvement

Shadow MCP discovery is generally available

Every MCP server running anywhere in your organization — approved or not — now surfaces in your Decoy dashboard automatically. Shadow MCP discovery graduates from beta this week and ships to all Pro and Business plans.

What's new

  • Auto-discovery of servers running on dev machines via the CLI
  • Org-wide inventory with per-team filtering
  • One-click capture into your Decoy inventory, with provenance
  • New CLI flag --discovery-only for silent inventory runs in CI

Improvements

  • 3× faster initial discovery scan — under 8 seconds for 50+ servers on an M-series laptop
  • Better fingerprinting for MCP servers behind HTTP gateways or wrappers
  • Discovery runs now tag each server with a first_seen timestamp for drift tracking

New, Improvement

Per-server scan drill-down and consistent severity counts

Dashboard week. Guard's scan view now matches what the CLI prints locally, so you can start from a summary and dig into exactly what hit.

What's new

  • Scan detail view — click a scan in your history to see every server it touched, with the findings grouped by severity and a remediation snippet for each
  • New findingsBySeverity field on every scan summary, used everywhere counts are rendered so list view and detail view never disagree
  • Typosquat findings now render with their own label on the finding card, making them easier to spot at a glance

Improvements

  • Agent detail panel: light-mode text colors fixed. Everything stayed dark-mode-only before the theme refresh
  • Scan history counts pull from the new severity field directly; no more re-deriving numbers client-side
  • Guard MCP server reached 9 tools with full test coverage (3 Free, 6 Pro)

New

Decoy Red Team v1 — autonomous adversarial testing for MCP servers

The third and hardest tool in the Decoy suite is live. decoy-redteam is not a scanner — it's an attacker. It connects to every MCP server on your machine, sends adversarial payloads, and reports exactly what broke.

npx decoy-redteam --live

Dry-run by default. --live requires an explicit confirmation before any payloads leave the process.

What it tests

53 attack patterns across 6 categories, every finding mapped to OWASP Top 10 for Agentic Applications 2026:

  • Input injection — SQL, command, path traversal, SSRF, template
  • Prompt injection — instruction override, role hijack, indirect, encoding bypass, multi-turn
  • Credential exposure — .env, cloud credentials, SSH keys, git tokens, shell history
  • Protocol attacks — malformed JSON-RPC, capability escalation, replay, method injection
  • Schema boundary — type coercion, null bytes, overflow, prototype pollution, NoSQL operators
  • Privilege escalation — scope escape, undeclared access, dotfile enumeration, argument smuggling

What ships

  • Zero-dependency CLI, works with Claude Desktop, Cursor, Windsurf, VS Code, Claude Code, Zed, and Cline
  • SARIF 2.1.0 output that uploads straight to the GitHub Security tab
  • GitHub Action: uses: decoy-run/decoy-redteam@v1
  • Exit codes for CI gating (0 clean, 1 high, 2 critical)
  • Guard Pro support for AI-adaptive attacks and encoding variants (~198 extra)

Full docs at /docs/redteam/overview. The v1.1 point release shipped a few hours later with zero-dependency enforcement locked down.

New

Decoy Scan on the GitHub Marketplace

decoy-scan v1 is now a first-class GitHub Action, listed in the Security category of the Marketplace. One step in a workflow, fails the build on critical tools or prompt injection, uploads SARIF to the GitHub Security tab automatically.

Usage

- uses: decoy-run/decoy-scan@v1
  with:
    policy: no-critical,no-poisoning,no-toxic-flows
    report: true
    token: ${{ secrets.DECOY_TOKEN }}

What the action gates on

Every policy rule matches a real class of finding:

  • no-critical — any critical tool
  • no-high — any high-risk tool
  • no-poisoning — prompt injection in tool descriptions
  • no-toxic-flows — dangerous cross-server data flows
  • no-secrets — credentials exposed in MCP config
  • require-tripwires — no Decoy Tripwires installed
  • max-critical=N / max-high=N — budgeted tolerance
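To show how a comma-separated policy string maps onto findings, here is a hedged Python sketch. The rule names come from the list above; the shape of the counts dictionary (keys such as "toxic_flows" and "tripwires") is an assumption, and this is not the action's actual implementation.

```python
from typing import Dict, List

# Maps each "no-*" rule to a hypothetical counter name in the scan summary.
ZERO_TOLERANCE_RULES = {
    "no-critical": "critical",
    "no-high": "high",
    "no-poisoning": "poisoning",
    "no-toxic-flows": "toxic_flows",
    "no-secrets": "secrets",
}


def failed_rules(policy: str, counts: Dict[str, int]) -> List[str]:
    """Return the policy rules that the given finding counts violate."""
    failed = []
    for rule in (part.strip() for part in policy.split(",")):
        if not rule:
            continue
        if rule in ZERO_TOLERANCE_RULES:
            if counts.get(ZERO_TOLERANCE_RULES[rule], 0) > 0:
                failed.append(rule)
        elif rule == "require-tripwires":
            if counts.get("tripwires", 0) == 0:
                failed.append(rule)
        elif rule.startswith("max-") and "=" in rule:
            # Budgeted tolerance, e.g. max-high=2 allows up to two high findings.
            name, limit = rule[len("max-"):].split("=", 1)
            if counts.get(name, 0) > int(limit):
                failed.append(rule)
        else:
            raise ValueError(f"unknown policy rule: {rule}")
    return failed
```

A CI step would fail the build whenever the returned list is non-empty.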

What's new vs the CLI-only release

  • Inline remediation suggestions on every finding
  • Job summary with pass/fail and counts
  • Automatic SARIF upload (no separate upload step)
  • Opt-in dashboard reporting via report: true

Docs: /docs/scan/ci-cd.

New

Guard MCP server adds tool-level allowlists

You can now scope which Decoy Guard tools an agent can call, per-agent or per-session. Useful for sandboxing untrusted agents while letting your production agents use the full toolkit.

What's new

  • allowlist config in your Guard MCP server startup command
  • Per-agent allowlist editor in the dashboard, with a "dry-run" mode to see what a rule change would have blocked over the last 7 days
  • New webhook event: tool.blocked.by.allowlist, including the agent fingerprint and blocked tool name

New, Improvement

Toxic flows, policy gates, and shareable reports

A wide release. New detection categories in the scanner and a few features in the dashboard that make scan results actually useful in a team context.

Scanner

  • Toxic data flows — dangerous tool chains across servers (e.g. a read-from-private-source tool paired with a send-to-public-endpoint tool on the same agent). Flagged even when each individual tool looks benign.
  • Manifest hashing — every scan captures a content hash of the server's tool manifest, so you can detect silent changes (same version, different tools) without needing semver bumps
  • Skill / prompt scanning — servers exposing prompts or skills now have those surfaces analyzed too, not just tool schemas
  • --policy flag for CI gating on the CLI (matches the GitHub Action's policy input)
  • --share flag to generate a public shareable URL for a scan result
  • --fix flag (experimental) that emits a diff against your MCP config
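Manifest hashing, as described in the Scanner list, can be approximated with a canonical serialization followed by SHA-256. The sketch below is an assumption about the approach: it treats a manifest as a list of tool dictionaries with a "name" key and ignores listing order; the scanner's real canonicalization is not documented here.

```python
import hashlib
import json
from typing import Any, Dict, List


def manifest_hash(tools: List[Dict[str, Any]]) -> str:
    """Hash a tool manifest so any change in content changes the digest."""
    # Sort tools by name and keys alphabetically so that ordering differences
    # alone do not look like a manifest change (an assumption of this sketch).
    ordered = sorted(tools, key=lambda tool: tool["name"])
    canonical = json.dumps(ordered, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Comparing the stored hash with the hash from the next scan reveals a silent change even when the server's version string is identical.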

Dashboard

  • Auto-register — npx decoy-tripwire now handles account creation end-to-end, no dashboard visit required for first install
  • Shareable reports — scan results get a public URL you can send to a team member or to an MCP maintainer without giving them account access
  • RSS feed — decoy.run/monitor/feed.xml for advisories and attack patterns, public
  • Tier gating — proper plan enforcement on every API endpoint, with -32000 error codes on the MCP surface
  • Multiple KV correctness fixes caught by the new test suite

New, Fix

Scan-first flow and PCI-compliant upgrades

Product flow

decoy-scan is now the first thing every new user runs. The CLI, the docs, the dashboard onboarding: everything points to "scan something you already have" before "install tripwires." Faster time-to-value, and it proves Decoy is worth using before anyone has to install anything.

  • npx decoy-scan is the headline command in every install path
  • Dashboard empty state guides you to run a scan, not register a server
  • Marketing site hero and footer both reference the scan-first flow

Billing

  • PCI fix (breaking): decoy_upgrade no longer collects card details. Returns a Stripe Checkout URL instead. Card data never touches our worker; it goes directly from the browser to Stripe.
  • Checkout flow updated across tripwire CLI, Guard dashboard, and the upgrade email
  • Pricing display updated to match current Guard Pro ($99/mo at the time; since adjusted)

Fixes

  • Tripwire CLI: pad() no longer crashes on undefined agent.clientName for anonymous free-tier triggers
  • Auth flow: fixed token-in-URL persistence after first passkey registration
  • Agent-friendly API error messages (more descriptive action hints)

New, Improvement

Dynamic tripwires and session telemetry

Two things that together make tripwire triggers actionable, not just visible. Static catalogs are fingerprintable: attackers can learn which tools are decoys. Trigger context with no session attribution is impossible to investigate. This release fixes both.

Dynamic tripwires

Alongside the 12 built-in decoy tools, every Decoy install now generates a deterministic set of additional tools drawn from 6 threat categories:

  • Cloud infrastructure (IAM, compute, storage, metadata)
  • Secrets management (vaults, KMS, credential stores)
  • Payment systems (checkout, refunds, ledgers)
  • CI/CD (deploy keys, workflow dispatch, build secrets)
  • Identity (user admin, MFA, session management)
  • Network (DNS, firewall, VPN)

Each workspace gets a unique-looking tool catalog. No two Decoy installs look alike to an attacker.
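One way to get a deterministic but install-specific catalog is to seed a random generator from the install identifier. The Python sketch below shows that idea only; the tool names, the catalog contents, and the seeding scheme are all hypothetical and are not Decoy's generator.

```python
import hashlib
import random
from typing import List

# Hypothetical candidate names per threat category (illustrative only).
CATALOG = {
    "cloud": ["iam_create_access_key", "compute_run_instance", "storage_get_object"],
    "secrets": ["vault_read_secret", "kms_decrypt", "credential_store_list"],
    "payments": ["checkout_create_session", "refund_issue", "ledger_export"],
    "cicd": ["deploy_key_rotate", "workflow_dispatch", "build_secret_read"],
    "identity": ["user_admin_grant", "mfa_reset", "session_revoke_all"],
    "network": ["dns_update_record", "firewall_open_port", "vpn_issue_profile"],
}


def dynamic_tripwires(install_id: str, per_category: int = 1) -> List[str]:
    """Pick the same decoy tools every time for a given install id."""
    # Derive a stable integer seed from the install id.
    digest = hashlib.sha256(install_id.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    chosen = []
    for category in sorted(CATALOG):  # fixed iteration order keeps output stable
        chosen.extend(rng.sample(CATALOG[category], per_category))
    return chosen
```

The same install id always yields the same list, so the catalog survives restarts, while different ids generally select different tools.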

Session telemetry

Every tripwire hit now captures the full session context that led to the call:

  • MCP initialize payload (client name, version, protocol version)
  • Session fingerprint: SHA-256 of clientName + clientVersion + userAgent, truncated to 16 hex chars
  • Request headers (user agent, IP prefix, geo hint where available)
  • The last N messages of prompt context leading up to the call

Emitted as structured JSON lines to stderr for local ingestion and shipped to the dashboard for Pro+ users.
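The session fingerprint described above is simple to reproduce. A minimal sketch, assuming the three fields are concatenated directly with no separator and encoded as UTF-8 (the note does not specify either detail):

```python
import hashlib


def session_fingerprint(client_name: str, client_version: str, user_agent: str) -> str:
    """SHA-256 of clientName + clientVersion + userAgent, truncated to 16 hex chars."""
    material = client_name + client_version + user_agent  # assumed: no separator
    return hashlib.sha256(material.encode("utf-8")).hexdigest()[:16]
```

Sixteen hex characters carry 64 bits of the digest, which is enough to group triggers from the same client build without storing the raw fields alongside every event.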

Improvements

  • SSE transport security checks added to the scanner
  • Input sanitization validation: schema completeness, type constraints, pattern validation. The scanner flags tools that accept unbounded input.
  • Explicit permission scope scoring on every tool, included in the risk classifier

Fix, Improvement

Tripwire alert deduplication and delivery fixes

A batch of reliability work on tripwire alerts. What you get notified about should match what actually happened — exactly once.

Fixes

  • Fixed a race where simultaneous tripwire triggers on the same agent would send duplicate Slack alerts
  • Fixed missing agent fingerprints on webhook deliveries when the session started mid-trigger
  • Fixed alert timestamp drift in Slack messages for triggers older than 30 seconds

Improvements

  • Slack alert formatting now uses blocks instead of plaintext — the agent fingerprint and tool name are clickable
  • Webhook retries use exponential backoff (was linear) — fewer retry storms on flaky endpoints
  • decoy-tripwire test now supports a --delivery flag to verify your alert channels end-to-end
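The difference between linear and exponential retry spacing is easiest to see as a list of delays. A small sketch; the base delay, growth factor, and cap are illustrative assumptions, not the values the webhook sender uses.

```python
from typing import List


def linear_delays(attempts: int, step: float = 1.0) -> List[float]:
    """Delays that grow by a fixed step: 1s, 2s, 3s, ..."""
    return [step * (n + 1) for n in range(attempts)]


def exponential_delays(attempts: int, base: float = 1.0,
                       factor: float = 2.0, cap: float = 60.0) -> List[float]:
    """Delays that multiply each attempt, bounded by a cap: 1s, 2s, 4s, 8s, ..."""
    return [min(cap, base * factor ** n) for n in range(attempts)]
```

With exponential spacing a struggling endpoint sees rapidly thinning traffic, which is why retry storms become less likely.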
New

Decoy is live

First public release. npx decoy-scan and npx decoy-tripwire are live on npm today — both free, both zero-config, both work with the MCP clients you already use.

What ships today

  • npx decoy-scan — static scanner for MCP servers. 50+ checks across supply-chain hygiene, tool scoping, credential handling, and prompt-surface risks. Runs anywhere npx runs.
  • npx decoy-tripwire — 12 built-in decoy tools installed alongside your real MCP servers. Fires on any invocation. Alerts by email on the free tier.
  • Decoy dashboard at app.decoy.run — passkey login, token-based auth, unified view for scan results and tripwire triggers.

What's coming

  • Decoy Guard — the hosted MCP threat-intel server. Closed beta; join at decoy.run.
  • Decoy Red Team — autonomous adversarial testing. Closed beta.
  • GitHub Actions for decoy-scan and decoy-redteam landing in the coming weeks.