One agent security tool leaves you entirely blind to the threats only a different tier can see

Published: 2026-03-29 Last verified: 2026-04-20 medium

Published Fact-checked 2026-04-20 · 0 corrections

From Theory Delta | Methodology | Published 2026-03-29 | Updated 2026-04-20

You added mcp-scan to your deployment pipeline to catch malicious tool descriptions. You configured cc-safety-net to block destructive Bash commands before execution. You ran 1Password SCAM against your agent’s security posture. You believe you have defense-in-depth.

Here’s what each of those tools can and cannot see — and where the attacks that succeed live in the gaps between them.

What you expect

Agent security tools are a category. Tools in the same category address overlapping concerns. Running multiple “agent security” tools gives you defense-in-depth. More tools means more coverage.

What actually happens

The landscape has fractured into four architecturally distinct tiers defined by where each tool intercepts, when it runs, and what it can observe. No tool spans more than one tier. A tool from tier 1 provides exactly zero coverage for threats that only tier 2 can detect. The gaps between tiers are the actual attack surface.

Tier 1 — Static / Supply-Chain Scanners

When they run: Pre-deployment. Before the agent starts. What they see: Tool descriptions, server configurations, dependency trees, known CVEs. What they cannot see: Anything that only manifests at runtime.

Key tools: mcp-scan (~1,900 stars), ZeroLeaks (510 stars), aguara (52 stars), mcp-context-protector (Trail of Bits, 215 stars).

The ceiling — MCPTox 36.5% attack success rate: A tool description that says “Read files from the project directory” is syntactically identical whether it reads your project directory or exfiltrates data to an external endpoint. Static analysis can flag suspicious patterns but cannot verify runtime behavior. MCPTox reports a 36.5% average tool-poisoning success rate against static analysis — more than one in three attacks passes your pre-deployment scan.

MCPTrust lockfile gap: MCPTrust detects tool schema drift post-install but does NOT detect malicious tool descriptions that are syntactically valid, malicious logic within approved tools, or runtime prompt injection. A lockfile confirms the schema hasn’t changed — it does not confirm the description is safe.

Dynamic tool description supply chain vector: Tool descriptions fetched from community registries (e.g., Smithery) at runtime bypass static scans entirely. A registry entry poisoned after your static scan passed delivers a malicious description at runtime, undetected.

Tier 2 — Runtime Proxy Firewalls

When they run: During execution, intercepting live traffic. What they see: Actual tool call arguments, server responses, protocol-level traffic. What they cannot see: Agent-internal reasoning, planning-stage decisions, anything outside the intercepted traffic path.

This tier has itself fragmented into three sub-types:

Security-enforcement proxies (LLM API layer): aifw/OneAIFW (309 stars), crust. These intercept at ANTHROPIC_BASE_URL or equivalent but have no awareness of MCP tool-call structure. aifw open issue #7 confirms the builder community already sees this gap.

MCP transport proxies (JSON-RPC layer): mcp-guardian (eqtylab, 192 stars), mcp-watchdog (21 stars), mcpwall (2 stars). These see actual tool call arguments but are blind to the LLM API message layer.

Critical inter-tier gap: No runtime proxy combines protocol-level interception (LLM API layer) with MCP tool-call structural awareness (JSON-RPC layer). A prompt-injected tool result passes through all existing runtime proxies undetected unless it triggers a content-level heuristic.

Tier 3 — Hook-Based Enforcement

When they run: Inside the agent process, before tool execution. What they see: Tool call arguments before execution, Bash command content, agent lifecycle events. What they cannot see: Network-level threats, MCP server-side behavior, anything outside the agent runtime.

Key tools: cc-safety-net (~1,100 stars), cchooks (118 stars), claude-code-permissions-hook (23 stars).

This is the only tier that can modify tool arguments before execution (via updatedInput in PreToolUse). Redirect all writes to /sandbox, prepend safety constraints to shell commands.

cc-safety-net enforcement model (verified via source analysis, March 2026):

No disable path. No SAFETY_NET_DISABLE environment variable. No per-session opt-out. The only bypass is removing or modifying the hook configuration itself.
Custom rules are additive only — the schema has no allow_args or whitelist field. Teams cannot exempt specific commands from built-in rules via config.
Runs before permissions. Claude Code execution order: PreToolUse hooks → permissions system. cc-safety-net is a stronger gate than permissions.
Blocks git restore, git stash drop, git branch -D, and gh pr create. In worktree-isolated agent runs: use git switch, git stash push, and route PR creation through an orchestrating session.
False positive on compound cd <path> && git <command> calls — triggers a “bare repository attack” prompt. Use git -C <path> <command> to avoid this.
v0.7.0 expanded to OpenCode, Gemini CLI, and GitHub Copilot CLI — the Claude Code-only characterization no longer applies to the tier, but each platform requires platform-specific integration.

Tier 4 — Benchmarks / Evaluation

What it does: Measures the effectiveness of tiers 1-3. Does not provide protection.

Key instruments: 1Password SCAM (open-sourced Feb 2026), agentic_security (1,798 stars).

1Password SCAM result (bimodal): Critical failures dropped from 65 to 2 with a 1,200-word “security skill” instruction. But the improvement is bimodal:

Model tier	Skill improvement
Strong models (GPT-4o, Claude 3.7)	+6 to +24 pp
Weaker models (GPT-4o-mini, Gemini Flash)	+49 to +60 pp

The headline ~40pp average obscures a two-population distribution. Embedded credentials defeat all 8 tested models regardless of skill training or model capability — a universal failure case no current tier addresses.

What this means for you

If you added mcp-scan and called it done: You have pre-deployment static coverage only. Attacks that deliver malicious behavior at runtime — tool descriptions that look clean at scan time, dynamic registry entries poisoned after your scan — pass through tier 1 undetected. MCPTox’s 36.5% success rate is the quantified ceiling of what you caught.

If you added cc-safety-net: You have hook-based enforcement inside the agent process. You have zero coverage for tool poisoning, supply chain attacks via tool descriptions, or prompt injection in tool results. Those threats pass through tier 3 undetected.

If you ran 1Password SCAM: SCAM measures your current tier effectiveness against credential leakage. It does not add protection — it tells you how effective your other tiers are. The embedded credentials universal failure has no tier that addresses it.

The attack that passes through all three tiers: Tool poisoning with plausible descriptions passes tier 1 (syntactically valid), tier 2 (tool call is legitimate), and tier 3 (no recognizable destructive pattern). MCPHammer’s C2-via-argument technique — commands embedded in tool call arguments, not descriptions — defeats all three protective tiers. The only documented architectural response (pipelock process-level separation: agent has secrets but no network; separate process has network but no secrets) has zero production framework adoption.

For multi-agent systems: HS256 symmetric JWT signing for agent-to-agent auth means any compromised subagent can forge coordinator-level tokens — the shared secret is held by every participant. Asymmetric signing (RS256/ES256) constrains token forgery to private key holders, but adoption in agent frameworks is not confirmed as of March 2026.

What to do

Compose across tiers — a single tier provides incomplete coverage:

Run a static scanner (mcp-scan or aguara) at MCP server installation time to catch known CVEs and suspicious patterns in tool descriptions.
Add a transport-layer proxy (mcp-guardian for human-in-the-loop, mcp-watchdog for automated detection) to intercept tool call arguments at runtime.
Add hook-based enforcement (cc-safety-net) for Bash command inspection inside the agent process — the only tier that can modify arguments before execution.

For the tool poisoning gap specifically: MCPTox’s 36.5% attack success rate means all three protective tiers pass poisoned-but-syntactically-valid descriptions. The only documented architectural response is pipelock process-level separation — no production framework has adopted this yet. Monitor whether pipelock or an equivalent pattern gains adoption in 2026.

No tool adjusts CVSS scores for agent execution context. A CVE rated medium in traditional software may be critical when the affected component is embedded in an autonomous agent with broad tool access.

Watch aegis-mcp: This early-stage governance-wrapper pattern places enforcement at the tool-exposure layer — the agent is provisioned with governed tools from the start, rather than having governance applied post-hoc. Zero production evidence as of March 2026, but the architectural pattern is distinct from all four existing tiers.

Evidence

Tool	Version	Result
mcp-scan	v0.4.x (March 2026)	source-reviewed: static scanning; tool names sent to Snyk servers (not air-gapped); adds agent skill scanning in v0.4
cc-safety-net	v0.7.1	source-reviewed: no disable path, no allowlist field, runs before permissions; v0.7.0 expands to OpenCode/Gemini CLI/Copilot CLI
1Password SCAM	v1.0 (open-sourced Feb 2026)	source-reviewed: bimodal improvement; embedded credentials defeat all 8 tested models
aegis-mcp	0 stars (March 2026)	source-reviewed: governance-wrapper pattern; no production evidence
aifw	309 stars (March 2026)	source-reviewed: open issue #7 confirms no MCP tool-call integration
aguara	v0.8.0	source-reviewed: 173+ detection rules, zero cloud, zero LLM, single binary

Confidence: medium — source-reviewed across 12+ tools (GitHub repo inspection March 2026). Core tier boundaries confirmed by cross-referencing tool insertion points: no tool spans more than one tier. OWASP Agentic Top 10 2026 independently confirms inter-agent trust and cascading failure as under-served gaps. MCPTox attack success rate (36.5%) is an independent confirmation of the tool poisoning gap.

Falsification criterion: A tool that performs both pre-deployment static scanning of MCP server descriptions AND runtime interception of actual tool-call arguments from the same process, with evidence of both capabilities in production use, would disprove the four-tier separation claim.

Open questions: (1) Does mcp-scan’s dynamic proxy mode qualify as tier 2 runtime enforcement, or is it still a batch analysis of captured traffic? (2) Will the governance-wrapper MCP pattern (aegis-mcp) propagate to production-grade tools? (3) Does pipelock have production adopters in any organization?

Seen different? Contribute your evidence — share a repro or counter-example and we’ll review it against this finding. Reader evidence is what keeps these findings accurate.