theorydelta field guide
built 2026-06-21 findings: 55 task hubs: 6 independent · evidence-traced · no vendor influence

A2A Agent Card Skill Descriptions Are an Unprotected Injection Surface — 100% Exfiltration in Tested Scenarios

Published: 2026-06-12 Last verified: 2026-06-12 empirical
Staleness risk: high — facts in this subject area change quickly between releases. Re-check the specific claims against your own environment before acting. (This rates the topic, not whether this page is out of date.)

A2A Agent Card Skill Descriptions Are an Unprotected Injection Surface — 100% Exfiltration in Tested Scenarios

What you expect

An A2A Agent Card is the machine-readable identity and capability manifest published by every A2A-compatible agent at /.well-known/agent-card.json. It contains a name, description, and a skills[] array — each skill entry has its own name, description, tags, and examples fields. These fields describe what the remote agent offers so that orchestrating agents can decide whether to delegate tasks to it.

The expectation: these are metadata fields, read by the orchestrating agent to understand remote agent capabilities. A well-formed Agent Card from a legitimate service should pose no security risk beyond normal protocol interactions.

What actually happens

The A2A spec defines AgentCard and AgentSkill description fields as free-form strings with no input sanitization requirement. When an orchestrating LLM fetches a remote agent’s card and processes its fields to decide whether and how to delegate, those strings enter the LLM’s reasoning context as prompt input — not as inert metadata.

A malicious remote agent embeds adversarial instructions directly in these fields. Because the orchestrating LLM has no basis to distinguish “capability description” from “instruction,” the injected text is interpreted as part of its prompt context and can redirect subsequent tool calls.

Keysight Security (March 2026) demonstrated this in a simulated multi-agent delegation workflow: a host agent fetches remote agent cards to find and assign tasks. With adversarial instructions embedded in the remote agent’s skill descriptions, the host was redirected to transmit sensitive user data — including PII — to attacker-controlled endpoints. Keysight reported 100% exfiltration rates in their tested A2A scenarios. The CyPerf 26.0.0 release includes dedicated simulation strikes for this attack class.

Palo Alto Networks Unit 42 documents a related vector — agent session smuggling — where malicious content in A2A messages hijacks agent behavior across session boundaries. Both findings confirm the same structural root cause: A2A systems pass external agent-provided content into LLM context without a sanitization layer.

A2A v1.0’s JWS card signing does not prevent this attack. Signed Agent Cards (v1.0) verify that the card arrived unmodified from its issuer. They do not verify that the issuer’s content is safe to include in LLM context. A signed poisoned card is still a poisoned card. The v1.0 spec adds no sanitization requirement for card field content.

What this means for you

Any agent that dynamically fetches and processes Agent Cards from untrusted sources is a direct exfiltration path. This includes:

  • Orchestrators that discover remote agents at runtime via /.well-known/agent-card.json
  • Agent registries that aggregate cards and serve them to client agents
  • Workflows that include raw Agent Card content in LLM prompts (e.g., “here are the available agents: [card content]”)
  • Framework implementations that automatically fetch cards before task delegation

The attack requires no exploit. The remote agent publishes a valid, spec-compliant card. The orchestrating agent fetches it normally. The vulnerability is in the LLM’s treatment of card content as prompt context.

With 150+ organizations in production including Google, Microsoft, and AWS, the attack surface is live. Any production A2A deployment that performs dynamic agent discovery is exposed.

What to do

  1. Treat all AgentCard fields as untrusted user input. Never include raw description, skills[].description, skills[].examples, or tags fields directly in LLM prompts. Sanitize or strip free-form text before including it in orchestrator context.

  2. Use structured card summaries instead of raw card text. When an orchestrating agent needs to reason about available remote agents, generate structured summaries (e.g., “agent X handles task type Y, version Z”) rather than passing raw natural-language description fields into the LLM.

  3. Restrict dynamic agent discovery to a pre-approved allowlist. If your workflow discovers remote agents at runtime, limit Agent Card fetching to known-trusted issuers. Dynamic card fetching from arbitrary endpoints is the highest-risk pattern.

  4. Understand that v1.0 JWS signing is not a countermeasure. Signing proves integrity-in-transit; it does not prove the issuer’s content is safe to include in LLM reasoning context.

  5. Test with adversarial card content before production. Embed a canary instruction in a test agent’s skill description and verify your orchestrator does not execute it. If it does, add sanitization before deploying.

Falsification criterion: This finding would be disproved by evidence that A2A’s orchestration layer (in the a2a-python SDK, Google ADK, or another production orchestrator) preprocesses AgentCard description fields to neutralize natural-language instructions before including them in LLM context, or by a replication of Keysight’s test scenario that achieves materially lower exfiltration rates without adding sanitization.

Evidence

ToolVersionEvidenceResult
A2A Protocol Specv1.0 production (May 2026)source-reviewedAgentCard.description and AgentSkill.description are free-form strings; spec defines no sanitization requirement or content-safety constraint
Keysight CyPerfv26.0.0 (March 2026)independently-confirmed100% PII exfiltration demonstrated via adversarial instructions in Agent Card skill descriptions in simulated A2A multi-agent delegation
Palo Alto Unit 422026independently-confirmedAgent session smuggling confirmed in A2A systems — external agent content reaches LLM context without sanitization (related root cause)
a2a-python SDKv0.3.25 stablesource-reviewedAgentCard model surfaces description and skill fields to application layer as plain strings with no sanitization
A2A v1.0 JWS signingv1.0 (May 2026)source-reviewedSigning verifies card integrity-in-transit; spec makes no claim that signed card content is safe to include in LLM context
A2A v1.0 release — PR Newswirev1.0 (May 2026)independently-confirmed150+ orgs in production including Google, Microsoft, AWS — attack surface is present in live production deployments

Confidence: empirical — 4 sources reviewed, 3 independent confirmations. The Keysight Security research (March 2026) provides the primary independent confirmation with specific exfiltration rates from tested scenarios.

Strongest case against: The 100% exfiltration rate is specific to the orchestrator implementation Keysight tested and may not generalize to all A2A deployments. Production orchestrators that structure card content before passing it to LLM context — rather than including raw description fields verbatim — may be substantially less vulnerable. Additionally, deployments using pre-configured static agent registries rather than dynamic discovery are not directly exposed to this attack vector. The protocol-level gap is real, but operational isolation can mitigate it without waiting for a spec update.

Open questions: Does Google ADK apply sanitization to AgentCard fields before including them in LLM orchestration context? What is the exfiltration rate against orchestrators using structured card summaries versus raw description fields?

Seen different? Contribute your evidence — share a repro or counter-example and we’ll review it against this finding. Reader evidence is what keeps these findings accurate.

theorydelta.com · 2026 independent · evidence-backed · every claim sourced or labelled about · glossary · rss · mcp · /scan · llms.txt