A2A and MCP Agent Security: Identity, Delegation, and Audit Trails

Q: What is the difference between LLM guardrails and A2A MCP agent security?

LLM guardrails filter model inputs and outputs for safety and policy violations. Agent protocol security controls who may invoke tools, delegate tasks, and act on whose behalf across MCP and A2A boundaries with identity, authorization, and audit trails.

Q: How should agent identity work in an A2A deployment?

Separate human identity, agent service identity, and task context. Agent Cards advertise capabilities and supported auth schemes but are not proof of trust. Validate credentials on every request and issue scoped tokens rather than treating discovery metadata as authorization.

Q: What is the confused deputy problem in multi-agent systems?

A confused deputy is a privileged agent or tool server that performs a sensitive action because a less privileged caller tricked it into using its authority. Delegation chains and MCP tool calls need explicit scope checks so downstream components never act beyond the original user intent.

Q: Do you need an A2A gateway in production?

Small single-team deployments may enforce policy inside each agent, but multi-tenant, multi-vendor, or partner-facing agent networks usually need a gateway for centralized auth, routing, rate limits, and audit. The threshold is crossed when more than one team owns agents that call each other.

Q: What should an A2A MCP audit log contain?

Log user ID, agent ID, task ID, parent task ID for delegation chains, tool calls, policy decisions, artifacts produced, and timestamps at gateway, agent, and MCP server layers. Correlate logs with trace IDs so a final answer can be reconstructed end to end.

Protocol security is about who may act, not what the model says.


Prompt injection gets most of the security attention in LLM systems, and it deserves attention, but it is not the whole problem once agents start calling tools and delegating work to other agents.

MCP gives an agent structured access to files, APIs, databases, and ticketing systems. A2A lets one agent send tasks, messages, and artifacts to another agent that may belong to a different team, vendor, or runtime. Those protocols are useful precisely because they cross trust boundaries, which means identity, authorization, delegation limits, and audit trails become first-class architecture rather than optional hardening.

A2A and MCP agent security architecture with identity, gateway, and audit layers

This article is the canonical guide for agent protocol security in the LLM Architecture cluster. It covers threat models, identity, gateways, registries, delegation, and production checklists. For input validation, output filtering, and prompt safety patterns, see LLM Guardrails in Practice instead.

Guardrails vs Protocol Security vs Runtime Policy

These three layers solve different problems and fail in different ways when conflated.

LLM guardrails operate on model input and output: blocking injection patterns, filtering harmful content, validating JSON shape, and enforcing tone or compliance rules on generated text. They protect the conversation layer.

Protocol security operates on agent boundaries: who may call which MCP tool, which agent may delegate to which peer, what OAuth scopes attach to a task, and whether a downstream agent may act on a user’s behalf. It protects the action layer.

Runtime policy sits between them: a policy engine that evaluates requests against rules regardless of whether the trigger was natural language or a structured protocol call. It can require human approval before a tool executes, block egress to unknown domains, or deny delegation when scope exceeds the originating user.

My opinion is blunt: guardrails without protocol security produce polite chatbots that still exfiltrate data through a tool call. Protocol security without guardrails produces well-authenticated agents that still follow malicious instructions embedded in an artifact. You need both, plus runtime policy for high-risk actions.

Threat Model for A2A and MCP Agent Systems

Start with assets and adversaries, not with a shopping list of controls.

Assets worth protecting: user data in prompts and artifacts, credentials for MCP servers, production systems reachable through tools, agent reputation, billing accounts tied to token usage, and audit integrity.

Realistic adversaries: external users abusing public agent endpoints, compromised MCP servers returning poisoned tool results, malicious agents misrepresenting skills in Agent Cards, insiders over-delegating authority, and supply-chain tampering with tool metadata that manipulates model behavior.

Malicious or compromised tools (MCP)

An MCP server is code plus data exposed to the model. A hostile server can return misleading tool descriptions, exfiltrate arguments passed by the model, or perform actions beyond what the user intended when the host executes tool calls without scoped credentials.

Malicious or impersonated agents (A2A)

An agent that accepts tasks may be malicious, compromised, or simply over-permissioned. Agent Cards describe capabilities; they do not prove identity unless you verify signatures, TLS, and issuer trust.

Confused deputy

Agent B holds permission to access a finance API. Agent A, with lower privilege, asks B to “summarize this invoice” while smuggling a transfer instruction in an artifact. B executes using its own credentials unless delegation scope is enforced end to end.
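The fix is structural, not linguistic: B should authorize against the intersection of its own permissions and the scope the originating user delegated, never against its own permissions alone. A minimal sketch, assuming each task carries the delegated scope as a set of permission strings (the `Task` type, the permission names, and `authorize` are hypothetical):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    caller_agent: str
    requested_action: str
    delegated_scope: frozenset  # permissions the originating user delegated (hypothetical format)

# Permissions held by Agent B's own service account (hypothetical).
AGENT_B_PERMISSIONS = frozenset({"invoices:read", "payments:transfer"})

def authorize(task: Task, agent_permissions: frozenset = AGENT_B_PERMISSIONS) -> bool:
    """Allow the action only if BOTH the agent and the delegated scope permit it."""
    effective = agent_permissions & task.delegated_scope
    return task.requested_action in effective

summary = Task("agent-a", "invoices:read", frozenset({"invoices:read"}))
smuggled = Task("agent-a", "payments:transfer", frozenset({"invoices:read"}))
print(authorize(summary))   # True
print(authorize(smuggled))  # False: B holds the permission, but A's delegation does not
```

Because the check runs outside the model, it holds even if B's model fully believes the smuggled instruction.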

Over-broad permissions and hidden delegation chains

The user approves one step. The orchestrator then silently chains three A2A hops and five MCP calls. The user never sees the full graph, but the organization is still accountable for the outcome.

Prompt injection through artifacts and cross-agent messages

Injection is not only a user-message problem. A PDF artifact, a web page fetched by a tool, or a message from Agent C can carry instructions aimed at Agent D’s model. Treat all protocol-carried content as untrusted input at the model boundary.

Poisoned or misleading Agent Cards

Descriptions and skill names are prompt surface area. A card that advertises safe_read_only_analysis while being wired to write-capable backends is a social-engineering layer, not a technical guarantee.

Identity Model for Multi-Agent Systems

Protocol security begins with clear identity types and what each one is allowed to prove.

Identity type	What it represents	Typical proof
Human user	End user or operator who initiated work	OIDC session, SSO token
Agent service	Deployed agent runtime (orchestrator, specialist)	OAuth client credentials, mTLS cert
MCP server	Tool provider process	API key, mTLS, scoped service account
Task / session	Unit of work spanning hops	task ID, trace ID, delegated scope token

A2A’s Agent Card advertises supported authentication schemes (OAuth 2.0, API keys, mTLS, and similar patterns aligned with OpenAPI practice) and skills with optional security requirements. The card is discovery metadata, not a trust anchor. Clients obtain credentials out of band and send them in standard HTTP headers on every request; servers must validate on every call and return 401 or 403 when auth or scope fails.
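A minimal sketch of per-request enforcement, assuming bearer tokens and a skill-to-scope map. The token table stands in for real JWT validation or token introspection, and every name here is hypothetical; the point is the split between 401 (the caller is not authenticated) and 403 (the caller is authenticated but lacks scope).

```python
from dataclasses import dataclass

# Hypothetical token table. A real server would verify a JWT signature and
# expiry, or call the issuer's introspection endpoint, on every request.
TOKENS = {
    "tok-orchestrator": {"sub": "agent:orchestrator", "scopes": {"invoice.summarize"}},
}

# Hypothetical mapping from A2A skill IDs to the OAuth scope each one requires.
SKILL_SCOPES = {
    "invoice.summarize": "invoice.summarize",
    "invoice.approve": "invoice.approve",
}

@dataclass
class Result:
    status: int
    detail: str

def handle_request(headers: dict, skill_id: str) -> Result:
    """Authenticate, then authorize, on every call; no session is trusted implicitly."""
    auth = headers.get("Authorization", "")
    if not auth.startswith("Bearer "):
        return Result(401, "missing bearer token")
    claims = TOKENS.get(auth[len("Bearer "):])
    if claims is None:
        return Result(401, "unknown or expired token")
    required = SKILL_SCOPES.get(skill_id)
    # Unknown skills are denied by default rather than passed through.
    if required is None or required not in claims["scopes"]:
        return Result(403, f"{claims['sub']} lacks scope for {skill_id}")
    return Result(200, f"accepted {skill_id} for {claims['sub']}")

print(handle_request({"Authorization": "Bearer tok-orchestrator"}, "invoice.approve").status)  # 403
```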

Internal vs external views of the same agent

Production agents often publish a public Agent Card with a limited skill list and a richer authenticated card for internal callers. The A2A specification allows extended cards for authenticated clients. Use that split deliberately: partners should not see internal skills, and internal orchestrators should not rely on public discovery alone for authorization.

Authentication and Authorization for MCP and A2A

Authentication answers who is calling. Authorization answers what they may do.

MCP tool access

For each MCP connection, define:

which agent host may connect
which tools are enabled for that host
which OS user or service account executes side effects
whether the human user must approve each mutating call
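Those four decisions can be captured as a per-host profile. A minimal sketch with hypothetical host, tool, and account names; it is not an MCP SDK API, only the shape of the check a host or MCP proxy could run before forwarding a `tools/call` request.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class McpProfile:
    host_id: str                 # which agent host may connect
    enabled_tools: frozenset     # which tools are enabled for that host
    run_as: str                  # service account that executes side effects
    approval_required: frozenset = frozenset()  # mutating tools gated on a human

# Hypothetical profile: a public support bot gets ticket tools, never payroll.
PROFILES = {
    "support-bot": McpProfile(
        host_id="support-bot",
        enabled_tools=frozenset({"tickets.search", "tickets.comment"}),
        run_as="svc-support-bot",
        approval_required=frozenset({"tickets.comment"}),
    ),
}

def check_tool_call(host_id: str, tool: str, approved: bool = False) -> str:
    """Decide whether a tool call from `host_id` may be forwarded."""
    profile = PROFILES.get(host_id)
    if profile is None:
        return "deny: unknown host"
    if tool not in profile.enabled_tools:
        return "deny: tool not on allowlist"
    if tool in profile.approval_required and not approved:
        return "pending: human approval required"
    return f"allow: execute as {profile.run_as}"
```

Default-deny for unknown hosts and tools is the property that matters; the same shape can express separate planning and execution profiles for one agent.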

Prefer tool allowlists over “connect everything” MCP configs. A coding agent does not need payroll MCP servers on the same profile as a public support bot.

A2A agent access

For each agent peer relationship, define:

which caller agent IDs may invoke which skills
maximum delegation depth
which artifact types may cross the boundary
whether user context must propagate as signed claims

Map OAuth scopes (or equivalent) to skills, not to blanket agent admin. Least privilege at the token layer beats hope at the prompt layer.
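A peer relationship can be expressed the same way. A minimal sketch, assuming a policy table keyed by (caller, callee) agent IDs; the agent IDs, skill names, and the `admit` function are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PeerPolicy:
    allowed_skills: frozenset           # skills this caller may invoke on this peer
    max_delegation_depth: int           # hops allowed below the originating request
    allowed_artifact_types: frozenset   # MIME types allowed across the boundary
    require_signed_user_context: bool   # must user claims propagate as signed data?

# Hypothetical policy table keyed by (caller agent ID, callee agent ID).
PEER_POLICIES = {
    ("orchestrator", "finance-analyst"): PeerPolicy(
        allowed_skills=frozenset({"invoice.summarize"}),
        max_delegation_depth=2,
        allowed_artifact_types=frozenset({"application/pdf", "text/plain"}),
        require_signed_user_context=True,
    ),
}

def admit(caller: str, callee: str, skill: str, depth: int,
          artifact_types: set, has_signed_user_context: bool) -> tuple:
    """Return (allowed, reason) for an incoming A2A task."""
    policy = PEER_POLICIES.get((caller, callee))
    if policy is None:
        return False, "no peer relationship"
    if skill not in policy.allowed_skills:
        return False, "skill not allowed for caller"
    if depth > policy.max_delegation_depth:
        return False, "delegation too deep"
    if not set(artifact_types) <= policy.allowed_artifact_types:
        return False, "artifact type not allowed"
    if policy.require_signed_user_context and not has_signed_user_context:
        return False, "missing signed user context"
    return True, "ok"
```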

Gateway-enforced vs per-agent policy

Per-agent policy works when one team owns the whole graph and releases are coordinated. Gateway-enforced policy works when multiple teams, tenants, or vendors share an agent network and you need one place to enforce allowlists, rate limits, and audit.

```mermaid
flowchart LR
    U[User / client] --> G[A2A gateway]
    G --> O[Orchestrator agent]
    O -->|A2A scoped token| S1[Specialist agent]
    O -->|A2A scoped token| S2[Specialist agent]
    S1 --> MG[MCP gateway]
    S2 --> MG
    MG --> T1[MCP tool servers]
    MG --> T2[MCP tool servers]
    G --> A[Audit log]
    MG --> A
    S1 --> A
    S2 --> A
```

A2A Gateway as the Control Plane

An A2A gateway is not strictly required by the protocol, but it becomes necessary when agent traffic needs centralized governance.

A gateway typically handles:

authentication termination and token exchange
routing to the correct agent service by skill or tenant
policy checks before tasks are accepted or forwarded
protocol version negotiation
rate limiting and abuse detection
structured audit emission on every task transition
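The responsibilities above compose into a short pipeline. A minimal sketch under stated assumptions: authentication and authorization are injected as plain callables, rate limiting uses a token bucket per (tenant, caller agent), and the audit log is an in-memory list. `Gateway`, `TokenBucket`, and all names are hypothetical; protocol version negotiation is omitted.

```python
import time
from typing import Callable, Dict, Optional, Tuple

class TokenBucket:
    """Allows bursts of `capacity` requests, refilled at `refill_per_sec`."""

    def __init__(self, capacity: float, refill_per_sec: float, now: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity
        self.updated = now

    def allow(self, now: float) -> bool:
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

class Gateway:
    """Hypothetical pipeline: authenticate, authorize, rate-limit, route, audit."""

    def __init__(self, authenticate: Callable[[str], Optional[str]],
                 authorize: Callable[[str, str], bool], routes: Dict[str, str],
                 clock: Callable[[], float] = time.monotonic,
                 burst: float = 5, refill_per_sec: float = 1.0):
        self.authenticate, self.authorize, self.routes = authenticate, authorize, routes
        self.clock, self.burst, self.refill_per_sec = clock, burst, refill_per_sec
        self.buckets: Dict[Tuple[str, str], TokenBucket] = {}
        self.audit: list = []

    def handle(self, token: str, tenant: str, skill: str) -> int:
        now = self.clock()
        caller = self.authenticate(token)
        if caller is None:
            return self._emit(401, now, tenant, None, skill)
        if not self.authorize(caller, skill):
            return self._emit(403, now, tenant, caller, skill)
        # One bucket per (tenant, caller agent) so one noisy agent cannot starve others.
        bucket = self.buckets.setdefault(
            (tenant, caller), TokenBucket(self.burst, self.refill_per_sec, now))
        if not bucket.allow(now):
            return self._emit(429, now, tenant, caller, skill)
        backend = self.routes.get(skill)
        if backend is None:
            return self._emit(404, now, tenant, caller, skill)
        return self._emit(200, now, tenant, caller, skill, backend)

    def _emit(self, status: int, now: float, tenant: str, caller: Optional[str],
              skill: str, backend: Optional[str] = None) -> int:
        # Every decision, including rejections, produces an audit record.
        self.audit.append({"ts": now, "status": status, "tenant": tenant,
                           "caller": caller, "skill": skill, "routed_to": backend})
        return status
```

Rejected requests are audited too; an audit trail that only records successes hides exactly the probing an attacker does first.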

When a gateway is overkill vs necessary

A gateway is often overkill for a single orchestrator and two specialist agents in one Kubernetes namespace maintained by one team. It becomes necessary when partners invoke your agents, when multiple business units share infrastructure, when compliance requires uniform logging, or when you cannot trust every agent implementation to enforce policy correctly.

Pair an A2A gateway with an MCP gateway (or MCP proxy) so tool access receives the same treatment: identity, allowlists, egress controls, and audit at the tool boundary rather than only at the chat UI.

Partner-facing vs internal Agent Cards

Publish different discovery metadata for external and internal callers. External cards expose narrow skills and stricter auth. Internal cards may list maintenance or admin skills but must never be reachable without stronger authentication than the public card implies.

Agent Registry and Discovery Security

Discovery is part of the attack surface. Anyone who controls what agents appear “available” controls where orchestrators send work.

Registry vs well-known Agent Card URLs

Small deployments use well-known URLs per agent (/.well-known/agent-card.json). Enterprise deployments add a registry that indexes agent IDs, versions, endpoints, owners, and policy tags. The registry is a policy object: entries should record which tenants may discover which agents, not only where they live.

Versioning, deprecation, and ownership

Registry records need owners, change history, and deprecation dates. An orchestrator that caches Agent Cards must refresh on TTL and verify signatures where supported. Stale cards are how retired skills keep receiving traffic long after a vulnerability is patched.
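A minimal sketch of both ideas, assuming a registry entry that records tenant visibility and a deprecation date, plus a client-side card cache with a TTL. The field names, `discoverable`, and `CardCache` are hypothetical; signature verification is assumed to happen inside the injected `fetch` function.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, Optional, Tuple

@dataclass(frozen=True)
class RegistryEntry:
    agent_id: str
    version: str
    endpoint: str
    owner: str                          # accountable team
    visible_to_tenants: frozenset       # who may discover this agent at all
    deprecated_after: Optional[date] = None

def discoverable(entry: RegistryEntry, tenant: str, today: date) -> bool:
    """Registry lookups are filtered by tenant policy and deprecation date."""
    if tenant not in entry.visible_to_tenants:
        return False
    return entry.deprecated_after is None or today <= entry.deprecated_after

class CardCache:
    """Caches Agent Cards and refetches them once `ttl_seconds` have elapsed."""

    def __init__(self, fetch: Callable[[str], dict], ttl_seconds: float,
                 clock: Callable[[], float]):
        # `fetch` is assumed to download the card and verify its signature
        # (where the publisher supports signing) before returning it.
        self.fetch, self.ttl, self.clock = fetch, ttl_seconds, clock
        self._cache: Dict[str, Tuple[float, dict]] = {}

    def get(self, agent_id: str) -> dict:
        now = self.clock()
        hit = self._cache.get(agent_id)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        card = self.fetch(agent_id)
        self._cache[agent_id] = (now, card)
        return card
```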

Enterprise internal networks vs external partners

Internal agent meshes can rely on mTLS and private DNS. Partner agents need explicit federation rules, contractually scoped skills, and stronger artifact inspection because you do not control their runtime.

Delegation Across Agent Boundaries

Delegation is where A2A security is won or lost. When Agent A sends a task to Agent B, three questions must have crisp answers:

Whose authority is being exercised? The user’s, A’s service account, or a blended delegated token?
What is B allowed to do with that authority? Read-only analysis, or mutating tools on A’s behalf?
Who is accountable if B exceeds scope? A, B, the gateway policy, or the human who approved an unclear prompt?

Propagating user intent vs over-delegation

Pass signed delegation claims that include user ID, original task ID, allowed skills, expiry, and maximum hop count. Downstream agents must reject tasks that expand scope silently. If B needs higher privilege than A held, transition to input_required and obtain explicit human approval rather than upgrading tokens invisibly.
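A minimal sketch of signed, attenuating delegation claims. It uses an HMAC over JSON with a single demo key held by the gateway, a simplified stand-in for a real signed token format (for example JWS with asymmetric keys, possibly issued through OAuth 2.0 token exchange). The claim names (`user`, `task`, `skills`, `exp`, `hop`, `max_hops`) and the functions are hypothetical.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical demo key held by the gateway. Real deployments would use
# per-issuer asymmetric keys so downstream agents can verify but not mint.
SIGNING_KEY = b"demo-gateway-key"

def _sign(payload: bytes) -> str:
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()

def mint(claims: dict) -> str:
    payload = json.dumps(claims, sort_keys=True).encode()
    return base64.urlsafe_b64encode(payload).decode() + "." + _sign(payload)

def verify(token: str, now: float) -> dict:
    encoded, signature = token.rsplit(".", 1)
    payload = base64.urlsafe_b64decode(encoded.encode())
    if not hmac.compare_digest(signature, _sign(payload)):
        raise PermissionError("bad signature")
    claims = json.loads(payload)
    if now >= claims["exp"]:
        raise PermissionError("expired")
    if claims["hop"] > claims["max_hops"]:
        raise PermissionError("hop limit exceeded")
    return claims

def delegate(token: str, skills: set, now: float) -> str:
    """Derive the next hop's token: scope may shrink, never grow; expiry is inherited."""
    parent = verify(token, now)
    if not set(skills) <= set(parent["skills"]):
        # Scope expansion is not an error to retry; it needs explicit human
        # approval (for example by moving the task to input_required).
        raise PermissionError("scope expansion requires approval")
    child = dict(parent, skills=sorted(skills), hop=parent["hop"] + 1)
    if child["hop"] > child["max_hops"]:
        raise PermissionError("hop limit exceeded")
    return mint(child)
```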

Human-in-the-loop approval flows for risky delegation are covered in A2A Streaming and Async Tasks for Long-Running Agent Workflows, where input_required is a first-class task state rather than an error.

```mermaid
sequenceDiagram
    participant User
    participant Orch as Orchestrator agent
    participant GW as A2A gateway
    participant Spec as Specialist agent
    participant MCP as MCP tool server
    User->>Orch: Request with user token
    Orch->>GW: Delegate task (scoped delegation token)
    GW->>GW: Policy check scope + hop count
    GW->>Spec: Forward task (reduced scope token)
    Spec->>MCP: Tool call (tool-scoped credential)
    MCP->>MCP: Enforce allowlist + user context
    Spec-->>GW: Artifact + audit events
    GW-->>Orch: Task update
    Orch-->>User: Final response
```

Separate reasoning from execution permissions

An agent may need broad read access to plan while write tools sit behind approval. Split credentials or use distinct MCP profiles for planning vs execution so a model mistake cannot immediately mutate production.

Audit Trails and Answer Provenance

If you cannot reconstruct a delegation chain, you cannot explain an incident, pass an audit, or dispute a billing anomaly.

Log at three layers:

Gateway: authentication result, policy decision, routed agent ID, task ID, parent task ID, rate-limit events.

Agent: task state transitions, messages sent/received, model/tool invocations (arguments redacted as needed), artifacts created, delegation outward.

MCP server: tool name, caller agent ID, user context, success/failure, latency, rows affected or resource IDs (policy permitting).
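A minimal sketch of a structured audit record shared by all three layers, emitted as one JSON line per event. The field names and the redaction list are hypothetical; what it illustrates is that every record carries trace, task, and parent task IDs so the layers can be joined later.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Optional

# Hypothetical keys whose values must never reach the audit store.
REDACT_KEYS = {"password", "api_key", "authorization", "ssn"}

@dataclass
class AuditEvent:
    layer: str                       # "gateway", "agent", or "mcp"
    trace_id: str                    # shared across every layer for one request
    task_id: str
    parent_task_id: Optional[str]    # links delegated tasks to their caller
    user_id: str
    agent_id: str
    action: str                      # e.g. "policy_decision", "tool_call", "delegate"
    outcome: str                     # e.g. "allow", "deny", "success", "failure"
    details: dict = field(default_factory=dict)
    ts: float = field(default_factory=time.time)

def redact(details: dict) -> dict:
    """Mask sensitive top-level argument values (nested values are not inspected)."""
    return {k: "[REDACTED]" if k.lower() in REDACT_KEYS else v for k, v in details.items()}

def emit(event: AuditEvent) -> str:
    record = asdict(event)
    record["details"] = redact(event.details)
    return json.dumps(record, sort_keys=True)
```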

Correlate with trace ID across all layers. Observability for LLM Systems covers instrumentation backends; this article defines what must be captured so those backends have meaningful signal.

Final answer provenance should answer: which user, which orchestrator task, which specialist agents, which tools, which artifacts influenced the text the user saw, and which policy gates fired along the way.
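Given records like those, provenance is a tree walk from the task that produced the user-visible answer down through its delegations. A minimal sketch, assuming events are dicts with `task_id`, `parent_task_id`, `agent_id`, `action`, `outcome`, `ts`, and an optional `details` dict (a hypothetical schema):

```python
from collections import defaultdict
from typing import Dict, List, Set

def provenance(events: List[dict], root_task_id: str) -> List[dict]:
    """Collect the delegation tree under the task that produced the final answer."""
    by_task: Dict[str, List[dict]] = defaultdict(list)
    children: Dict[str, Set[str]] = defaultdict(set)
    for e in events:
        by_task[e["task_id"]].append(e)
        if e.get("parent_task_id"):
            children[e["parent_task_id"]].add(e["task_id"])

    report, stack, seen = [], [root_task_id], set()
    while stack:  # depth-first walk from the root task down through delegations
        task_id = stack.pop()
        if task_id in seen:
            continue
        seen.add(task_id)
        task_events = sorted(by_task.get(task_id, []), key=lambda e: e["ts"])
        report.append({
            "task_id": task_id,
            "agents": sorted({e["agent_id"] for e in task_events}),
            "tools": [e.get("details", {}).get("tool") for e in task_events
                      if e["action"] == "tool_call"],
            "policy_gates": [e["outcome"] for e in task_events
                             if e["action"] == "policy_decision"],
        })
        stack.extend(sorted(children[task_id], reverse=True))
    return report
```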

Runtime Policy, Egress, and Secrets

Runtime policy engines (OPA, Cedar, custom rule services) evaluate structured events: “tool X with args Y for user Z.” They complement guardrails because they do not depend on the model behaving well.

Human approval belongs in runtime policy for irreversible or high-cost actions: payments, external email, production config changes, privilege grants.
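OPA policies are written in Rego and Cedar has its own policy language; the sketch below uses plain Python rules instead, purely to show the shape of a decision over a structured event. The tool names, roles, and the payment limit are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    REQUIRE_APPROVAL = "require_approval"

@dataclass(frozen=True)
class ToolEvent:
    tool: str
    args: dict
    user_id: str
    user_roles: frozenset

# Hypothetical tool names for irreversible or high-cost actions.
HIGH_RISK_TOOLS = {"payments.send", "email.send_external", "config.apply_prod", "iam.grant"}

def evaluate(event: ToolEvent) -> Decision:
    """Rules see structured fields only; model output never decides the outcome."""
    # Privilege grants are denied outright unless the human user is an admin.
    if event.tool == "iam.grant" and "admin" not in event.user_roles:
        return Decision.DENY
    # Hypothetical hard limit that no approval can override.
    if event.tool == "payments.send" and event.args.get("amount", 0) > 10_000:
        return Decision.DENY
    if event.tool in HIGH_RISK_TOOLS:
        return Decision.REQUIRE_APPROVAL
    return Decision.ALLOW
```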

Egress controls limit which domains MCP servers and agents may call. An agent that can both read secrets and POST to arbitrary URLs is a data-loss incident waiting to happen.
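A minimal sketch of an application-level egress check, assuming a hypothetical allowlist of exact hosts plus subdomain suffixes. It parses the hostname properly so that lookalikes such as `crm.example.com.attacker.net` or `crm.example.com@attacker.net` do not pass; it does not handle redirects or DNS rebinding, which is why the same list should also be enforced at a network proxy.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist for one MCP server's outbound calls.
ALLOWED_HOSTS = {"crm.example.com"}
ALLOWED_SUFFIXES = (".internal.example.com",)   # leading dot: subdomains only

def egress_allowed(url: str) -> bool:
    """Host-level check; a network proxy should still enforce the same list."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    host = (parts.hostname or "").lower()   # hostname ignores any user:pass@ prefix
    if host in ALLOWED_HOSTS:
        return True
    return any(host.endswith(suffix) for suffix in ALLOWED_SUFFIXES)
```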

Secrets never belong in Agent Cards or prompts. MCP hosts should inject short-lived credentials at execution time from a secrets manager. For transport encryption, key management, and baseline infra security patterns, see Architectural Patterns for Securing Data.

Push notification webhooks in async A2A flows need the same rigor: verify sender identity, reject stale events, and never treat a webhook payload as authorization on its own.
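The A2A streaming and async documentation listed in Sources describes its own recommended verification mechanisms; this sketch substitutes a simpler shared-secret HMAC over a timestamp and the raw body, only to show the three checks named above: sender authenticity, staleness, and replay. The secret, skew window, and function names are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional, Set

# Hypothetical shared secret, provisioned out of band when the client
# registers its push notification URL.
WEBHOOK_SECRET = b"demo-webhook-secret"
MAX_SKEW_SECONDS = 300

def sign(timestamp: int, body: bytes, secret: bytes = WEBHOOK_SECRET) -> str:
    message = str(timestamp).encode() + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()

# Signatures already accepted; a real receiver would expire entries older
# than the skew window instead of keeping them forever.
_seen: Set[str] = set()

def verify_webhook(timestamp: int, body: bytes, signature: str,
                   now: Optional[float] = None) -> bool:
    now = time.time() if now is None else now
    if abs(now - timestamp) > MAX_SKEW_SECONDS:
        return False                      # stale or future-dated event
    if not hmac.compare_digest(signature, sign(timestamp, body)):
        return False                      # wrong sender or tampered body
    if signature in _seen:
        return False                      # replay of an accepted event
    _seen.add(signature)
    return True
```

Even a verified notification only says that something changed. The receiver should then fetch the task through an authenticated `tasks/get` call and apply its normal authorization rather than acting on the payload alone.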

Reference Security Architecture

The following diagram summarizes a production-oriented layout for deployments at scale that use A2A between agents and MCP between agents and their tools.

```mermaid
flowchart TB
    subgraph CL [Client layer]
        U[User / API client]
    end
    subgraph CP [Control plane]
        GW[A2A gateway]
        REG[Agent registry]
        POL[Policy engine]
        AUD[Audit log]
        SEC[Secrets manager]
    end
    subgraph AL [Agent layer]
        OR[Orchestrator]
        SA[Specialist agents]
    end
    subgraph TL [Tool layer]
        MG[MCP gateway]
        MCP[MCP servers]
    end
    subgraph OB [Observability]
        OBS[Tracing + metrics]
    end
    U --> GW
    GW --> REG
    GW --> POL
    GW --> OR
    OR --> GW
    GW --> SA
    SA --> MG
    MG --> MCP
    POL --> GW
    POL --> MG
    SEC --> SA
    SEC --> MCP
    GW --> AUD
    MG --> AUD
    SA --> AUD
    AUD --> OBS
```

The orchestrator sees specialist agents through A2A. Specialists see tools through MCP. Users never receive raw MCP credentials, and partners never receive internal skill surfaces without policy review.

For protocol concepts (Agent Cards, tasks, artifacts), see What Is the A2A Protocol?. For adoption and enterprise framing, see Google A2A Protocol in 2026. For topology when many agents coordinate, see Multi-Agent Orchestration Patterns.

Production Checklist for A2A and MCP Security

Before exposing agent protocols beyond a trusted sandbox, verify:

Identity and auth

No anonymous agents in production paths
Every MCP and A2A call authenticated on every request
OAuth scopes or equivalent mapped to skills/tools, not blanket admin
Public vs authenticated Agent Card views defined intentionally

Delegation and policy

Delegation tokens carry user ID, task ID, scope, expiry, hop limit
Downstream agents reject scope expansion without explicit approval
High-risk tools require runtime policy or human approval
Reasoning and execution use separate credentials where possible

Discovery and registry

Agent registry entries have owners and version history
Agent Cards refreshed on TTL; signatures verified where supported
Partner agents federated with explicit skill allowlists

Audit and observability

Gateway, agent, and MCP layers emit correlated audit events
Delegation chains logged with parent and child task IDs
Artifact provenance recorded for final answers
Trace IDs connect to observability backends

Abuse and resilience

Rate limits per user, agent, and tenant
Timeout policies on delegated tasks
Egress allowlists on tool servers
Secrets in a manager, not in cards, prompts, or repos

Conclusion

A2A and MCP interoperability is powerful because agents and tools can compose across team and vendor boundaries, but that power is unsafe without identity, authorization, delegation limits, and audit design. Guardrails protect the model conversation; protocol security protects the actions agents take on behalf of users.

Treat Agent Cards as advertisements, delegation as a signed contract, MCP tools as privileged code execution, and audit logs as the evidence chain you will need when something interesting happens at 2 a.m.

Build the gateway when governance needs a single throat to choke. Split credentials before you split agents. Log every hop so the answer “the model decided” is never the final incident report.

Frequently Asked Questions

What is the difference between LLM guardrails and A2A MCP agent security? Guardrails constrain model input and output. Protocol security constrains who may invoke tools, delegate tasks, and act on whose behalf across MCP and A2A with identity, authorization, and audit trails.

How should agent identity work in an A2A deployment? Separate human, agent service, and task identities. Validate credentials on every request, use scoped tokens, and treat Agent Cards as discovery metadata rather than proof of trust.

What is the confused deputy problem in multi-agent systems? It occurs when a privileged agent or tool performs a sensitive action because a less privileged caller smuggled instructions through delegation or artifacts. Enforce scope at every hop.

Do you need an A2A gateway in production? Single-team internal deployments may enforce policy per agent. Multi-tenant, multi-vendor, or partner-facing networks usually need a gateway for centralized auth, routing, rate limits, and audit.

What should an A2A MCP audit log contain? User ID, agent ID, task ID, parent task ID, tool calls, policy decisions, artifacts, and timestamps correlated with trace IDs across gateway, agent, and MCP layers.

Sources

A2A Protocol – Enterprise-ready security topics: https://github.com/a2aproject/A2A/blob/main/docs/topics/enterprise-ready.md
A2A Protocol – Specification overview: https://a2a-protocol.org/latest/specification/
A2A Protocol – Streaming and push notification security: https://a2a-protocol.org/latest/topics/streaming-and-async/