AGT-004: Confused Deputy
| Category | Agent & MCP Attacks |
| Frameworks | ATLAS: LLM Prompt Injection |
Exploit trust relationships between agents. In multi-agent systems, downstream agents treat upstream agent output as authoritative without validation.
Attack Flow
Trust Chain Escalation: The attacker never touches Agent B directly. The payload rides Agent A's trusted output across the trust boundary, gaining Agent B's elevated permissions.
Technique
# Multi-agent trust chain:
User -> Agent A -> Agent B -> Agent C
# Inject at Agent A's data source.
# Agent A processes payload, passes to B.
# Agent B trusts Agent A's output.
# Agent B executes payload with B's tools.
# The attack crosses trust boundaries:
# Agent B has different permissions than A.
# The payload gains B's capabilities.
Key Concepts
- Trust is transitive and exploitable. In multi-agent architectures, each agent treats the output of upstream agents as trusted input. An injection that enters at Agent A propagates through the chain, accumulating the permissions of each downstream agent it passes through.
- Privilege escalation is the primary payoff. Agent B may have access to databases, APIs, or system tools that Agent A cannot reach. By injecting a payload through Agent A that Agent B executes, the attacker gains Agent B's capabilities without ever directly interacting with Agent B.
- The confused deputy pattern is a classic security problem applied to AI. Named after a decades-old access control vulnerability, the confused deputy occurs when a trusted entity (Agent B) performs actions on behalf of an untrusted requester (the injected payload) because it cannot distinguish legitimate requests from malicious ones.
- Inter-agent communication is rarely monitored. Most observability focuses on user-to-agent interactions. Agent-to-agent messages operate in a blind spot where injected payloads can traverse trust boundaries without triggering alerts.
Detection
- Instrument inter-agent communication channels. Log all messages passed between agents with full content, source agent identity, and timestamps. Apply the same monitoring and filtering to inter-agent traffic as to user-to-agent traffic.
- Detect capability escalation patterns. Monitor for cases where an agent performs actions outside its normal behavioral profile, especially when those actions were triggered by input from another agent rather than a direct user request.
- Trace request provenance through the agent chain. Tag each request with an origin identifier that persists through the entire multi-agent pipeline. When an agent takes a sensitive action, verify that the originating user has authorization for that action, not just that the requesting agent does.
Mitigation
- Enforce zero-trust between agents. Each agent should validate and sanitize input from other agents with the same rigor applied to user input. Upstream agent status should not confer implicit trust.
- Implement least-privilege agent permissions. Each agent should have only the minimum tools and permissions required for its specific task. Broad capability sets increase the blast radius when an agent becomes a confused deputy.
- Use capability-based access control with user context. Pass the original user's authorization context through the agent chain. Downstream agents should check actions against the user's permissions, not their own, preventing privilege escalation through agent chaining.