PI-005: Indirect Prompt Injection


Category	Prompt Injection
Frameworks	OWASP: LLM01 · ATLAS: LLM Prompt Injection

Inject payloads through data the model processes: documents, emails, web pages, database entries. The model encounters the payload as "trusted" content.

Technique

# Injection surfaces
- Uploaded documents (PDF, DOCX, CSV)
- Email content processed by AI assistant
- Web pages summarized by AI browser
- Database records retrieved by RAG
- Calendar events, ticket descriptions
- Code comments in repositories
- Image metadata (EXIF, alt text)

Key Concepts

Indirect injection is the most dangerous variant of prompt injection because the attacker and the victim are different users. The attacker plants the payload in content that a victim's AI assistant will later process, achieving remote code execution at the prompt level.
The trust model is fundamentally broken: the AI treats retrieved documents, emails, and web content as data to process, but the model cannot distinguish data from instructions within that content. A hidden instruction in a PDF becomes part of the model's directive set when retrieved.
The attack surface is vast. Any data channel that feeds into the model's context window is a potential injection point: documents, emails, web pages, database records, calendar events, code comments, image metadata, and more.
Image metadata (EXIF, alt text) is a particularly stealthy vector because it is invisible to the human user but processed by multimodal models or document parsers that extract metadata before feeding content to the LLM.
In RAG architectures, indirect injection is amplified: a single poisoned document in the knowledge base can affect every user whose query triggers retrieval of that document, creating a one-to-many attack.

Detection

Implement content scanning on all data ingestion paths (document uploads, email processing, web scraping, database writes) that checks for instruction-like content embedded in data fields.
Monitor for anomalous model behavior following document processing, such as unexpected tool calls, data exfiltration attempts, or responses that diverge from the expected task.
Deploy canary tokens in the system prompt that, if repeated in output, indicate the model's instruction context has been manipulated by external content.

Mitigation

Apply input sanitization to all external content before it enters the model's context window. Strip or escape instruction-like patterns from documents, emails, and retrieved data.
Implement privilege separation so that content from external data sources is processed with reduced model capabilities (no tool access, no code execution) compared to direct user input.
Use dual-LLM architectures where one model processes untrusted external content and a separate, privileged model handles user interaction and tool execution, preventing cross-contamination.

Technique​

Key Concepts​

Detection​

Mitigation​

Technique

Key Concepts

Detection

Mitigation