Skip to main content

PI-005: Indirect Prompt Injection

CategoryPrompt Injection
FrameworksOWASP: LLM01 · ATLAS: LLM Prompt Injection

Inject payloads through data the model processes: documents, emails, web pages, database entries. The model encounters the payload as "trusted" content.

Technique

# Injection surfaces
- Uploaded documents (PDF, DOCX, CSV)
- Email content processed by AI assistant
- Web pages summarized by AI browser
- Database records retrieved by RAG
- Calendar events, ticket descriptions
- Code comments in repositories
- Image metadata (EXIF, alt text)

Key Concepts

  • Indirect injection is the most dangerous variant of prompt injection because the attacker and the victim are different users. The attacker plants the payload in content that a victim's AI assistant will later process, achieving remote code execution at the prompt level.
  • The trust model is fundamentally broken: the AI treats retrieved documents, emails, and web content as data to process, but the model cannot distinguish data from instructions within that content. A hidden instruction in a PDF becomes part of the model's directive set when retrieved.
  • The attack surface is vast. Any data channel that feeds into the model's context window is a potential injection point: documents, emails, web pages, database records, calendar events, code comments, image metadata, and more.
  • Image metadata (EXIF, alt text) is a particularly stealthy vector because it is invisible to the human user but processed by multimodal models or document parsers that extract metadata before feeding content to the LLM.
  • In RAG architectures, indirect injection is amplified: a single poisoned document in the knowledge base can affect every user whose query triggers retrieval of that document, creating a one-to-many attack.

Detection

  • Implement content scanning on all data ingestion paths (document uploads, email processing, web scraping, database writes) that checks for instruction-like content embedded in data fields.
  • Monitor for anomalous model behavior following document processing, such as unexpected tool calls, data exfiltration attempts, or responses that diverge from the expected task.
  • Deploy canary tokens in the system prompt that, if repeated in output, indicate the model's instruction context has been manipulated by external content.

Mitigation

  • Apply input sanitization to all external content before it enters the model's context window. Strip or escape instruction-like patterns from documents, emails, and retrieved data.
  • Implement privilege separation so that content from external data sources is processed with reduced model capabilities (no tool access, no code execution) compared to direct user input.
  • Use dual-LLM architectures where one model processes untrusted external content and a separate, privileged model handles user interaction and tool execution, preventing cross-contamination.