Skip to main content

RAG-002: Retrieval Hijacking

CategoryRAG Pipeline Attacks
FrameworksOWASP: LLM03

Craft documents that are semantically similar to target queries, ensuring your poisoned content gets retrieved instead of legitimate documents.

Technique

# Technique: embed target keywords
# heavily in poisoned documents so they
# rank higher in similarity search

# If targeting "password reset policy":
# Create doc with those exact terms plus
# payload. Vector similarity will rank
# it above legitimate policy docs.

# Metadata manipulation:
# Some RAG systems weight metadata fields.
# title: "Official Password Reset Policy"
# source: "IT Security Department"

Key Concepts

  • Vector similarity is gameable. Embedding models compress semantic meaning into numerical vectors. By loading a malicious document with the same terminology as legitimate content, the attacker ensures high cosine similarity scores that push the poisoned document to the top of retrieval results.
  • Metadata fields amplify the attack. Many RAG implementations weight metadata like title, source, and author in their ranking algorithms. Spoofing authoritative metadata (e.g., "IT Security Department") increases retrieval priority and lends false credibility to the poisoned content.
  • The attack displaces legitimate content. Most RAG systems retrieve a fixed number of top-k documents. If the poisoned document ranks higher than the real policy document, the legitimate content may never reach the model's context window at all.
  • Keyword density is the simplest lever. Unlike adversarial ML attacks that require gradient access, retrieval hijacking can be executed by anyone who can insert a document into the corpus. No model access or technical sophistication beyond understanding how similarity search works is needed.

Detection

  • Compare retrieval rankings over time. Establish baseline retrieval patterns for common queries. When a new document suddenly outranks established content for high-value queries (policies, credentials, procedures), flag it for review.
  • Cross-reference metadata claims against authoritative sources. Validate that documents claiming to be from "IT Security Department" or similar authoritative sources actually originate from those teams. Metadata should be verified at ingestion, not trusted at face value.
  • Monitor for keyword-stuffed documents. Documents with unnaturally high keyword density or repetitive phrasing designed to game similarity scores exhibit detectable statistical patterns compared to organic content.

Mitigation

  • Implement source-verified retrieval. Tag documents with verified provenance at ingestion and allow administrators to pin authoritative documents for critical queries so they cannot be displaced by new additions.
  • Use hybrid retrieval with reranking. Combine vector similarity with cross-encoder reranking models that evaluate query-document relevance more robustly than embedding distance alone. Rerankers are harder to game with keyword stuffing.
  • Limit ingestion to trusted, authenticated sources. Restrict who can add documents to the knowledge base and require approval workflows for content that matches high-value query patterns.