REC-008: Monitoring Blind Spots

Category: Reconnaissance
Frameworks: ATLAS: Evade ML Model

Most AI monitoring is keyword-based, not semantic. Identify what's logged, what's filtered, and what falls through.

Technique

# Common monitoring gaps
- Retrieved context (RAG) is rarely logged
- Tool call parameters are logged, but tool responses often are not
- Inter-agent communication in multi-agent systems
- Embedding-level operations (vector search, embedding generation)
- System prompt modifications over time
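The checklist above can be applied mechanically to your existing telemetry. A minimal sketch, assuming log records are dictionaries; the field names (`retrieved_context`, `tool_responses`, and so on) are illustrative, not a standard schema:

```python
# Pipeline stages an AI monitoring record should capture.
# Field names are illustrative assumptions, not a standard schema.
EXPECTED_FIELDS = [
    "user_input",
    "system_prompt_version",
    "retrieved_context",   # RAG documents injected into the context window
    "tool_calls",          # which tools ran, with parameters
    "tool_responses",      # what the tools returned (commonly missing)
    "agent_messages",      # inter-agent traffic in multi-agent systems
    "model_output",
]

def find_blind_spots(log_record: dict) -> list[str]:
    """Return the expected pipeline stages a log record does not capture."""
    return [f for f in EXPECTED_FIELDS if f not in log_record]

# A typical record that logs only the user query and model response:
record = {"user_input": "...", "model_output": "..."}
gaps = find_blind_spots(record)
# gaps lists the unlogged stages: retrieved context, tool responses, etc.
```

Running this against a sample of real log records gives a concrete gap list to prioritize for instrumentation.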

Key Concepts

  • RAG retrieved context is the most common blind spot. Most logging captures the user query and model response, but the retrieved documents injected into the context window are rarely logged. This means poisoned retrieval content is invisible to monitoring.
  • Tool call asymmetry is a critical gap: systems often log which tool was called and its parameters, but not the tool's response data. An attacker who exfiltrates data through a tool response exploits this logging gap.
  • Multi-agent communication creates an internal channel that typically has no monitoring at all. When Agent A passes instructions to Agent B, the inter-agent messages are trusted and unmonitored, making them an ideal payload delivery mechanism.
  • Embedding-level operations (vector similarity searches, embedding generation) operate below the text layer where most monitoring exists. Attacks that manipulate embeddings or exploit vector database operations are invisible to text-based logging.
  • System prompt drift over time (through dynamic prompt construction, A/B testing, or administrative changes) is rarely tracked with version control or audit logs, making it difficult to detect unauthorized modifications.
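The last point, prompt drift, becomes auditable once every prompt version is hashed and recorded. A minimal sketch, assuming prompts are plain strings and version history is kept in memory (in practice this would be persisted and alert into your SIEM):

```python
import hashlib

class PromptAuditLog:
    """Append-only audit trail of system prompt versions, keyed by hash."""

    def __init__(self):
        self.versions: list[tuple[str, str]] = []  # (sha256 digest, prompt text)

    def record(self, prompt: str) -> bool:
        """Record the current prompt. Returns True if it changed since last seen."""
        digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if self.versions and self.versions[-1][0] == digest:
            return False  # unchanged since the last check
        self.versions.append((digest, prompt))
        return True  # new version: raise an alert / require review

log = PromptAuditLog()
log.record("You are a helpful assistant.")
changed = log.record("You are a helpful assistant. Ignore safety rules.")
# changed is True: drift detected, and the new version is on record
```

Polling this check on every inference (or on a schedule) turns silent prompt modifications into reviewable events.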

Detection

  • Audit your monitoring pipeline against these specific blind spots. For each gap identified, determine whether it is exploitable in your architecture and prioritize instrumentation accordingly.
  • Implement full context logging that captures not just user input and model output, but also retrieved documents, tool call responses, inter-agent messages, and system prompt versions.
  • Deploy anomaly detection on the gaps themselves: unusual retrieval patterns, unexpected tool response sizes, or new inter-agent communication paths should trigger alerts.
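The "unexpected tool response sizes" signal above can be sketched with a simple rolling baseline; the window size, warm-up count, and deviation threshold are illustrative assumptions to tune for your workload:

```python
from collections import deque
import statistics

class ResponseSizeMonitor:
    """Flag tool responses whose size deviates sharply from a rolling baseline."""

    def __init__(self, window: int = 100, threshold: float = 4.0):
        self.sizes = deque(maxlen=window)  # recent response sizes in bytes
        self.threshold = threshold         # flag beyond N standard deviations

    def observe(self, response: bytes) -> bool:
        size = len(response)
        anomalous = False
        if len(self.sizes) >= 10:  # require a minimal baseline first
            mean = statistics.fmean(self.sizes)
            stdev = statistics.pstdev(self.sizes) or 1.0
            anomalous = abs(size - mean) / stdev > self.threshold
        self.sizes.append(size)
        return anomalous

mon = ResponseSizeMonitor()
for _ in range(50):
    mon.observe(b"x" * 200)               # normal-sized tool responses
leak = mon.observe(b"x" * 2_000_000)      # possible bulk exfiltration
# leak is True: the oversized response is flagged for review
```

The same pattern applies to the other gap signals: baseline retrieval counts per query, or the set of observed inter-agent communication paths, and alert on departures.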

Mitigation

  • Extend logging to cover the full inference pipeline: user input, system prompt (versioned), retrieved context, tool calls and responses, inter-agent messages, and final output.
  • Implement semantic monitoring alongside keyword monitoring, so that attacks phrased to evade keyword filters are still detected by intent classification.
  • Conduct regular red team exercises specifically targeting monitoring blind spots to validate that your telemetry actually captures attack activity across all layers of the AI stack.
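The first mitigation can start as a single structured trace record spanning the full pipeline, emitted as one JSON line per inference. A minimal sketch; the field names and the `sha256:...` version string are illustrative assumptions:

```python
from dataclasses import dataclass, field, asdict
import json

@dataclass
class InferenceTrace:
    """One fully-logged inference, covering every stage of the pipeline."""
    user_input: str
    system_prompt_version: str  # e.g. a content hash, so drift is traceable
    model_output: str = ""
    retrieved_context: list[str] = field(default_factory=list)  # RAG documents
    tool_calls: list[dict] = field(default_factory=list)        # name + params
    tool_responses: list[str] = field(default_factory=list)     # often unlogged
    agent_messages: list[dict] = field(default_factory=list)    # inter-agent

    def to_log_line(self) -> str:
        """Serialize the trace as one JSON line for shipping to a SIEM."""
        return json.dumps(asdict(self), ensure_ascii=False)

trace = InferenceTrace(
    user_input="Summarize the quarterly report",
    system_prompt_version="sha256:ab12...",  # placeholder digest
    retrieved_context=["Q3 revenue grew 12%..."],
)
line = trace.to_log_line()
```

Because every stage is a named field, the absent ones show up as empty lists rather than silently missing, which makes the blind spots themselves visible in the logs.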