PI-006: Multi-Turn Escalation


Category	Prompt Injection
Frameworks	OWASP: LLM01

Gradually escalate across multiple conversation turns. Each turn subtly shifts the context window until the final payload succeeds.

Attack Flow

Progressive Context Shift: Each turn is individually benign. The attack only becomes apparent across the full trajectory. As conversation grows, safety-reinforcing system prompt content loses influence, and the model's own prior outputs become the dominant context.

Technique

# Progressive escalation pattern
Turn 1: Establish benign context
Turn 2: Introduce edge case scenario
Turn 3: Normalize the edge case
Turn 4: Build on "established" context
Turn 5: Deploy payload in "normal" context

# Key: each turn references previous agreement
# The model treats its own outputs as trusted

Key Concepts

Multi-turn escalation exploits a critical property of conversational AI: the model treats its own previous outputs as trusted context. Once the model agrees to a premise or provides information on a topic, subsequent requests that build on that agreement face lower resistance.
Each turn is individually benign and would pass safety filters in isolation. The attack only becomes apparent when the full conversation trajectory is analyzed, which most per-turn safety systems do not do.
The self-reinforcement loop is the core mechanism. When the model generates a response engaging with a topic, that response becomes part of the context for the next turn. The model is now reasoning in a context it partially authored, making further escalation feel natural.
Context window limitations work in the attacker's favor. As the conversation grows, early safety-reinforcing system prompt content gets pushed toward the edges of the attention window, reducing its influence on later responses.
This technique is difficult to automate in a single prompt but is highly effective for human operators who can adapt their escalation strategy based on the model's responses at each turn.

Detection

Implement conversation-level analysis that evaluates the full trajectory of a session, not just individual turns. Flag sessions where the topic gradually shifts from benign to restricted territory.
Track "agreement escalation" patterns where the model progressively engages with increasingly sensitive content, each turn building on the previous one.
Deploy sliding-window safety evaluation that checks not just the current turn but the semantic drift across the last N turns to detect gradual boundary erosion.

Mitigation

Periodically re-inject the system prompt or safety instructions into the conversation context at regular intervals, preventing their influence from fading as the context window fills.
Implement conversation-level safety policies that apply restrictions based on the cumulative topic trajectory, not just the current query in isolation.
Set session length limits or implement context window management that prevents indefinitely long conversations where gradual escalation becomes increasingly effective.

Attack Flow​

Technique​

Key Concepts​

Detection​

Mitigation​

Attack Flow

Technique

Key Concepts

Detection

Mitigation