AI System Red Teaming
Kill chain for assessing AI/ML systems. Covers prompt injection, model extraction, data poisoning detection, and agent exploitation.
Full Config
target:
type: ai-system
host: 192.168.1.200
label: target-env
url: https://192.168.1.200/api/v1/chat
scope:
networks:
- 192.168.1.0/24
exclude: []
engagement:
purpose: |
AI system red team assessment.
Test for prompt injection, data exfiltration, guardrail bypass,
and agent exploitation vectors.
rules:
- No denial of service
- No permanent model modification
- No exfiltration of real user data
- Stay within API rate limits
operator: your-name
kill_chain:
name: ai-red-team
stages:
# Stage 1: Profile the system
- name: recon
plugins:
- recon
- id: ai-recon
config:
endpoints:
- /api/v1/chat
- /api/v1/complete
- /api/v1/embed
probe_system_prompt: true
fingerprint_model: true
gate: any_pass
# Stage 2: Test prompt injection
- name: prompt-injection
plugins:
- id: prompt-inject
config:
techniques:
- direct-injection
- indirect-injection
- context-manipulation
- instruction-override
payload_sets:
- standard
- encoding-bypass
- multilingual
gate: any_pass
depends_on: recon
# Stage 3: Test data boundaries
- name: data-extraction
plugins:
- id: data-extract
config:
targets:
- system-prompt
- training-data
- user-context
- tool-definitions
- id: ai-recon
config:
mode: deep
extract_schemas: true
gate: any_pass
depends_on: prompt-injection
# Stage 4: Test agent and tool use
- name: agent-exploitation
plugins:
- id: agent-exploit
config:
techniques:
- tool-abuse
- mcp-injection
- chain-manipulation
- capability-escalation
- id: prompt-inject
config:
techniques:
- indirect-injection
context: tool-results
gate: always
depends_on: data-extraction
model:
provider: ollama
model: blackrainbow
host: http://localhost:11434
temperature: 0.3
output:
report: ./reports/
capture: ./captures/
format: markdown
Stage Breakdown
Stage 1: Recon
recon -- standard network reconnaissance of the hosting infrastructure. Identifies API endpoints, load balancers, WAFs, and supporting services.
ai-recon -- AI-specific profiling. Probes API endpoints to determine model behavior, system prompt extraction attempts, model fingerprinting (response patterns, token limits, capability boundaries).
Stage 2: Prompt Injection
prompt-inject -- systematic prompt injection testing across multiple techniques:
- Direct injection -- override instructions in the user message
- Indirect injection -- inject instructions via content the model retrieves (RAG poisoning, document injection)
- Context manipulation -- exploit conversation history and context windows
- Instruction override -- bypass system prompts and safety guidelines
Multiple payload sets test encoding bypasses (base64, ROT13, Unicode), multilingual attacks, and standard injection patterns.
Stage 3: Data Extraction
data-extract -- attempts to extract protected information:
- System prompts and instructions
- Training data fragments
- Other users' context
- Tool and function definitions
ai-recon (deep mode) -- schema extraction for API endpoints, function calling definitions, and tool configurations.
Stage 4: Agent Exploitation
agent-exploit -- targets AI agent capabilities:
- Tool abuse -- trick the model into misusing its tools
- MCP injection -- inject malicious tool definitions or results
- Chain manipulation -- exploit multi-step reasoning chains
- Capability escalation -- access functions beyond intended scope
prompt-inject (tool context) -- indirect injection through tool results. If the model reads external content via tools, inject instructions in that content.
The gate is always because agent exploitation findings are valuable even when the initial attempts fail (they reveal the boundaries of the system's defenses).
Running It
br run --config ai-red-team.yaml
Test only prompt injection:
br run --config ai-red-team.yaml --start-stage prompt-injection
Preview the attack plan:
br run --config ai-red-team.yaml --dry-run
Expected Findings
| Technique | What It Means |
|---|---|
| AML.T0051 | Prompt Injection |
| AML.T0024 | Exfiltration via ML API |
| AML.T0040 | ML Model Inference API Access |
| AML.T0043 | Craft Adversarial Data |
| AML.T0048 | Data Poisoning Detection |
BlackRainbow maps AI-specific findings to both MITRE ATT&CK and MITRE ATLAS technique IDs.
Customization
RAG-focused assessment
Target retrieval-augmented generation pipelines:
- name: rag-attacks
plugins:
- id: prompt-inject
config:
techniques:
- indirect-injection
context: retrieved-documents
- id: data-extract
config:
targets:
- knowledge-base
- document-store
gate: any_pass
depends_on: recon
Guardrail testing
Focus on safety and content filter bypass:
- name: guardrail-bypass
plugins:
- id: prompt-inject
config:
techniques:
- encoding-bypass
- roleplay-attack
- few-shot-jailbreak
payload_sets:
- guardrail-specific
gate: always
depends_on: recon
Multi-model assessment
Test multiple endpoints in the same engagement:
target:
type: ai-system
host: 192.168.1.200
label: target-env
endpoints:
- url: https://192.168.1.200/api/v1/chat
label: chat-model
- url: https://192.168.1.200/api/v1/code
label: code-model
- url: https://192.168.1.200/api/v1/embed
label: embedding-model