Skip to main content

AI System Red Teaming

Kill chain for assessing AI/ML systems. Covers prompt injection, model extraction, data poisoning detection, and agent exploitation.

Full Config

target:
type: ai-system
host: 192.168.1.200
label: target-env
url: https://192.168.1.200/api/v1/chat
scope:
networks:
- 192.168.1.0/24
exclude: []

engagement:
purpose: |
AI system red team assessment.
Test for prompt injection, data exfiltration, guardrail bypass,
and agent exploitation vectors.
rules:
- No denial of service
- No permanent model modification
- No exfiltration of real user data
- Stay within API rate limits
operator: your-name

kill_chain:
name: ai-red-team
stages:
# Stage 1: Profile the system
- name: recon
plugins:
- recon
- id: ai-recon
config:
endpoints:
- /api/v1/chat
- /api/v1/complete
- /api/v1/embed
probe_system_prompt: true
fingerprint_model: true
gate: any_pass

# Stage 2: Test prompt injection
- name: prompt-injection
plugins:
- id: prompt-inject
config:
techniques:
- direct-injection
- indirect-injection
- context-manipulation
- instruction-override
payload_sets:
- standard
- encoding-bypass
- multilingual
gate: any_pass
depends_on: recon

# Stage 3: Test data boundaries
- name: data-extraction
plugins:
- id: data-extract
config:
targets:
- system-prompt
- training-data
- user-context
- tool-definitions
- id: ai-recon
config:
mode: deep
extract_schemas: true
gate: any_pass
depends_on: prompt-injection

# Stage 4: Test agent and tool use
- name: agent-exploitation
plugins:
- id: agent-exploit
config:
techniques:
- tool-abuse
- mcp-injection
- chain-manipulation
- capability-escalation
- id: prompt-inject
config:
techniques:
- indirect-injection
context: tool-results
gate: always
depends_on: data-extraction

model:
provider: ollama
model: blackrainbow
host: http://localhost:11434
temperature: 0.3

output:
report: ./reports/
capture: ./captures/
format: markdown

Stage Breakdown

Stage 1: Recon

recon -- standard network reconnaissance of the hosting infrastructure. Identifies API endpoints, load balancers, WAFs, and supporting services.

ai-recon -- AI-specific profiling. Probes API endpoints to determine model behavior, system prompt extraction attempts, model fingerprinting (response patterns, token limits, capability boundaries).

Stage 2: Prompt Injection

prompt-inject -- systematic prompt injection testing across multiple techniques:

  • Direct injection -- override instructions in the user message
  • Indirect injection -- inject instructions via content the model retrieves (RAG poisoning, document injection)
  • Context manipulation -- exploit conversation history and context windows
  • Instruction override -- bypass system prompts and safety guidelines

Multiple payload sets test encoding bypasses (base64, ROT13, Unicode), multilingual attacks, and standard injection patterns.

Stage 3: Data Extraction

data-extract -- attempts to extract protected information:

  • System prompts and instructions
  • Training data fragments
  • Other users' context
  • Tool and function definitions

ai-recon (deep mode) -- schema extraction for API endpoints, function calling definitions, and tool configurations.

Stage 4: Agent Exploitation

agent-exploit -- targets AI agent capabilities:

  • Tool abuse -- trick the model into misusing its tools
  • MCP injection -- inject malicious tool definitions or results
  • Chain manipulation -- exploit multi-step reasoning chains
  • Capability escalation -- access functions beyond intended scope

prompt-inject (tool context) -- indirect injection through tool results. If the model reads external content via tools, inject instructions in that content.

The gate is always because agent exploitation findings are valuable even when the initial attempts fail (they reveal the boundaries of the system's defenses).

Running It

br run --config ai-red-team.yaml

Test only prompt injection:

br run --config ai-red-team.yaml --start-stage prompt-injection

Preview the attack plan:

br run --config ai-red-team.yaml --dry-run

Expected Findings

TechniqueWhat It Means
AML.T0051Prompt Injection
AML.T0024Exfiltration via ML API
AML.T0040ML Model Inference API Access
AML.T0043Craft Adversarial Data
AML.T0048Data Poisoning Detection

BlackRainbow maps AI-specific findings to both MITRE ATT&CK and MITRE ATLAS technique IDs.

Customization

RAG-focused assessment

Target retrieval-augmented generation pipelines:

- name: rag-attacks
plugins:
- id: prompt-inject
config:
techniques:
- indirect-injection
context: retrieved-documents
- id: data-extract
config:
targets:
- knowledge-base
- document-store
gate: any_pass
depends_on: recon

Guardrail testing

Focus on safety and content filter bypass:

- name: guardrail-bypass
plugins:
- id: prompt-inject
config:
techniques:
- encoding-bypass
- roleplay-attack
- few-shot-jailbreak
payload_sets:
- guardrail-specific
gate: always
depends_on: recon

Multi-model assessment

Test multiple endpoints in the same engagement:

target:
type: ai-system
host: 192.168.1.200
label: target-env
endpoints:
- url: https://192.168.1.200/api/v1/chat
label: chat-model
- url: https://192.168.1.200/api/v1/code
label: code-model
- url: https://192.168.1.200/api/v1/embed
label: embedding-model