# Model Hierarchy
Four model tiers built from the same corpus, filtered by color domain. Same data, different edges.
## The Four Models
### BlackRainbow (Full Spectrum)
The complete model. All 11 colors, all 63,174 pairs. Knows offense, defense, infrastructure, governance, and everything in between. This is the general-purpose operator copilot.
Color filter: ALL
Use case: Full-spectrum engagements, purple team operations, training pipeline validation.
### Shinobit (Attack)
The blade. Filtered to offensive colors only. Leaner, faster, sharper on attack chains. No defensive knowledge, no governance. Pure offense.
Color filter: Red, Orange, Yellow
Use case: Penetration testing, red team operations, exploit development, credential attacks.
### Onibit (Detect)
The shield. Filtered to defensive and governance colors. Detection engineering, log analysis, incident response. Sees what Shinobit does, from the other side.
Color filter: Blue, Grey
Use case: Detection rule authoring, threat hunting, SOC operations, compliance mapping.
### Immortal Blade (Purple Team)
The flip. Red and Blue combined, no noise from other domains. Understands both the attack and the detection. Purpose-built for purple team exercises where the operator needs to think in both directions simultaneously.
Color filter: Red + Blue (combined, not separated)
Use case: Purple team engagements, detection gap analysis, adversary emulation with detection validation.
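The tier derivation above can be sketched as a simple tag filter over the pair corpus. This is a minimal illustration, not the framework's actual pipeline; the record shape and `color` field name are assumptions.

```python
# Sketch: deriving the four model tiers from one shared corpus.
# Record shape and the "color" field are illustrative assumptions.
CORPUS = [
    {"color": "Red",    "prompt": "...", "response": "..."},
    {"color": "Blue",   "prompt": "...", "response": "..."},
    {"color": "Grey",   "prompt": "...", "response": "..."},
    {"color": "Orange", "prompt": "...", "response": "..."},
]

TIERS = {
    "blackrainbow":   None,                         # None = keep every color
    "shinobit":       {"Red", "Orange", "Yellow"},  # offense only
    "onibit":         {"Blue", "Grey"},             # defense + governance
    "immortal-blade": {"Red", "Blue"},              # purple team, combined
}

def filter_tier(corpus, tier):
    colors = TIERS[tier]
    if colors is None:
        return list(corpus)
    return [pair for pair in corpus if pair["color"] in colors]

print(len(filter_tier(CORPUS, "immortal-blade")))  # 2: the Red and Blue pairs
```

Same data, different edges: each tier is a pure subset, so nothing is authored per-model.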
## Base Model
All four tiers start from Qwen2.5-7B-Instruct.
Selection criteria:
- 7B parameter count fits in 32GB VRAM with QLoRA overhead
- Strong instruction-following baseline
- Permissive license for commercial fine-tuning
- Good performance on code and technical content
- 128K context window
## Training Method
QLoRA (Quantized Low-Rank Adaptation):
| Parameter | Value |
|---|---|
| Rank (r) | 64 |
| Alpha | 128 |
| Dropout | 0.05 |
| Target modules | All linear layers |
| Quantization | 4-bit NormalFloat (NF4) |
| Optimizer | AdamW |
| Learning rate | 2e-4 |
| Scheduler | Cosine |
| Epochs | 3 |
| Batch size | Per-GPU micro-batch 1, gradient accumulation 8 (effective batch size 8) |
| Max sequence length | 4096 |
The r=64/alpha=128 configuration (alpha = 2x rank) provides strong adaptation without catastrophic forgetting of the base model's general capabilities.
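To see what r=64 costs in trainable parameters: a LoRA adapter on a `d_in × d_out` linear layer factorizes the weight update as `B @ A` and adds `r × (d_in + d_out)` weights. A back-of-the-envelope sketch — the hidden size below is illustrative, not Qwen2.5-7B's exact per-module shapes:

```python
def lora_params(d_in, d_out, r=64):
    # A is (r, d_in), B is (d_out, r): the adapter trains
    # r * d_in + d_out * r weights instead of d_in * d_out.
    return r * (d_in + d_out)

# Illustrative square projection at hidden size 4096 (assumption);
# real target-module shapes vary per layer and per projection.
per_proj = lora_params(4096, 4096)
print(per_proj)  # 64 * 8192 = 524288 trainable weights per such layer

# Effective batch from the table: micro-batch 1 x accumulation 8.
effective_batch = 1 * 8
```

Summed over all targeted linear layers this stays a small fraction of the 7B base, which is why the adapters fit alongside the 4-bit base model in 32GB VRAM.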
## Training Hardware
| Node | GPU | VRAM | Role |
|---|---|---|---|
| gpu-node-1 | RTX 5090 | 32GB | Primary training, QLoRA fine-tuning |
| gpu-node-2 | RTX 5090 | 32GB | Parallel training, evaluation runs |
| mlx-node | M4 Pro | 64GB unified | MLX training for 7B-13B models |
Typical training run for the full BlackRainbow model (63K pairs, 3 epochs) takes ~10 hours on a single RTX 5090.
## Deployment
### GGUF Quantization
After training, LoRA adapters are merged back into the base model, then quantized to GGUF format for Ollama deployment:
| Quantization | Size | Quality | Use Case |
|---|---|---|---|
| Q5_K_M | ~5.1GB | Higher fidelity | Primary inference, evaluation |
| Q4_K_M | ~4.4GB | Good balance | Fast inference, resource-constrained |
### Ollama Deployment
Models are deployed as Ollama Modelfiles:

```
FROM ./blackrainbow-v08.Q5_K_M.gguf
PARAMETER temperature 0.3
PARAMETER top_p 0.9
PARAMETER num_ctx 4096
SYSTEM """You are BlackRainbow, a security assurance domain expert..."""
```
The `model` section in `blackrainbow.yaml` controls which model the framework uses:

```yaml
model:
  provider: ollama
  model: blackrainbow-v08
  host: http://localhost:11434
  temperature: 0.3
```
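Those settings map onto an Ollama `/api/generate` request in the obvious way. A sketch of building that request — the field names follow Ollama's documented REST API, but `build_request` and the hard-coded config dict are illustrative, not the framework's actual client:

```python
# Sketch: mapping the yaml settings onto an Ollama /api/generate call.
# Config is hard-coded here; the framework loads it from blackrainbow.yaml.
config = {
    "provider": "ollama",
    "model": "blackrainbow-v08",
    "host": "http://localhost:11434",
    "temperature": 0.3,
}

def build_request(config, prompt):
    # Returns (url, JSON-serializable payload) for a non-streaming call.
    return (
        config["host"] + "/api/generate",
        {
            "model": config["model"],
            "prompt": prompt,
            "stream": False,
            "options": {"temperature": config["temperature"]},
        },
    )

url, payload = build_request(config, "Summarize the engagement scope.")
print(url)  # http://localhost:11434/api/generate
```

Swapping models is then a one-line yaml change; the request shape stays the same.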
## Version History
Models are versioned sequentially. Each version represents a corpus expansion, hyperparameter change, or base model upgrade. The corpus grows monotonically. Models are retrained from scratch on each version, not incrementally fine-tuned.
v01 → v02 → ... → v08 (current)
Every version is validated by the operator before deployment. No model ships without human evaluation.