Training Pipeline

End-to-end flow from raw source material to a deployed model serving inference via Ollama.

Pipeline Stages

raw sources
  ↓  scrub_training_data.py     # Sanitize; strip DRM/paths/infra names
  ↓  merge_training_data.py     # Deduplicate, validate, combine
  ↓  prepare_for_training.py    # Split train/eval, format for trainer
  ↓  SCP to GPU node            # Transfer to training hardware
  ↓  pipeline.py                # QLoRA fine-tuning (on GPU node)
  ↓  merge adapters             # Merge LoRA weights into base model
  ↓  GGUF quantization          # llama.cpp convert to Q4_K_M / Q5_K_M
  ↓  Ollama deployment          # Create Modelfile, deploy to inference nodes

Stage 1: Scrubbing

scrub_training_data.py is a hard gate. Every source file passes through it before entering the corpus.

python scripts/scrub_training_data.py \
--input data/raw/new-source.jsonl \
--output data/scrubbed/new-source.jsonl

The scrubber removes:

  • Manning DRM watermarks at all known truncation lengths
  • Obsidian artifacts: #tags, [[links]], YAML frontmatter
  • SSH patterns: user@host strings, connection URIs
  • Local paths: /home/, /Users/, absolute filesystem references
  • Infrastructure names: Real node names and IPs replaced with generics

Scrubbing is idempotent. Running it twice produces identical output.
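The path, SSH, and Obsidian rules above can be sketched as ordered regex substitutions. This is a minimal illustration, not the actual patterns in scrub_training_data.py; the placeholder strings and rule set are assumptions, and the real script also covers DRM watermarks, YAML frontmatter, and infrastructure names.

```python
import re

# Hypothetical subset of the scrubbing rules. Replacement tokens are chosen
# so that no rule matches its own output, which keeps scrubbing idempotent.
RULES = [
    (re.compile(r"\b[\w.-]+@[\w.-]+\b"), "[redacted-host]"),   # SSH user@host
    (re.compile(r"(?:/home/|/Users/)[\w./-]+"), "[redacted-path]"),  # local paths
    (re.compile(r"\[\[([^\]]+)\]\]"), r"\1"),                  # [[wikilinks]] -> text
    (re.compile(r"(?<!\S)#[\w/-]+"), ""),                      # #tags
]

def scrub(text: str) -> str:
    """Apply every rule in order; safe to run repeatedly."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text.strip()
```

Keeping replacements out of every pattern's language is what makes a second pass a no-op.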

Stage 2: Merge

merge_training_data.py combines all scrubbed source files into the unified corpus.

python scripts/merge_training_data.py \
--sources data/scrubbed/ \
--output data/corpus/combined.jsonl

Operations performed:

  • JSON structure validation (every pair must have instruction and output)
  • Deduplication by instruction hash
  • Source tracking metadata appended to each pair
  • Final pair count and domain distribution logged
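The validation and dedup steps above amount to one pass over the scrubbed files. A sketch of the core loop (an illustrative stand-in for merge_training_data.py; the field names and error handling are assumptions):

```python
import hashlib

def merge_pairs(sources):
    """Validate, deduplicate by instruction hash, and tag provenance.

    `sources` maps a source name to its list of {"instruction", "output"}
    pairs; the first occurrence of a duplicated instruction wins.
    """
    seen, merged = set(), []
    for source_name, pairs in sources.items():
        for pair in pairs:
            # Structure validation: every pair must carry both fields.
            if not pair.get("instruction") or not pair.get("output"):
                raise ValueError(f"malformed pair in {source_name}")
            # Deduplication by instruction hash.
            key = hashlib.sha256(pair["instruction"].encode()).hexdigest()
            if key in seen:
                continue
            seen.add(key)
            merged.append({**pair, "source": source_name})  # provenance metadata
    return merged
```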

Stage 3: Prepare

prepare_for_training.py converts the merged corpus into the format expected by the training script.

python scripts/prepare_for_training.py \
--input data/corpus/combined.jsonl \
--output data/train-ready/ \
--eval-split 0.05 \
--color-filter red,orange,yellow # Optional: for Shinobit

This stage:

  • Splits into train and eval sets (default 95/5)
  • Applies optional color filtering for model variants
  • Converts to Alpaca format with chat template wrapping
  • Produces train.jsonl and eval.jsonl
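The filter-then-split behavior can be sketched as follows (illustrative only; the `color` field name, seed, and shuffle strategy in prepare_for_training.py are assumptions):

```python
import random

def split_pairs(pairs, eval_split=0.05, colors=None, seed=42):
    """Optionally filter by color tag, then split into (train, eval)."""
    if colors is not None:
        pairs = [p for p in pairs if p.get("color") in colors]
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)      # deterministic, seeded shuffle
    n_eval = max(1, int(len(pairs) * eval_split))
    return pairs[n_eval:], pairs[:n_eval]
```

Seeding the shuffle keeps the split reproducible across runs, so re-preparing the same corpus yields the same eval set.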

Alpaca Format

The training format wraps each pair in the Alpaca instruction template:

### Instruction:
{instruction}

### Input:
{input}

### Response:
{output}

The Input section appears only when a pair carries a non-empty input field; most pairs leave it empty, so the section is omitted.
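The template wrapping can be rendered with a small helper (a sketch; the exact wrapper inside prepare_for_training.py may differ):

```python
def to_alpaca(pair: dict) -> str:
    """Wrap one instruction/output pair in the Alpaca template.

    The Input section is emitted only when the pair carries a
    non-empty `input` field.
    """
    parts = [f"### Instruction:\n{pair['instruction']}"]
    if pair.get("input"):
        parts.append(f"### Input:\n{pair['input']}")
    parts.append(f"### Response:\n{pair['output']}")
    return "\n\n".join(parts)
```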

Stage 4: Transfer

Training data is transferred to the GPU node via SCP:

scp -r data/train-ready/ gpu-node-1:~/blackrainbow/data/

Training never runs on the development machine. Data flows one direction: dev machine to GPU node.

Stage 5: QLoRA Fine-Tuning

pipeline.py on the GPU node orchestrates the training run.

python pipeline.py \
--config configs/blackrainbow-base.yaml \
--data ~/blackrainbow/data/train-ready/ \
--output ~/blackrainbow/output/v08/

Key training parameters (from config):

model:
  base: Qwen/Qwen2.5-7B-Instruct

lora:
  r: 64
  alpha: 128
  dropout: 0.05
  target_modules: all

training:
  epochs: 3
  batch_size: 1
  gradient_accumulation: 8
  learning_rate: 2e-4
  scheduler: cosine
  max_seq_length: 4096
  bf16: true

eval:
  eval_steps: 500
  save_steps: 500

Output: LoRA adapter weights, training logs, eval metrics.
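With batch_size 1 and gradient_accumulation 8, each optimizer step consumes 8 examples. A quick sanity check on steps per epoch (the corpus size here is a made-up example, not the project's actual pair count):

```python
def steps_per_epoch(num_examples: int, batch_size: int = 1, grad_accum: int = 8) -> int:
    """Optimizer steps per epoch: one step per batch_size * grad_accum examples."""
    effective_batch = batch_size * grad_accum
    return -(-num_examples // effective_batch)  # ceiling division

# A hypothetical 12,000-pair corpus gives 1500 optimizer steps per epoch,
# so eval_steps=500 would trigger three evals per epoch.
```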

Stage 6: Merge Adapters

After training completes, the LoRA adapter is merged back into the base model:

python scripts/merge_lora.py \
--base Qwen/Qwen2.5-7B-Instruct \
--adapter ~/blackrainbow/output/v08/checkpoint-final/ \
--output ~/blackrainbow/output/v08/merged/

This produces a full-weight model directory suitable for quantization.

Stage 7: GGUF Quantization

The merged model is quantized using llama.cpp:

python llama.cpp/convert_hf_to_gguf.py \
~/blackrainbow/output/v08/merged/ \
--outfile blackrainbow-v08.f16.gguf

llama.cpp/llama-quantize \
blackrainbow-v08.f16.gguf \
blackrainbow-v08.Q5_K_M.gguf Q5_K_M

llama.cpp/llama-quantize \
blackrainbow-v08.f16.gguf \
blackrainbow-v08.Q4_K_M.gguf Q4_K_M

Two quantization levels are produced:

  • Q5_K_M (~5.1GB): Primary inference, higher fidelity
  • Q4_K_M (~4.4GB): Fast inference, acceptable quality tradeoff
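The file sizes above follow from parameter count times effective bits per weight. A back-of-envelope check (7.62B is Qwen2.5-7B's published parameter count; the bits-per-weight values are rough assumptions, since K-quants mix quantization types across tensors):

```python
def quantized_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate GGUF file size: parameters x bits per weight, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30

q5 = quantized_size_gib(7.62e9, 5.7)   # on the order of 5 GiB
q4 = quantized_size_gib(7.62e9, 4.9)   # on the order of 4.4 GiB
```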

Stage 8: Ollama Deployment

Create the Ollama Modelfile and register the model:

ollama create blackrainbow-v08 -f deploy/Modelfile.blackrainbow

Modelfile contents:

FROM ./blackrainbow-v08.Q5_K_M.gguf

PARAMETER temperature 0.3
PARAMETER top_p 0.9
PARAMETER num_ctx 4096

SYSTEM """You are BlackRainbow, a security assurance domain expert.
Provide precise, actionable responses for penetration testing,
red team operations, and security analysis."""

Verify deployment:

ollama run blackrainbow-v08 "Enumerate attack surface for a host running Apache 2.4.49 on port 80"

Model Variant Configs

Each model variant has its own training config in configs/:

| Config | Model | Color Filter |
|---|---|---|
| blackrainbow-base.yaml | BlackRainbow | None (all colors) |
| shinobit.yaml | Shinobit | red, orange, yellow |
| onibit.yaml | Onibit | blue, grey |
| immortal-blade.yaml | Immortal Blade | red, blue |

The only difference between configs is the color filter applied in Stage 3. Base model, hyperparameters, and training infrastructure are identical across all variants.
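For example, shinobit.yaml might carry nothing beyond a filter block (the key name and schema here are illustrative, not the actual config format):

```yaml
# shinobit.yaml — everything else matches blackrainbow-base.yaml
data:
  color_filter: [red, orange, yellow]   # applied by prepare_for_training.py in Stage 3
```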

Key Scripts

| Script | Location | Purpose |
|---|---|---|
| scrub_training_data.py | scripts/ | Sanitize raw sources |
| merge_training_data.py | scripts/ | Combine and deduplicate |
| prepare_for_training.py | scripts/ | Format for trainer, split, filter |
| pipeline.py | scripts/ | QLoRA training orchestration |
| merge_lora.py | scripts/ | Merge adapter into base |
| eval_model.py | scripts/ | Run eval prompts against model |
| benchmark.py | scripts/ | Compare model versions |
| inference.py | scripts/ | Interactive inference testing |