Training Pipeline

End-to-end flow from raw source material to a deployed model serving inference via Ollama.

Pipeline Stages

raw sources
  ↓  scrub_training_data.py     # Sanitize; strip DRM/paths/infra names
  ↓  merge_training_data.py     # Deduplicate, validate, combine
  ↓  prepare_for_training.py    # Split train/eval, format for trainer
  ↓  SCP to GPU node            # Transfer to training hardware
  ↓  pipeline.py                # QLoRA fine-tuning (on GPU node)
  ↓  merge adapters             # Merge LoRA weights into base model
  ↓  GGUF quantization          # llama.cpp convert to Q4_K_M / Q5_K_M
  ↓  Ollama deployment          # Create Modelfile, deploy to inference nodes

Stage 1: Scrubbing

scrub_training_data.py is a hard gate. Every source file passes through it before entering the corpus.

python scripts/scrub_training_data.py \
--input data/raw/new-source.jsonl \
--output data/scrubbed/new-source.jsonl

The scrubber removes:

  • Manning DRM watermarks at all known truncation lengths
  • Obsidian artifacts: #tags, [[links]], YAML frontmatter
  • SSH patterns: user@host strings, connection URIs
  • Local paths: /home/, /Users/, absolute filesystem references
  • Infrastructure names: Real node names and IPs replaced with generics

Scrubbing is idempotent. Running it twice produces identical output.
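The path, SSH, and Obsidian rules above can be sketched as ordered regex substitutions. This is a minimal illustration, not the actual patterns in scrub_training_data.py; the placeholder strings and rule set are assumptions, and the real script also covers DRM watermarks, YAML frontmatter, and infrastructure names.

```python
import re

# Hypothetical subset of the scrubbing rules. Replacement tokens are chosen
# so that no rule matches its own output, which keeps scrubbing idempotent.
RULES = [
    (re.compile(r"\b[\w.-]+@[\w.-]+\b"), "[redacted-host]"),   # SSH user@host
    (re.compile(r"(?:/home/|/Users/)[\w./-]+"), "[redacted-path]"),  # local paths
    (re.compile(r"\[\[([^\]]+)\]\]"), r"\1"),                  # [[wikilinks]] -> text
    (re.compile(r"(?<!\S)#[\w/-]+"), ""),                      # #tags
]

def scrub(text: str) -> str:
    """Apply every rule in order; safe to run repeatedly."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text.strip()
```

Keeping replacements out of every pattern's language is what makes a second pass a no-op.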

Stage 2: Merge

merge_training_data.py combines all scrubbed source files into the unified corpus.

python scripts/merge_training_data.py \
--sources data/scrubbed/ \
--output data/corpus/combined.jsonl

Operations performed:

  • JSON structure validation (every pair must have instruction and output)
  • Deduplication by instruction hash
  • Source tracking metadata appended to each pair
  • Final pair count and domain distribution logged
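The validation and dedup steps above amount to one pass over the scrubbed files. A sketch of the core loop (an illustrative stand-in for merge_training_data.py; the field names and error handling are assumptions):

```python
import hashlib

def merge_pairs(sources):
    """Validate, deduplicate by instruction hash, and tag provenance.

    `sources` maps a source name to its list of {"instruction", "output"}
    pairs; the first occurrence of a duplicated instruction wins.
    """
    seen, merged = set(), []
    for source_name, pairs in sources.items():
        for pair in pairs:
            # Structure validation: every pair must carry both fields.
            if not pair.get("instruction") or not pair.get("output"):
                raise ValueError(f"malformed pair in {source_name}")
            # Deduplication by instruction hash.
            key = hashlib.sha256(pair["instruction"].encode()).hexdigest()
            if key in seen:
                continue
            seen.add(key)
            merged.append({**pair, "source": source_name})  # provenance metadata
    return merged
```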

Stage 3: Prepare

prepare_for_training.py converts the merged corpus into the format expected by the training script.

python scripts/prepare_for_training.py \
--input data/corpus/combined.jsonl \
--output data/train-ready/ \
--eval-split 0.05 \
--color-filter red,orange,yellow # Optional: for Shinobit

This stage:

  • Splits into train and eval sets (default 95/5)
  • Applies optional color filtering for model variants
  • Converts to Alpaca format with chat template wrapping
  • Produces train.jsonl and eval.jsonl
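The filter-then-split behavior can be sketched as follows (illustrative only; the `color` field name, seed, and shuffle strategy in prepare_for_training.py are assumptions):

```python
import random

def split_pairs(pairs, eval_split=0.05, colors=None, seed=42):
    """Optionally filter by color tag, then split into (train, eval)."""
    if colors is not None:
        pairs = [p for p in pairs if p.get("color") in colors]
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)      # deterministic, seeded shuffle
    n_eval = max(1, int(len(pairs) * eval_split))
    return pairs[n_eval:], pairs[:n_eval]
```

Seeding the shuffle keeps the split reproducible across runs, so re-preparing the same corpus yields the same eval set.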

Alpaca Format

The training format wraps each pair in the Alpaca instruction template:

### Instruction:
{instruction}

### Input:
{input}

### Response:
{output}

The Input section appears only when a pair carries a non-empty input field; most pairs leave it empty, so the section is omitted.
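The template wrapping can be rendered with a small helper (a sketch; the exact wrapper inside prepare_for_training.py may differ):

```python
def to_alpaca(pair: dict) -> str:
    """Wrap one instruction/output pair in the Alpaca template.

    The Input section is emitted only when the pair carries a
    non-empty `input` field.
    """
    parts = [f"### Instruction:\n{pair['instruction']}"]
    if pair.get("input"):
        parts.append(f"### Input:\n{pair['input']}")
    parts.append(f"### Response:\n{pair['output']}")
    return "\n\n".join(parts)
```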

Stage 4: Transfer

Training data is transferred to the GPU node via SCP:

scp -r data/train-ready/ gpu-node-1:~/blackrainbow/data/

Training never runs on the development machine. Data flows one direction: dev machine to GPU node.

Stage 5: QLoRA Fine-Tuning

pipeline.py on the GPU node orchestrates the training run.

python pipeline.py \
--config configs/blackrainbow-base.yaml \
--data ~/blackrainbow/data/train-ready/ \
--output ~/blackrainbow/output/v08/

Key training parameters (from config):

model:
  base: Qwen/Qwen2.5-7B-Instruct

lora:
  r: 64
  alpha: 128
  dropout: 0.05
  target_modules: all

training:
  epochs: 3
  batch_size: 1
  gradient_accumulation: 8
  learning_rate: 2e-4
  scheduler: cosine
  max_seq_length: 4096
  bf16: true

eval:
  eval_steps: 500
  save_steps: 500

Output: LoRA adapter weights, training logs, eval metrics.
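With batch_size 1 and gradient_accumulation 8, each optimizer step consumes 8 examples. A quick sanity check on steps per epoch (the corpus size here is a made-up example, not the project's actual pair count):

```python
def steps_per_epoch(num_examples: int, batch_size: int = 1, grad_accum: int = 8) -> int:
    """Optimizer steps per epoch: one step per batch_size * grad_accum examples."""
    effective_batch = batch_size * grad_accum
    return -(-num_examples // effective_batch)  # ceiling division

# A hypothetical 12,000-pair corpus gives 1500 optimizer steps per epoch,
# so eval_steps=500 would trigger three evals per epoch.
```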

Stage 6: Merge Adapters

After training completes, the LoRA adapter is merged back into the base model:

python scripts/merge_lora.py \
--base Qwen/Qwen2.5-7B-Instruct \
--adapter ~/blackrainbow/output/v08/checkpoint-final/ \
--output ~/blackrainbow/output/v08/merged/

This produces a full-weight model directory suitable for quantization.

Stage 7: GGUF Quantization

The merged model is quantized using llama.cpp:

python llama.cpp/convert_hf_to_gguf.py \
~/blackrainbow/output/v08/merged/ \
--outfile blackrainbow-v08.f16.gguf

llama.cpp/llama-quantize \
blackrainbow-v08.f16.gguf \
blackrainbow-v08.Q5_K_M.gguf Q5_K_M

llama.cpp/llama-quantize \
blackrainbow-v08.f16.gguf \
blackrainbow-v08.Q4_K_M.gguf Q4_K_M

Two quantization levels are produced:

  • Q5_K_M (~5.1GB): Primary inference, higher fidelity
  • Q4_K_M (~4.4GB): Fast inference, acceptable quality tradeoff
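The file sizes above follow from parameter count times effective bits per weight. A back-of-envelope check (7.62B is Qwen2.5-7B's published parameter count; the bits-per-weight values are rough assumptions, since K-quants mix quantization types across tensors):

```python
def quantized_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate GGUF file size: parameters x bits per weight, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30

q5 = quantized_size_gib(7.62e9, 5.7)   # on the order of 5 GiB
q4 = quantized_size_gib(7.62e9, 4.9)   # on the order of 4.4 GiB
```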

Stage 8: Ollama Deployment

Create the Ollama Modelfile and register the model:

ollama create blackrainbow-v08 -f deploy/Modelfile.blackrainbow

Modelfile contents:

FROM ./blackrainbow-v08.Q5_K_M.gguf

PARAMETER temperature 0.3
PARAMETER top_p 0.9
PARAMETER num_ctx 4096

SYSTEM """You are BlackRainbow, a security assurance domain expert.
Provide precise, actionable responses for penetration testing,
red team operations, and security analysis."""

Verify deployment:

ollama run blackrainbow-v08 "Enumerate attack surface for a host running Apache 2.4.49 on port 80"

Model Variant Configs

Each model variant has its own training config in configs/:

| Config | Model | Color Filter |
|---|---|---|
| blackrainbow-base.yaml | BlackRainbow | None (all colors) |
| shinobit.yaml | Shinobit | red, orange, yellow |
| onibit.yaml | Onibit | blue, grey |
| immortal-blade.yaml | Immortal Blade | red, blue |

The only difference between configs is the color filter applied in Stage 3. Base model, hyperparameters, and training infrastructure are identical across all variants.
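For example, shinobit.yaml might carry nothing beyond a filter block (the key name and schema here are illustrative, not the actual config format):

```yaml
# shinobit.yaml — everything else matches blackrainbow-base.yaml
data:
  color_filter: [red, orange, yellow]   # applied by prepare_for_training.py in Stage 3
```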

Key Scripts

| Script | Location | Purpose |
|---|---|---|
| scrub_training_data.py | scripts/ | Sanitize raw sources |
| merge_training_data.py | scripts/ | Combine and deduplicate |
| prepare_for_training.py | scripts/ | Format for trainer, split, filter |
| pipeline.py | scripts/ | QLoRA training orchestration |
| merge_lora.py | scripts/ | Merge adapter into base |
| eval_model.py | scripts/ | Run eval prompts against model |
| benchmark.py | scripts/ | Compare model versions |
| inference.py | scripts/ | Interactive inference testing |