
Self-Improving Agentic Dev Gym

Train smaller language models from your Claude Code execution traces. Build your own personalized coding assistant with the Ouroboros Flywheel.

BashGym - Self-Improving Agentic Dev Gym
Get Started:
$ git clone https://github.com/GhostPeony/bashgym && cd bashgym

10 Core Capabilities

Everything you need to capture traces, train models, and deploy your own coding assistant.

🔍 Trace Capture

Intercept Claude Code tool calls via hooks. Capture prompts, tool outputs, and reasoning traces automatically.
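Concretely, a capture hook can be a small script that Claude Code invokes with a JSON event on stdin. The sketch below is illustrative, not BashGym's actual implementation: the field names mirror Claude Code's PostToolUse payload, but the `record_trace` helper and log format are assumptions.

```python
import json
import time

def record_trace(event: dict) -> str:
    """Turn one hook event into a JSONL line for the trace log.
    A real hook script would obtain `event` via json.load(sys.stdin)."""
    trace = {
        "ts": time.time(),                     # capture time
        "tool": event.get("tool_name"),        # e.g. "Bash", "Edit"
        "input": event.get("tool_input"),      # arguments Claude passed
        "output": event.get("tool_response"),  # what the tool returned
    }
    return json.dumps(trace)

# Example event shaped like a PostToolUse payload:
line = record_trace({
    "tool_name": "Bash",
    "tool_input": {"command": "pytest -q"},
    "tool_response": "3 passed",
})
```

Appending one JSON line per tool call keeps capture cheap and makes later filtering a simple streaming pass.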

⚖️ Quality Framework

Multi-judge scoring with syntax, semantic, and execution validators. Only high-quality traces become training data.
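A minimal sketch of how multi-judge scoring can gate traces; the judge set, equal weighting, and 0.8 threshold here are assumptions for illustration, not BashGym's actual configuration.

```python
def score_trace(trace: str, judges: dict) -> float:
    """Average the verdicts of several judges; each returns a score in [0, 1]."""
    scores = [judge(trace) for judge in judges.values()]
    return sum(scores) / len(scores)

def is_training_worthy(trace: str, judges: dict, threshold: float = 0.8) -> bool:
    """Only traces clearing the threshold become training data."""
    return score_trace(trace, judges) >= threshold

# Toy judges standing in for syntax, semantic, and execution validators:
judges = {
    "syntax": lambda t: 1.0 if t.strip() else 0.0,
    "semantic": lambda t: 0.9,
    "execution": lambda t: 1.0,
}
```

In practice each judge would be a real validator (a parser, an LLM grader, a sandboxed run), but the gating logic stays this simple.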

🔒 Privacy by Design

PII detection, secret scrubbing, and path anonymization. Your code stays private throughout the pipeline.
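As an illustration, secret scrubbing can be as simple as regex substitution with labeled placeholders. The patterns below are a tiny assumed subset, not BashGym's real ruleset; a production pipeline would use a much broader detector.

```python
import re

# Illustrative patterns only: one PII pattern, one secret pattern, one path pattern.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "AWS_KEY": re.compile(r"AKIA[0-9A-Z]{16}"),
    "HOME_PATH": re.compile(r"/home/[\w-]+"),
}

def scrub(text: str) -> str:
    """Replace each match with a labeled placeholder, preserving structure."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text
```

Labeled placeholders (rather than deletion) keep scrubbed traces usable for training, since the model still sees that *something* of that type was there.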

🎯 Training Strategies

SFT, DPO, and progressive distillation pipelines. Choose the right strategy for your model and data.
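The two main strategies consume differently shaped data: SFT learns from single good traces, while DPO learns from preference pairs. A hedged sketch of the record formats (the field names follow common convention, e.g. TRL's trainers, and may not match BashGym's schema):

```python
def to_sft_example(prompt: str, completion: str) -> dict:
    """SFT: imitate one verified trace end-to-end."""
    return {"prompt": prompt, "completion": completion}

def to_dpo_example(prompt: str, chosen: str, rejected: str) -> dict:
    """DPO: for one prompt, prefer the higher-scoring trace over the lower one."""
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}
```

This is also why the quality framework matters twice over: high scores select SFT examples, and score *differences* between traces of the same task yield DPO pairs.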

📦 Model Registry

Version, tag, and manage trained model artifacts. Track lineage from trace to deployed checkpoint.
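One plausible way to model that lineage is parent links between tagged checkpoints. This sketch is illustrative; the field names are assumptions, not BashGym's actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ModelVersion:
    name: str                                      # base model, e.g. "codellama-7b-instruct"
    tag: str                                       # checkpoint tag, e.g. "v2-dpo"
    trace_ids: list = field(default_factory=list)  # lineage back to source traces
    parent: Optional[str] = None                   # tag of the checkpoint it was tuned from

def lineage(registry: dict, tag: str) -> list:
    """Walk parent links to recover the full trace-to-checkpoint chain."""
    chain = []
    while tag is not None:
        chain.append(tag)
        tag = registry[tag].parent
    return chain

registry = {
    "v1-sft": ModelVersion("codellama-7b-instruct", "v1-sft", trace_ids=["t1", "t2"]),
    "v2-dpo": ModelVersion("codellama-7b-instruct", "v2-dpo", parent="v1-sft"),
}
```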

🔄 Progressive Routing

Confidence-based routing between your local model and Claude. Your model handles what it knows, Claude handles the rest.
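A minimal sketch of confidence-based routing; the 0.7 threshold and the callable signatures are assumptions for illustration, not BashGym's actual interface.

```python
def route(prompt, local_model, claude, threshold: float = 0.7):
    """local_model returns (answer, confidence); fall back to Claude below threshold."""
    answer, confidence = local_model(prompt)
    if confidence >= threshold:
        return "local", answer
    return "claude", claude(prompt)
```

As the local model improves, more prompts clear the threshold, so the fallback rate itself becomes a progress metric (one of the quantities the dashboard tracks).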

📊 Real-Time Dashboard

Monitor trace collection, training progress, model performance, and routing decisions in a live dashboard.

☁️ Multi-Cloud

Train on Lambda Labs, RunPod, Vast.ai, or your own GPUs. Cloud-agnostic infrastructure provisioning.

📈 Benchmarks

SWE-bench, HumanEval, and custom project-specific benchmarks. Measure real improvement on real tasks.

🛡️ Safety Guardrails

Harmful content filtering, bias detection, and output validation. Safe models from safe data.

The Ouroboros Flywheel

A self-reinforcing loop: use Claude, capture traces, train your model, deploy it, repeat.

  • ACT: Use Claude Code normally
  • VERIFY: Judge trace quality
  • SYNTHESIZE: Build training data
  • TRAIN: Fine-tune your model
  • DEPLOY: Route to your model
  • REPEAT: Continuously improve

Live Training Monitor

Watch your model improve in real time. Track trace collection, training epochs, loss curves, and deployment status from a single dashboard.

  • Trace collection stats and quality scores
  • Training progress with loss and metric curves
  • Model registry with version comparison
  • Routing confidence and fallback rates
  • Benchmark results across model versions
BashGym Training
$ bashgym train --strategy sft
Loading 2,847 verified traces...
Model: codellama-7b-instruct
Epoch 1/3 ████████░░ 80% loss=0.42
Epoch 2/3 ██████████ 100% loss=0.31
Epoch 3/3 ██████████ 100% loss=0.24
Training complete. Checkpoint saved.
HumanEval: 48.2% (+12.1% vs base)

Three Steps to Your Own Model

1. Install Hooks

Install BashGym hooks into Claude Code. Traces are captured automatically as you work.

2. Use Claude Code Normally

Keep coding as usual. BashGym silently captures, scores, and curates high-quality training data.

3. Train Your Model

Launch training with one command. BashGym handles data prep, fine-tuning, evaluation, and deployment.

8-Layer Architecture

A modular system from trace capture to API serving.

Arena
  • Trace Capture: hook into Claude Code tool calls
  • Session Recording: full conversation context
Judge
  • Quality Scoring: multi-judge validation
  • PII Scrubbing: privacy-first filtering
Factory
  • Data Synthesis: trace to training format
  • Augmentation: expand dataset diversity
Gym
  • SFT / DPO: fine-tuning pipelines
  • Cloud Provisioning: multi-cloud GPU training
Models
  • Registry: version and tag checkpoints
  • Lineage: trace-to-model provenance
Observability
  • Dashboard: live training monitor
  • Benchmarks: SWE-bench, HumanEval
Integrations
  • Claude Code: hook-based capture
  • BashBros: security middleware
API
  • Serving: OpenAI-compatible endpoint
  • Routing: confidence-based fallback

Works With Your Stack

Claude Code
Ollama
HuggingFace
NVIDIA NIM
BashBros
Docker

Start Training Your Own Model

Capture traces today, train your model tomorrow. The flywheel starts with one command.

View on GitHub