Training Pipeline

DBSprout can fine-tune a small local model on a sample of your real data and then generate from the resulting adapter with the spec engine, all without sending data to a cloud provider.

Install

pip install "dbsprout[llm]"
# plus a training backend: Unsloth (CUDA) or MLX (Apple Silicon)

End-to-End

The top-level command runs all three stages (extract, serialize, run) in sequence:

dbsprout train --db postgresql://localhost/myapp --sample-rows 1000 --epochs 3 --output .dbsprout

Option	Description	Default
`--db`	Live database URL to sample (env: `DBSPROUT_TARGET_DB`)	—
`--sample-rows`	Rows to sample (≥ 1)	`1000`
`--epochs`	Training epochs	—
`--output`, `-o`	Base directory for artifacts	`.dbsprout`
`--seed`	Sampling seed	`0`
`--no-pii-redaction`	Disable PII redaction before serialization	`false`
`--quiet`	Suppress progress output	`false`
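
For scripted runs, the connection URL can come from the environment rather than the command line. The sketch below is a hypothetical Python wrapper (the `build_train_command` helper is not part of DBSprout) that assembles only flags listed in the table. It assumes `--db` may be omitted when `DBSPROUT_TARGET_DB` is set, which the table suggests but does not state outright.

```python
import os
import shutil
import subprocess


def build_train_command(sample_rows=1000, epochs=3, output=".dbsprout", seed=0, quiet=False):
    """Assemble argv for `dbsprout train` using only flags from the options table."""
    if sample_rows < 1:
        raise ValueError("--sample-rows must be at least 1")
    cmd = [
        "dbsprout", "train",
        "--sample-rows", str(sample_rows),
        "--epochs", str(epochs),
        "--output", output,
        "--seed", str(seed),
    ]
    if quiet:
        cmd.append("--quiet")
    return cmd


if __name__ == "__main__":
    # Assumption: the CLI reads the database URL from DBSPROUT_TARGET_DB when --db is absent.
    env = dict(os.environ, DBSPROUT_TARGET_DB="postgresql://localhost/myapp")
    cmd = build_train_command(sample_rows=500, seed=42, quiet=True)
    if shutil.which("dbsprout"):
        subprocess.run(cmd, env=env, check=True)
    else:
        print("dbsprout not found on PATH; would run:", " ".join(cmd))
```

Fixing `--seed` in a wrapper like this keeps the sample reproducible across runs, which helps when comparing adapters trained with different epoch counts.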

Stages

Run a single stage when you need finer control:

# 1. Stratified sample from a live DB into Parquet
dbsprout train extract --db postgresql://localhost/myapp --sample-rows 1000

# 2. Serialize Parquet samples into GReaT-style JSONL
dbsprout train serialize

# 3. Fine-tune a QLoRA adapter on the serialized corpus
dbsprout train run --epochs 3

The CUDA path uses Unsloth and the Apple Silicon path uses MLX; the backend is auto-detected.
The output includes a merged GGUF (Q4_K_M) adapter that the spec engine can load.
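
GReaT-style serialization turns each row into a short sentence of "column is value" clauses, usually with the column order shuffled so the model does not learn a fixed position for each feature. The sketch below illustrates the idea only. The function names, the `text` key, and the exact clause format are assumptions, not DBSprout's actual on-disk schema.

```python
import json
import random


def serialize_row(row, rng):
    """Encode one row as 'col is value' clauses in a random column order."""
    columns = list(row)
    rng.shuffle(columns)  # permute features, as in GReaT-style encoding
    return ", ".join(f"{col} is {row[col]}" for col in columns)


def rows_to_jsonl(rows, seed=0):
    """Return one JSON object per line; the 'text' key is a hypothetical schema."""
    rng = random.Random(seed)
    return "\n".join(json.dumps({"text": serialize_row(r, rng)}) for r in rows)


if __name__ == "__main__":
    sample = [
        {"id": 1, "plan": "pro", "seats": 5},
        {"id": 2, "plan": "free", "seats": 1},
    ]
    print(rows_to_jsonl(sample, seed=0))
```

Seeding the shuffle makes the corpus reproducible while still varying the column order from row to row.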

Privacy

PII values are redacted before serialization by default. Pass `--no-pii-redaction` only for non-sensitive data. Differential-privacy SGD (Opacus) is available on the CUDA backend; the pipeline summary reports the achieved (epsilon, delta) privacy budget.
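
To illustrate what redacting before serialization means, the sketch below masks email addresses and phone-like numbers in string values with placeholder tokens. The patterns, the placeholder tokens, and the `redact_row` helper are hypothetical and far simpler than a production detector; DBSprout's own redaction rules are not described here.

```python
import re

# Deliberately simple patterns for illustration; real PII detection needs more care.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_value(value):
    """Mask emails and phone-like numbers in strings; pass other types through."""
    if not isinstance(value, str):
        return value
    value = EMAIL.sub("<EMAIL>", value)
    return PHONE.sub("<PHONE>", value)


def redact_row(row):
    """Apply redaction to every column of a row before it is serialized."""
    return {col: redact_value(val) for col, val in row.items()}


if __name__ == "__main__":
    row = {"id": 7, "contact": "ana@example.com", "note": "call +1 415-555-0100"}
    print(redact_row(row))
```

Running redaction before serialization means the raw values never reach the training corpus, so the adapter cannot memorize them.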

Generating From the Adapter

dbsprout generate --engine spec --lora ./.dbsprout/adapter.gguf

Adapters hot-swap without restarting; see Data Generation.