Training Pipeline
DBSprout can fine-tune a small local model on a sample of your real data, then generate from that adapter with the spec engine — without sending data to a cloud provider.
Install

    pip install "dbsprout[llm]"
    # plus a training backend: Unsloth (CUDA) or MLX (Apple Silicon)

End-to-End
The top-level command runs all three stages in sequence:

    dbsprout train --db postgresql://localhost/myapp --sample-rows 1000 --epochs 3 --output .dbsprout

| Option | Description | Default |
|---|---|---|
| --db | Live database URL to sample (env: DBSPROUT_TARGET_DB) | — |
| --sample-rows | Rows to sample (≥ 1) | 1000 |
| --epochs | Training epochs | — |
| --output, -o | Base directory for artifacts | .dbsprout |
| --seed | Sampling seed | 0 |
| --no-pii-redaction | Disable PII redaction before serialization | false |
| --quiet | Suppress progress output | false |
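
When the end-to-end command is driven from a script, the options in the table can be assembled into an argument list. The following is a minimal sketch: `build_train_command` is a hypothetical helper, not part of DBSprout, and it uses only the flags listed above.

```python
import shlex


def build_train_command(db, sample_rows=1000, epochs=None, output=".dbsprout",
                        seed=0, pii_redaction=True, quiet=False):
    """Assemble argv for `dbsprout train` (hypothetical helper, not a DBSprout API)."""
    if sample_rows < 1:
        raise ValueError("--sample-rows must be >= 1")
    argv = ["dbsprout", "train", "--db", db,
            "--sample-rows", str(sample_rows),
            "--output", output,
            "--seed", str(seed)]
    if epochs is not None:  # the table lists no default, so pass it only when set
        argv += ["--epochs", str(epochs)]
    if not pii_redaction:
        argv.append("--no-pii-redaction")
    if quiet:
        argv.append("--quiet")
    return argv


cmd = build_train_command("postgresql://localhost/myapp", epochs=3)
print(shlex.join(cmd))
# To actually launch the pipeline: subprocess.run(cmd, check=True)
```

Building the list explicitly, rather than formatting a shell string, avoids quoting problems when the database URL contains special characters.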
Stages
Run a single stage when you need finer control:

    # 1. Stratified sample from a live DB into Parquet
    dbsprout train extract --db postgresql://localhost/myapp --sample-rows 1000
    # 2. Serialize Parquet samples into GReaT-style JSONL
    dbsprout train serialize
    # 3. Fine-tune a QLoRA adapter on the serialized corpus
    dbsprout train run --epochs 3

- CUDA path uses Unsloth; Apple Silicon uses MLX (auto-detected).
- Output includes a merged GGUF (Q4_K_M) adapter usable by the spec engine.
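
To give a feel for what the first two stages produce, here is a rough sketch of stratified sampling followed by GReaT-style serialization, where each row becomes a sentence of "column is value" clauses in shuffled order. It is an illustration under stated assumptions, not DBSprout's implementation: the function names, the choice of stratification column, and the `"text"` JSONL key are all hypothetical.

```python
import json
import random
from collections import defaultdict


def stratified_sample(rows, key, n, seed=0):
    """Sample about n rows, keeping each stratum's share roughly proportional."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[row[key]].append(row)
    sample = []
    for group in strata.values():
        # Take at least one row per stratum so rare values are not dropped.
        k = max(1, round(n * len(group) / len(rows)))
        sample.extend(rng.sample(group, min(k, len(group))))
    return sample


def great_serialize(row, rng):
    """Encode one row as 'col is value' clauses joined by commas, in random order."""
    items = list(row.items())
    rng.shuffle(items)  # GReaT permutes feature order between training examples
    return ", ".join(f"{col} is {val}" for col, val in items)


# Toy table: 15 "free" rows and 5 "pro" rows.
rows = [{"id": i, "plan": "free" if i % 4 else "pro", "seats": 1 + i % 3}
        for i in range(20)]
rng = random.Random(0)
for row in stratified_sample(rows, key="plan", n=8):
    # The "text" key is an assumed JSONL layout, not a documented schema.
    print(json.dumps({"text": great_serialize(row, rng)}))
```

Shuffling the clause order means the fine-tuned model does not learn to depend on a fixed column order, which is what allows generation conditioned on any subset of columns.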
Privacy
PII values are redacted before serialization by default. Pass
--no-pii-redaction only for non-sensitive data. Differential-privacy SGD
(Opacus) is available on the CUDA backend; the pipeline summary reports the
achieved (epsilon, delta).
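
As a rough picture of what redaction before serialization means, the sketch below replaces email- and phone-like substrings with typed placeholders before a row is turned into text. The patterns, placeholder format, and function names are assumptions made for illustration; DBSprout's actual detector is not described here and is likely broader.

```python
import re

# Illustrative patterns only; a real detector would cover many more PII types.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(value):
    """Replace PII-looking substrings in string values with a typed placeholder."""
    if not isinstance(value, str):
        return value  # numbers, None, etc. pass through unchanged
    for label, pattern in PII_PATTERNS.items():
        value = pattern.sub(f"<{label}>", value)
    return value


def redact_row(row):
    """Apply redaction to every column of a row (hypothetical helper)."""
    return {col: redact(val) for col, val in row.items()}


row = {"id": 7, "contact": "ada@example.com / +1 (555) 010-9999", "plan": "pro"}
print(redact_row(row))
```

Because redaction happens before the training corpus is written, the raw values never reach the serialized JSONL or the adapter.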
Generating From the Adapter

    dbsprout generate --engine spec --lora ./.dbsprout/adapter.gguf

Adapters hot-swap without restarting; see Data Generation.