
Training Pipeline

DBSprout can fine-tune a small local model on a sample of your real data, producing an adapter that the spec engine then generates from, without sending data to a cloud provider.

pip install "dbsprout[llm]"
# plus a training backend: Unsloth (CUDA) or MLX (Apple Silicon)

The top-level command runs all three stages (extract, serialize, fine-tune) in sequence:

dbsprout train --db postgresql://localhost/myapp --sample-rows 1000 --epochs 3 --output .dbsprout

| Option | Description | Default |
|---|---|---|
| --db | Live database URL to sample (env: DBSPROUT_TARGET_DB) | |
| --sample-rows | Rows to sample (≥ 1) | 1000 |
| --epochs | Training epochs | |
| --output, -o | Base directory for artifacts | .dbsprout |
| --seed | Sampling seed | 0 |
| --no-pii-redaction | Disable PII redaction before serialization | false |
| --quiet | Suppress progress output | false |
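
For scripted or CI runs, it can be convenient to launch the pipeline from Python and pass the database URL through DBSPROUT_TARGET_DB rather than on the command line. The sketch below uses only the options listed above; the helper names `build_train_command` and `run_training` are hypothetical and not part of DBSprout.

```python
import os
import subprocess


def build_train_command(sample_rows=1000, epochs=3, output=".dbsprout", seed=0, quiet=False):
    """Assemble argv for `dbsprout train` from the documented options (hypothetical helper)."""
    if sample_rows < 1:
        raise ValueError("--sample-rows must be >= 1")
    cmd = [
        "dbsprout", "train",
        "--sample-rows", str(sample_rows),
        "--epochs", str(epochs),
        "--output", output,
        "--seed", str(seed),
    ]
    if quiet:
        cmd.append("--quiet")
    return cmd


def run_training(db_url, **options):
    """Run the pipeline with the database URL supplied via DBSPROUT_TARGET_DB instead of --db."""
    env = dict(os.environ, DBSPROUT_TARGET_DB=db_url)
    # check=True raises CalledProcessError if the CLI exits with a non-zero status.
    return subprocess.run(build_train_command(**options), env=env, check=True)
```

Keeping the URL in the environment avoids writing credentials into shell history or CI logs.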

Run a single stage when you need finer control:

# 1. Stratified sample from a live DB into Parquet
dbsprout train extract --db postgresql://localhost/myapp --sample-rows 1000
# 2. Serialize Parquet samples into GReaT-style JSONL
dbsprout train serialize
# 3. Fine-tune a QLoRA adapter on the serialized corpus
dbsprout train run --epochs 3
  • CUDA path uses Unsloth; Apple Silicon uses MLX (auto-detected).
  • Output includes a merged GGUF (Q4_K_M) adapter usable by the spec engine.
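
To make the first two stages concrete, here is a minimal Python sketch of proportional stratified sampling followed by GReaT-style serialization, where each row becomes a sentence of "column is value" clauses in shuffled order. It illustrates the general technique only: the choice of stratification column, the `text` JSONL key, and all function names are assumptions, not DBSprout's actual implementation or file schema.

```python
import json
import random
from collections import defaultdict


def stratified_sample(rows, key, n, seed=0):
    """Draw about n rows, giving each stratum a share proportional to its size."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[row[key]].append(row)
    sample = []
    for _, group in sorted(strata.items(), key=lambda kv: str(kv[0])):
        # Keep at least one row per stratum so rare categories are not dropped.
        quota = max(1, round(n * len(group) / len(rows)))
        sample.extend(rng.sample(group, min(quota, len(group))))
    return sample


def serialize_row(row, rng):
    """Encode one row as 'col is value' clauses joined by commas, in random column order."""
    columns = list(row)
    rng.shuffle(columns)  # shuffling column order is the GReaT-style permutation step
    return ", ".join(f"{col} is {row[col]}" for col in columns)


def to_jsonl(rows, seed=0):
    """Emit one JSON object per line; the 'text' key is an assumed schema."""
    rng = random.Random(seed)
    return "\n".join(json.dumps({"text": serialize_row(r, rng)}) for r in rows)


rows = [{"plan": "free", "age": 31}, {"plan": "free", "age": 45}, {"plan": "pro", "age": 28}]
print(to_jsonl(stratified_sample(rows, key="plan", n=2)))
```

Because each stratum keeps at least one row, the sample can slightly exceed `n` when there are many small strata.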

PII values are redacted before serialization by default. Pass --no-pii-redaction only for non-sensitive data. Differential-privacy SGD (Opacus) is available on the CUDA backend; the pipeline summary reports the achieved (epsilon, delta).
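
As a rough illustration of redacting values before serialization, the sketch below masks email addresses and phone-like numbers with typed placeholders. The two regular expressions and the function names are assumptions made for this example; DBSprout's own detection rules are not described here and are likely broader.

```python
import re

# Illustrative patterns only; real PII detection needs far wider coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact_value(value):
    """Replace matched PII substrings with a typed placeholder; non-strings pass through."""
    if not isinstance(value, str):
        return value
    for label, pattern in PII_PATTERNS.items():
        value = pattern.sub(f"<{label}>", value)
    return value


def redact_row(row):
    """Redact every value in a row before it is serialized."""
    return {col: redact_value(val) for col, val in row.items()}
```

Typed placeholders such as `<EMAIL>` keep the column's role visible to the model while removing the underlying value.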

To generate with the trained adapter, point the spec engine at the GGUF file:
dbsprout generate --engine spec --lora ./.dbsprout/adapter.gguf

Adapters hot-swap without restarting — see Data Generation.