Data Generation
Generation Engines
Select an engine with --engine (default: heuristic).
Heuristic (Default)
Regex and fuzzy matching on column names and types select an appropriate data generator for each column. No AI model is required.
dbsprout generate --engine heuristic
- Speed: 100K+ rows/sec
- Quality: ~80% semantic accuracy
- Dependencies: none (core install)
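The matching idea can be illustrated with a minimal sketch. The rule table, the vocabulary, the generator labels, and the `pick_generator` function are all hypothetical and are not DBSprout's actual rules; the sketch only shows one plausible order of checks (regex on the name, then fuzzy name match, then a fallback by type).

```python
import re
from difflib import get_close_matches

# Hypothetical rules: regex on the column name -> generator label.
REGEX_RULES = [
    (re.compile(r"(^|_)e?mail($|_)"), "email"),
    (re.compile(r"(^|_)phone($|_)"), "phone"),
    (re.compile(r"_at$|(^|_)date($|_)"), "timestamp"),
]

# Hypothetical vocabulary for the fuzzy fallback (tolerates small misspellings).
KNOWN_NAMES = {
    "first_name": "first_name",
    "last_name": "last_name",
    "city": "city",
    "country": "country",
}

# Hypothetical last-resort generators keyed by SQL type.
TYPE_DEFAULTS = {"integer": "random_int", "text": "lorem", "boolean": "random_bool"}


def pick_generator(column: str, sql_type: str) -> str:
    """Return a generator label for one column (illustrative only)."""
    name = column.lower()
    # 1. Regex rules on the column name.
    for pattern, generator in REGEX_RULES:
        if pattern.search(name):
            return generator
    # 2. Fuzzy match against known column names.
    close = get_close_matches(name, list(KNOWN_NAMES), n=1, cutoff=0.8)
    if close:
        return KNOWN_NAMES[close[0]]
    # 3. Fall back to the column type.
    return TYPE_DEFAULTS.get(sql_type.lower(), "lorem")
```

Under these assumed rules, `user_email` resolves through the regex step, a misspelled `frist_name` through the fuzzy step, and an unrecognised `qty INTEGER` column through the type fallback.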
Spec
An AI model analyzes your schema once and produces a DataSpec — a
per-column generation plan that is cached and reused. All row generation
stays deterministic.
dbsprout generate --engine spec
- Speed: first run includes spec generation (<30s embedded, <5s cloud); later runs use the cache
- Quality: high semantic accuracy
- Dependencies: dbsprout[llm] (embedded) or dbsprout[cloud] (cloud providers)
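The "analyze once, then reuse" behaviour can be sketched as a cache lookup. This is only an illustration: it assumes the cache is keyed by a hash of the schema, which the documentation above does not state, and `schema_fingerprint`, `load_or_build_spec`, and the `build_spec` callback are hypothetical names.

```python
import hashlib
import json
from pathlib import Path


def schema_fingerprint(schema: dict) -> str:
    """Stable short hash of a schema dict (assumed cache key)."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def load_or_build_spec(schema: dict, cache_dir: Path, build_spec) -> dict:
    """Return the cached plan if present; otherwise build it once and store it."""
    path = cache_dir / f"spec-{schema_fingerprint(schema)}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    spec = build_spec(schema)  # the only step that would involve a model
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(spec, sort_keys=True), encoding="utf-8")
    return spec
```

With a scheme like this, row generation never calls the model: it reads only the stored plan, which is what keeps later runs fast and deterministic.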
Statistical
Fits a GaussianCopula to reference data so synthetic rows preserve the distributions and correlations of a real sample.
dbsprout generate --engine statistical --reference-data ./sample.csv
--reference-data accepts a single CSV or a directory of per-table CSVs.
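For intuition, here is a two-column Gaussian copula sketch using only the Python standard library. It is not the engine's implementation (the source does not say which library it uses); the function names are hypothetical, and the marginals are approximated by the empirical quantiles of the reference columns.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def normal_scores(values):
    """Map each value to a standard-normal score through its rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def sample_copula(col_a, col_b, n_rows, seed=42):
    """Draw synthetic (a, b) pairs that keep the reference rank correlation."""
    # Fit: correlation of the normal scores is the copula parameter.
    rho = pearson(normal_scores(col_a), normal_scores(col_b))
    noise_scale = math.sqrt(max(0.0, 1.0 - rho * rho))  # clamp rounding error
    sorted_a, sorted_b = sorted(col_a), sorted(col_b)
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        # Sample two correlated standard normals.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + noise_scale * rng.gauss(0.0, 1.0)
        # Map each back through the empirical quantiles of its column.
        row = []
        for z, ref in ((z1, sorted_a), (z2, sorted_b)):
            u = STD_NORMAL.cdf(z)
            row.append(ref[min(int(u * len(ref)), len(ref) - 1)])
        rows.append(tuple(row))
    return rows
```

Because each output value is drawn from the reference column's own quantiles, the marginal distributions are preserved, while `rho` carries the dependence between the two columns.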
Finetuned
Generate from a QLoRA adapter produced by the training pipeline.
dbsprout generate --engine spec --lora ./.dbsprout/adapter.gguf
FK Dependency Resolution
DBSprout automatically handles foreign key relationships:
- Builds a directed dependency graph from FK constraints
- Separates self-referencing FKs for special handling
- Topologically sorts to determine insertion order
- On cycles: detects strongly connected components (SCCs) via Tarjan’s algorithm, finds the nullable FKs inside each cycle, and defers them
- Two-pass insertion: the first pass inserts NULL into deferred FKs, and the second pass updates them with real values
FK columns sample from real parent primary keys, guaranteeing 100% FK integrity on every run.
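The steps above can be sketched as follows. This is a simplified illustration, not DBSprout's code: `tarjan_scc` and `plan_insertion` are hypothetical names, FKs are modelled as `(child, parent, nullable)` tuples, self-references are assumed to be filled in during the second pass, and every nullable FK inside a cycle is deferred, which may be more than strictly necessary.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def tarjan_scc(graph):
    """Return {node: component_root} via Tarjan's SCC algorithm."""
    index, low, on_stack, stack, component = {}, {}, set(), [], {}

    def visit(node):
        index[node] = low[node] = len(index)
        stack.append(node)
        on_stack.add(node)
        for nxt in graph[node]:
            if nxt not in index:
                visit(nxt)
                low[node] = min(low[node], low[nxt])
            elif nxt in on_stack:
                low[node] = min(low[node], index[nxt])
        if low[node] == index[node]:  # node is the root of a component
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component[member] = node
                if member == node:
                    break

    for node in graph:
        if node not in index:
            visit(node)
    return component


def plan_insertion(tables, fks):
    """Return (insertion_order, deferred_fks) for a list of FK tuples."""
    graph = {t: [] for t in tables}
    edges, deferred = [], []
    for child, parent, nullable in fks:
        if child == parent:
            deferred.append((child, parent))  # self-reference, handled separately
        else:
            edges.append((child, parent, nullable))
            graph[child].append(parent)

    component = tarjan_scc(graph)
    sorter = TopologicalSorter({t: set() for t in tables})
    for child, parent, nullable in edges:
        if nullable and component[child] == component[parent]:
            deferred.append((child, parent))  # NULL in pass one, UPDATE in pass two
        else:
            sorter.add(child, parent)  # parent must be inserted before child
    # static_order() raises graphlib.CycleError if a cycle has no nullable FK.
    return list(sorter.static_order()), deferred
```

For a schema where `users.team_id` references `teams` (NOT NULL) and `teams.owner_id` references `users` (nullable), this sketch inserts `teams` first with `owner_id` set to NULL, then `users`, and leaves `teams.owner_id` for the second pass.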
Incremental Updates
After a schema migration, apply only the diff-driven changes to existing seed data instead of regenerating everything:
# Compare against the latest stored snapshot
dbsprout generate --incremental
# Diff against a specific schema file or snapshot
dbsprout generate --incremental --file schema_v2.sql
dbsprout generate --incremental --snapshot a1b2c3d4
See Migrations & Incremental Seeding for the full workflow.
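As a rough picture of what a schema diff contains, the sketch below compares two snapshots represented as `{table: {column: type}}` dictionaries. The snapshot shape, the change labels, and `diff_schemas` are assumptions made for illustration; the actual snapshot format is described in the migrations documentation.

```python
def diff_schemas(old: dict, new: dict) -> list:
    """List (change, table, column) tuples between two schema snapshots."""
    changes = []
    for table in sorted(new.keys() - old.keys()):
        changes.append(("add_table", table, None))
    for table in sorted(old.keys() - new.keys()):
        changes.append(("drop_table", table, None))
    for table in sorted(old.keys() & new.keys()):
        old_cols, new_cols = old[table], new[table]
        for col in sorted(new_cols.keys() - old_cols.keys()):
            changes.append(("add_column", table, col))
        for col in sorted(old_cols.keys() - new_cols.keys()):
            changes.append(("drop_column", table, col))
        for col in sorted(old_cols.keys() & new_cols.keys()):
            if old_cols[col] != new_cols[col]:
                changes.append(("alter_column", table, col))
    return changes
```

An incremental run would then only need to touch the tables and columns named in this list, leaving all other seed data as it is.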
Row Count Configuration
# Global row count
dbsprout generate --rows 5000
Per-table overrides go in dbsprout.toml:
[generation]
default_rows = 1000
[tables.users]
rows = 500
[tables.audit_logs]
exclude = true
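To make the effect of this configuration concrete, the sketch below resolves a row count for each table from the parsed form of the TOML above. `resolve_row_counts` is a hypothetical helper, and how the `--rows` flag interacts with per-table overrides is not modelled here because the text does not specify it.

```python
# Parsed form of the dbsprout.toml above (what a TOML parser would return).
CONFIG = {
    "generation": {"default_rows": 1000},
    "tables": {
        "users": {"rows": 500},
        "audit_logs": {"exclude": True},
    },
}


def resolve_row_counts(config: dict, table_names: list) -> dict:
    """Return {table: rows}, skipping excluded tables (illustrative only)."""
    default = config["generation"]["default_rows"]
    overrides = config.get("tables", {})
    counts = {}
    for name in table_names:
        table_cfg = overrides.get(name, {})
        if table_cfg.get("exclude", False):
            continue  # excluded tables receive no generated rows
        counts[name] = table_cfg.get("rows", default)
    return counts
```

For a schema with `users`, `orders`, and `audit_logs`, this yields 500 rows for `users`, the default 1000 for `orders`, and no entry for `audit_logs`.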
Deterministic Generation
dbsprout generate --seed 42
Every cell value is derived from a hash-based per-cell seed computed from the global seed (--seed, default 42),
making output fully reproducible across runs and machines.
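One way to picture hash-based per-cell seeding is shown below. The key layout (seed, table, column, row index) and the function names are assumptions for illustration, not the tool's actual scheme. The sketch uses `hashlib` rather than Python's built-in `hash()`, because string hashes from `hash()` are randomized per process and would not be stable across machines.

```python
import hashlib
import random


def cell_seed(global_seed: int, table: str, column: str, row: int) -> int:
    """Derive a stable 64-bit seed for one cell (assumed key layout)."""
    key = f"{global_seed}:{table}:{column}:{row}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


def cell_rng(global_seed: int, table: str, column: str, row: int) -> random.Random:
    """Return an independent random generator for one cell."""
    return random.Random(cell_seed(global_seed, table, column, row))


# Example: the same cell always yields the same value for a given seed.
age = cell_rng(42, "users", "age", 7).randint(18, 90)
```

Because each cell has its own seed, a value does not depend on how many other rows or columns were generated before it, so changing the row count of one table leaves the cells of other tables unchanged under this scheme.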