Data Generation

Generation Engines

Select an engine with --engine (default: heuristic).

Heuristic (Default)

Regex and fuzzy matching on column names and types select an appropriate data generator for each column. No AI model is required.

dbsprout generate --engine heuristic
  • Speed: 100K+ rows/sec
  • Quality: ~80% semantic accuracy
  • Dependencies: none (core install)
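
To make the matching idea concrete, here is a minimal Python sketch of name-and-type-based generator selection. It is not DBSprout's actual rule set: the patterns, the generator labels, and the pick_generator function are all hypothetical and exist only for illustration.

```python
import difflib
import re

# Hypothetical rules: (pattern on the column name, generator label).
RULES = [
    (re.compile(r"(^|_)e?mail($|_)"), "email"),
    (re.compile(r"(^|_)(first|last|full)?_?name($|_)"), "person_name"),
    (re.compile(r"(^|_)phone($|_)"), "phone_number"),
    (re.compile(r"_at$|(^|_)date($|_)"), "timestamp"),
]

# Hypothetical known names, used as a fuzzy fallback for misspelled columns.
KNOWN = {"email": "email", "phone": "phone_number", "address": "street_address"}

# Hypothetical type-based fallback when the name gives no hint.
TYPE_DEFAULTS = {"integer": "random_int", "varchar": "random_text", "boolean": "random_bool"}


def pick_generator(column: str, sql_type: str) -> str:
    """Choose a generator label: regex first, then fuzzy match, then type."""
    name = column.lower()
    for pattern, generator in RULES:
        if pattern.search(name):
            return generator
    close = difflib.get_close_matches(name, list(KNOWN), n=1, cutoff=0.8)
    if close:
        return KNOWN[close[0]]
    return TYPE_DEFAULTS.get(sql_type.lower(), "random_text")
```

Ordering the checks from most to least specific keeps a column such as user_email from falling through to a generic text generator just because its type is varchar.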

Spec

An AI model analyzes your schema once and produces a DataSpec — a per-column generation plan that is cached and reused. All row generation stays deterministic.

dbsprout generate --engine spec
  • Speed: first run includes spec generation (<30s embedded, <5s cloud); later runs use the cache
  • Quality: high semantic accuracy
  • Dependencies: dbsprout[llm] (embedded) or dbsprout[cloud] (cloud providers)
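
The analyze-once, reuse-afterwards behavior can be pictured with the Python sketch below. The cache key, the file naming, and the load_or_build_spec and analyze names are assumptions made for illustration; this guide does not describe how DBSprout actually stores a DataSpec.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def load_or_build_spec(schema_sql: str, cache_dir: Path,
                       analyze: Callable[[str], dict]) -> dict:
    """Return the cached plan for this schema; call `analyze` only on a miss."""
    # Assumption: the cache is keyed on a hash of the schema text.
    key = hashlib.sha256(schema_sql.encode("utf-8")).hexdigest()[:16]
    path = cache_dir / f"spec-{key}.json"
    if path.exists():
        return json.loads(path.read_text())
    spec = analyze(schema_sql)  # the expensive model call happens only here
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(spec, sort_keys=True))
    return spec
```

Because the model is consulted only to produce the plan, and never per row, row generation itself can remain deterministic.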

Statistical

Fits a GaussianCopula to reference data so synthetic rows preserve the distributions and correlations of a real sample.

dbsprout generate --engine statistical --reference-data ./sample.csv

--reference-data accepts a single CSV or a directory of per-table CSVs.

Finetuned

Generate from a QLoRA adapter produced by the training pipeline.

dbsprout generate --engine spec --lora ./.dbsprout/adapter.gguf

FK Dependency Resolution

DBSprout automatically handles foreign key relationships:

  1. Builds a directed dependency graph from FK constraints
  2. Separates self-referencing FKs for special handling
  3. Topologically sorts to determine insertion order
  4. On cycles: detects SCCs via Tarjan’s algorithm, finds nullable FKs, defers them
  5. Two-pass insertion — first pass NULLs deferred FKs, second pass updates with real values

FK columns sample from real parent primary keys, guaranteeing 100% FK integrity on every run.
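
The ordering steps above can be sketched in Python as follows. This is an illustrative approximation, not DBSprout's code: the ForeignKey type and plan_insert_order are hypothetical, and the sketch relies on the cycle reported by the standard-library graphlib module rather than on Tarjan's SCC algorithm.

```python
from dataclasses import dataclass
from graphlib import CycleError, TopologicalSorter


@dataclass(frozen=True)
class ForeignKey:
    child: str      # table holding the FK column
    parent: str     # table being referenced
    column: str
    nullable: bool


def plan_insert_order(tables, fks):
    """Return (insert_order, deferred_fks) for a two-pass load."""
    # Self-references never constrain table order, so set them aside first.
    deferred = [fk for fk in fks if fk.child == fk.parent]
    active = [fk for fk in fks if fk.child != fk.parent]
    while True:
        graph = {t: set() for t in tables}
        for fk in active:
            graph[fk.child].add(fk.parent)  # the parent must be inserted first
        try:
            return list(TopologicalSorter(graph).static_order()), deferred
        except CycleError as err:
            # err.args[1] lists the nodes on the cycle (first == last).
            cycle = err.args[1]
            on_cycle = set(zip(cycle, cycle[1:])) | set(zip(cycle[1:], cycle))
            breakable = [fk for fk in active
                         if fk.nullable and (fk.child, fk.parent) in on_cycle]
            if not breakable:
                raise ValueError(f"cycle without a nullable FK: {cycle}") from err
            # Defer one nullable FK on the cycle, then retry the sort.
            deferred.append(breakable[0])
            active.remove(breakable[0])
```

In a two-pass load, the first pass would insert tables in insert_order with every deferred FK column set to NULL, and the second pass would update those columns once all parent rows exist.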

Incremental Updates

After a schema migration, update existing seed data with only the changes the schema diff requires, instead of regenerating everything:

# Compare against the latest stored snapshot
dbsprout generate --incremental

# Diff against a specific schema file or snapshot
dbsprout generate --incremental --file schema_v2.sql
dbsprout generate --incremental --snapshot a1b2c3d4

See Migrations & Incremental Seeding for the full workflow.

Row Count Configuration

# Global row count
dbsprout generate --rows 5000

Per-table overrides go in dbsprout.toml:

[generation]
default_rows = 1000

[tables.users]
rows = 500

[tables.audit_logs]
exclude = true

Deterministic Generation

dbsprout generate --seed 42

Every cell value is derived from a hash-based per-cell seed computed from --seed (default 42), making output fully reproducible across runs and machines.
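
One way to picture per-cell seeding is the Python sketch below. The hash inputs (seed, table, column, row index), the choice of SHA-256, and both function names are assumptions for illustration; the guide does not specify DBSprout's exact scheme.

```python
import hashlib
import random


def cell_seed(global_seed, table, column, row_index):
    """Derive a stable 64-bit seed for one cell from its coordinates."""
    key = f"{global_seed}:{table}:{column}:{row_index}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


def generate_int(global_seed, table, column, row_index, low=0, high=999):
    """Generate one integer cell from its own private random stream."""
    rng = random.Random(cell_seed(global_seed, table, column, row_index))
    return rng.randint(low, high)
```

A cryptographic hash is used here instead of Python's built-in hash(), which is salted per process for strings and would therefore give different seeds on every run. Since each cell has its own seed, a cell's value also does not depend on how many other rows or columns were generated before it.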