Data Generation
Generation Engines
DBSprout ships with multiple generation engines for different use cases:
Heuristic Engine (Default)
Uses regex and fuzzy matching on column names and types to select appropriate data generators. No AI model required.
dbsprout generate --engine heuristic
- Speed: 100K+ rows/sec
- Quality: ~80% semantic accuracy
- Dependencies: None (core install)
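The name-matching idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DBSprout's actual implementation: the pattern table, the generator labels, and the `pick_generator` helper are all hypothetical, and the standard-library `difflib` stands in for whatever fuzzy matcher the engine really uses.

```python
import difflib
import re

# Hypothetical pattern table: regex on the column name -> generator label.
REGEX_RULES = [
    (re.compile(r"(^|_)e?mail($|_)"), "email"),
    (re.compile(r"(^|_)phone($|_)"), "phone_number"),
    (re.compile(r"_at$|(^|_)date($|_)"), "timestamp"),
]

# Hypothetical well-known column names used for the fuzzy fallback.
KNOWN_NAMES = {
    "first_name": "first_name",
    "last_name": "last_name",
    "address": "street_address",
    "city": "city",
}

# Hypothetical last resort keyed on the SQL type when the name gives no hint.
TYPE_FALLBACK = {
    "integer": "random_int",
    "text": "lorem_text",
    "boolean": "random_bool",
}


def pick_generator(column_name: str, sql_type: str) -> str:
    """Choose a generator label from the column name, then from its type."""
    name = column_name.lower()
    # 1. Regex rules are checked first.
    for pattern, generator in REGEX_RULES:
        if pattern.search(name):
            return generator
    # 2. Fuzzy matching tolerates typos and small naming variations.
    close = difflib.get_close_matches(name, list(KNOWN_NAMES), n=1, cutoff=0.8)
    if close:
        return KNOWN_NAMES[close[0]]
    # 3. Fall back to the column type.
    return TYPE_FALLBACK.get(sql_type.lower(), "lorem_text")
```

With these assumed rules, `pick_generator("user_email", "text")` resolves through the regex table, a misspelled `frist_name` is caught by the fuzzy step, and an unrecognized name such as `quantity` falls through to its type.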
Spec Engine
Uses an AI model to analyze your schema once and produce a DataSpec, a per-column generation plan that is cached and reused.
dbsprout generate --engine spec
- Speed: First run includes spec generation (<30s embedded, <5s cloud); subsequent runs use the cache
- Quality: High semantic accuracy
- Dependencies: dbsprout[llm] for embedded, dbsprout[cloud] for cloud providers
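The analyze-once, reuse-afterwards behavior can be pictured as a cache keyed on the schema. The sketch below is an assumption about how such a cache might work, not a description of DBSprout's internals: the fingerprinting scheme, the JSON file layout, and the `schema_fingerprint` and `load_or_build_spec` helpers are hypothetical, and `analyze` is a placeholder for the model call.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def schema_fingerprint(schema: dict) -> str:
    """Hash a canonical JSON form so key order does not change the result."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def load_or_build_spec(
    schema: dict,
    cache_dir: Path,
    analyze: Callable[[dict], dict],
) -> dict:
    """Return the cached spec for this schema, building it only on a miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f"{schema_fingerprint(schema)}.json"
    if cache_file.exists():
        # Cache hit: no model call is needed.
        return json.loads(cache_file.read_text(encoding="utf-8"))
    # Cache miss: run the slow analysis (hypothetically an LLM call) once.
    spec = analyze(schema)
    cache_file.write_text(json.dumps(spec), encoding="utf-8")
    return spec
```

Keying the cache on a hash of the schema means any schema change produces a new fingerprint and triggers a fresh analysis, while unchanged schemas skip the model entirely.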
Vectorized Engine
Uses NumPy for bulk numeric generation. Best for tables with many numeric columns.
dbsprout generate --engine vectorized
FK Dependency Resolution
DBSprout automatically handles foreign key relationships:
- Builds a directed dependency graph from FK constraints
- Separates self-referencing FKs for special handling
- Performs topological sort to determine insertion order
- If cycles exist: detects strongly connected components (SCCs) via Tarjan’s algorithm, finds nullable FKs within them, and defers those FKs
- Two-pass insertion: first pass with NULLs for deferred FKs, second pass updates with real values
This ensures 100% FK integrity on every run.
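The steps above can be sketched in Python using only the standard library. This is an illustrative sketch, not DBSprout's code: the `FK` record, `tarjan_scc`, and `plan_insertion` are hypothetical names, and deferring self-referencing FKs to the second pass is an assumption, since the "special handling" is not specified here.

```python
from collections import defaultdict
from graphlib import TopologicalSorter
from typing import NamedTuple


class FK(NamedTuple):
    table: str       # child table holding the FK column
    ref_table: str   # parent table being referenced
    nullable: bool


def tarjan_scc(nodes, edges):
    """Return the strongly connected components of a directed graph."""
    index, low, on_stack, stack, result = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in edges[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            result.append(component)

    for v in nodes:
        if v not in index:
            visit(v)
    return result


def plan_insertion(tables, fks):
    """Return (insertion_order, deferred_fks) for a two-pass insert."""
    # Assumption: self-references are filled in during the second pass.
    deferred = [fk for fk in fks if fk.table == fk.ref_table]
    active = [fk for fk in fks if fk.table != fk.ref_table]

    while True:
        edges = defaultdict(set)
        for fk in active:
            edges[fk.table].add(fk.ref_table)
        cyclic = [c for c in tarjan_scc(tables, edges) if len(c) > 1]
        if not cyclic:
            break
        for component in cyclic:
            # Defer one nullable FK inside the cycle, then re-check.
            candidate = next(
                (fk for fk in active
                 if fk.nullable
                 and fk.table in component
                 and fk.ref_table in component),
                None,
            )
            if candidate is None:
                raise ValueError(f"cycle without a nullable FK: {sorted(component)}")
            active.remove(candidate)
            deferred.append(candidate)

    # graphlib maps node -> predecessors, so parents come out before children.
    sorter = TopologicalSorter({t: set() for t in tables})
    for fk in active:
        sorter.add(fk.table, fk.ref_table)
    return list(sorter.static_order()), deferred
```

The first pass inserts tables in the returned order with NULL in every deferred FK column; the second pass updates those columns once all referenced rows exist. A cycle made up entirely of non-nullable FKs cannot be broken this way, which is why the sketch raises an error in that case.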
Row Count Configuration
# Global row count
dbsprout generate --rows 5000

# Per-table via config file (dbsprout.toml)
[generation]
default_rows = 1000

[generation.tables.users]
rows = 500

[generation.tables.orders]
rows = 10000
Deterministic Generation
# Same seed = identical output
dbsprout generate --seed 42
Every cell value is derived from a hash-based per-cell seed, making output fully reproducible across runs and machines.
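A hash-based per-cell seed might look like the following sketch. The choice of SHA-256, the key layout, and the `cell_seed` and `cell_value` helpers are assumptions for illustration; DBSprout's actual derivation may differ.

```python
import hashlib
import random


def cell_seed(global_seed: int, table: str, column: str, row: int) -> int:
    """Derive a 64-bit seed from the global seed and the cell coordinates."""
    # SHA-256 gives the same digest on every platform, unlike Python's
    # built-in hash(), which is salted per process for strings.
    key = f"{global_seed}:{table}:{column}:{row}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


def cell_value(global_seed: int, table: str, column: str, row: int) -> int:
    """Generate one integer cell from its own independent random stream."""
    rng = random.Random(cell_seed(global_seed, table, column, row))
    return rng.randint(0, 999)
```

Because each cell's seed depends only on the global seed and its own coordinates, a cell's value does not depend on generation order, so tables can be generated in any order, or in parallel, without changing the output.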