
Match production distributions

Problem: heuristic data is structurally valid but statistically unlike production, with wrong value frequencies and no correlations between columns.

Solution:

dbsprout generate --engine statistical --reference-data ./sample.csv

--reference-data accepts a single CSV or a directory of per-table CSVs. Install the engine’s dependencies if prompted.

Why it works: the statistical engine fits a GaussianCopula to the reference sample, so generated columns preserve marginal distributions and inter-column correlations while staying schema- and FK-valid. Verify fidelity with dbsprout validate --reference-data ./sample.csv --detection.
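To make the copula idea concrete, here is a minimal two-column sketch in plain Python. It is not dbsprout's implementation; the function names (`to_normal_scores`, `fit_copula`, `sample_copula`, `pearson`) are hypothetical and exist only for illustration. It assumes numeric columns and handles just one pair of columns, but it shows the three steps: map each column to normal scores through its empirical ranks, measure the correlation of those scores, then draw correlated normals and map them back through the reference quantiles.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for both directions of the transform


def pearson(a, b):
    """Plain Pearson correlation of two equal-length numeric lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def to_normal_scores(column):
    """Map each value to a standard-normal score via its empirical rank."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    scores = [0.0] * n
    for rank, i in enumerate(order):
        # (rank + 0.5) / n stays strictly inside (0, 1), so inv_cdf is defined.
        scores[i] = _STD.inv_cdf((rank + 0.5) / n)
    return scores


def fit_copula(xs, ys):
    """Return the sorted marginals and the correlation of the normal scores."""
    rho = pearson(to_normal_scores(xs), to_normal_scores(ys))
    return sorted(xs), sorted(ys), rho


def _quantile(sorted_values, u):
    """Empirical quantile: pick the reference value at probability u."""
    index = min(int(u * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[index]


def sample_copula(model, count, rng):
    """Draw correlated normals, then map them back to the reference marginals."""
    sorted_x, sorted_y, rho = model
    noise_scale = math.sqrt(max(0.0, 1.0 - rho * rho))  # guard against rounding
    rows = []
    for _ in range(count):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + noise_scale * rng.gauss(0.0, 1.0)
        rows.append((_quantile(sorted_x, _STD.cdf(z1)),
                     _quantile(sorted_y, _STD.cdf(z2))))
    return rows


if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up reference sample: the second column depends on the first.
    ref_x = [rng.gauss(50.0, 10.0) for _ in range(500)]
    ref_y = [2.0 * x + rng.gauss(0.0, 5.0) for x in ref_x]
    model = fit_copula(ref_x, ref_y)
    synthetic = sample_copula(model, 1000, rng)
    print("fitted score correlation:", round(model[2], 3))
    print("synthetic correlation:",
          round(pearson([r[0] for r in synthetic], [r[1] for r in synthetic]), 3))
```

Because every generated value is read off the reference quantiles, each column keeps its marginal distribution by construction, while the shared correlation parameter carries over the dependence between columns. A real engine would additionally handle many columns, categorical values, and the schema and foreign-key constraints mentioned above.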