LLM Configuration

The spec engine calls an LLM once per schema to produce a cached DataSpec. All row generation stays deterministic — the LLM never generates row values directly.
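As a rough illustration of why generation stays deterministic once a spec exists, the sketch below produces rows from a spec with a seeded random number generator and no LLM call. The `SPEC` layout, the rule kinds, and `generate_rows` are hypothetical stand-ins invented for this example; they are not DBSprout's actual DataSpec format or API.

```python
import random

# Hypothetical, simplified stand-in for a cached DataSpec: one rule per column.
SPEC = {
    "id": {"kind": "sequence", "start": 1},
    "age": {"kind": "int_range", "min": 18, "max": 90},
    "plan": {"kind": "choice", "values": ["free", "pro", "team"]},
}


def generate_rows(spec, count, seed):
    """Produce `count` rows from `spec` using a seeded RNG and no LLM call."""
    rng = random.Random(seed)  # private RNG: the same seed gives the same draws
    rows = []
    for index in range(count):
        row = {}
        for column, rule in spec.items():
            if rule["kind"] == "sequence":
                row[column] = rule["start"] + index
            elif rule["kind"] == "int_range":
                row[column] = rng.randint(rule["min"], rule["max"])
            elif rule["kind"] == "choice":
                row[column] = rng.choice(rule["values"])
            else:
                raise ValueError(f"unknown rule kind: {rule['kind']}")
        rows.append(row)
    return rows
```

Calling `generate_rows(SPEC, 5, seed=42)` twice returns identical rows, which is the property the spec-then-generate split is meant to preserve.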

Embedded Model (Default)

DBSprout's default LLM backend is an embedded Qwen 2.5-1.5B model that runs entirely on your machine.

# Install embedded LLM support
pip install "dbsprout[llm]"

# List and download a registry model
dbsprout models list
dbsprout models download qwen2.5-1.5b

# Generate with the spec engine (embedded by default)
dbsprout generate --engine spec

The embedded model uses llama-cpp-python with GBNF grammar constraints to guarantee valid JSON output. Memory usage stays under 2 GB.

Cloud Providers

For faster or higher-quality spec generation, allow a cloud provider by raising the privacy tier and supplying provider credentials:

# Install cloud support
pip install "dbsprout[cloud]"

export OPENAI_API_KEY="sk-..."

# --privacy cloud permits sending the schema and sample data to a cloud LLM
dbsprout generate --engine spec --privacy cloud

DBSprout uses LiteLLM + Instructor under the hood, so any LiteLLM-supported provider works. Provider and model are selected in dbsprout.toml (see the Configuration reference).

Privacy Tiers

LLM usage follows DBSprout’s privacy gradient. Set the tier with --privacy or in dbsprout.toml:

Tier	What’s sent	Use case
local	Nothing leaves your machine	Default — embedded model only
redacted	Schema structure only (column names and types), no data	Cloud LLM without exposing row data
cloud	Schema + sample data	Best accuracy; requires a provider API key

dbsprout generate --engine spec --privacy redacted
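One way to picture the tiers is as a filter on what may be included in a cloud request. The sketch below is only an illustration of the table above: `cloud_payload`, the schema dictionary shape, and the payload keys are assumptions made for this example, not DBSprout internals.

```python
# Hypothetical example inputs; the shapes are assumptions for illustration.
EXAMPLE_SCHEMA = {
    "users": [
        {"name": "id", "type": "INTEGER"},
        {"name": "email", "type": "TEXT"},
    ],
}
EXAMPLE_SAMPLES = {"users": [{"id": 1, "email": "a@example.com"}]}


def cloud_payload(schema, sample_rows, tier):
    """Return what a given privacy tier would allow to leave the machine.

    Returns None when nothing may be sent to a cloud provider.
    """
    # Schema structure only: table names plus column names and types.
    structure = {
        table: [{"name": col["name"], "type": col["type"]} for col in columns]
        for table, columns in schema.items()
    }
    if tier == "local":
        return None  # nothing leaves the machine
    if tier == "redacted":
        return {"schema": structure}  # structure only, no row data
    if tier == "cloud":
        return {"schema": structure, "samples": sample_rows}
    raise ValueError(f"unknown privacy tier: {tier!r}")
```

Under these assumptions, `cloud_payload(EXAMPLE_SCHEMA, EXAMPLE_SAMPLES, "redacted")` contains column names and types but no `samples` key, while the `local` tier yields `None`.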

Spec Caching

The generated spec is cached under .dbsprout/, keyed by a hash of your schema. If the schema is unchanged, the cached spec is reused automatically and no LLM call is made. A changed schema hashes differently, so the spec is regenerated.
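The caching behavior can be sketched as a content-addressed lookup. This is a minimal sketch under stated assumptions: the SHA-256 hash, the `spec-<hash>.json` file name, and the `load_or_build_spec` helper are hypothetical choices for illustration, and DBSprout's real cache layout may differ.

```python
import hashlib
import json
from pathlib import Path


def schema_hash(schema):
    """Hash a schema via canonical JSON so key order does not change the result."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def load_or_build_spec(schema, build_spec, cache_dir=".dbsprout"):
    """Return the cached spec for `schema`, calling `build_spec` only on a miss.

    `build_spec` stands in for the one-off LLM call; the file name is hypothetical.
    """
    cache_dir = Path(cache_dir)
    path = cache_dir / f"spec-{schema_hash(schema)}.json"
    if path.exists():
        # Cache hit: the schema is unchanged, so no LLM call is needed.
        return json.loads(path.read_text(encoding="utf-8"))
    spec = build_spec(schema)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(spec), encoding="utf-8")
    return spec
```

Keying on a canonical serialization means that cosmetic differences such as dictionary ordering do not invalidate the cache, while any change to tables or columns does.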

Ollama

A local Ollama server can serve the spec model without the embedded runtime; configure it as the provider in dbsprout.toml.