
LLM Configuration

DBSprout ships with support for an embedded Qwen2.5-1.5B model that runs entirely on your machine.

```sh
# Install LLM support
pip install "dbsprout[llm]"

# Download the model
dbsprout models download

# Generate with the spec engine (uses the embedded model by default)
dbsprout generate --engine spec
```

The embedded model uses llama-cpp-python with GBNF grammar constraints to guarantee valid JSON output. Memory usage is under 2GB.
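Grammar-constrained decoding works by restricting the sampler so the model can only emit tokens that extend a string matching the grammar. As a purely illustrative example (not DBSprout's actual grammar), a minimal GBNF grammar that forces output of the shape `{"name": "..."}` could look like:

```
root   ::= "{" ws "\"name\"" ws ":" ws string ws "}"
string ::= "\"" [a-zA-Z0-9 _-]* "\""
ws     ::= [ \t\n]*
```

Because every sampled token must stay inside this grammar, malformed JSON is impossible by construction rather than caught after the fact.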

For faster or higher-quality spec generation, use a cloud LLM provider:

```sh
# Install cloud support
pip install "dbsprout[cloud]"

# Use OpenAI
export OPENAI_API_KEY="sk-..."
dbsprout generate --engine spec --llm-provider openai

# Use Anthropic
export ANTHROPIC_API_KEY="sk-ant-..."
dbsprout generate --engine spec --llm-provider anthropic
```

DBSprout uses LiteLLM under the hood, so any provider supported by LiteLLM works.
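LiteLLM routes requests based on a `provider/model` string convention (bare model names are treated as OpenAI models). A hypothetical sketch of how a `--llm-provider` flag could map onto that convention — the helper name is illustrative, not DBSprout's actual code:

```python
# Hypothetical sketch: map a provider flag plus model name onto LiteLLM's
# "provider/model" identifier convention.
def litellm_model_string(provider: str, model: str) -> str:
    """Build a LiteLLM model identifier, e.g. 'anthropic/claude-...'."""
    # LiteLLM treats unprefixed model names as OpenAI models.
    if provider == "openai":
        return model
    return f"{provider}/{model}"

print(litellm_model_string("openai", "gpt-4o"))    # gpt-4o
print(litellm_model_string("ollama", "llama3"))    # ollama/llama3
```

The resulting string is what gets passed as `model=` to LiteLLM's `completion()` call, which is why any LiteLLM-supported provider works without DBSprout-specific integration code.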

LLM settings can be configured in dbsprout.toml:

```toml
[llm]
provider = "embedded" # or "openai", "anthropic", "ollama", etc.
model = "qwen2.5-1.5b"
temperature = 0.1
max_tokens = 4096
```

LLM usage follows DBSprout’s privacy gradient:

| Tier | What's sent | Use case |
| --- | --- | --- |
| Local | Nothing leaves your machine | Default; uses the embedded model |
| Redacted | Schema structure only, no data | Column names/types sent to the cloud |
| Cloud | Schema + sample data | Best accuracy; requires an API key |
| Training | Full data access | Fine-tuning only |
Set the tier in dbsprout.toml:

```toml
[privacy]
tier = "local" # default
```
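To make the "Redacted" tier concrete, here is a hypothetical sketch of the redaction step: only table names, column names, and column types leave the machine, while sample values are dropped. The schema dict shape is an assumption for illustration:

```python
# Hypothetical sketch of redacted-tier payload construction: keep the
# schema structure (names and types), strip all data/sample values.
def redact_schema(schema: dict) -> dict:
    return {
        table: {col: meta["type"] for col, meta in cols.items()}
        for table, cols in schema.items()
    }

schema = {
    "users": {
        "id":    {"type": "integer", "samples": [1, 2, 3]},
        "email": {"type": "text",    "samples": ["a@example.com"]},
    },
}
print(redact_schema(schema))
# {'users': {'id': 'integer', 'email': 'text'}}
```

Everything under `samples` never appears in the redacted payload, which is what distinguishes this tier from "Cloud".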

The AI-generated spec is cached based on a hash of your schema. If your schema hasn’t changed, the cached spec is reused automatically. Cache location: .dbsprout/cache/.
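Schema-keyed caching of this kind can be sketched as follows — hash a canonical JSON dump of the schema and use the digest as the cache filename. The file layout and key format here are assumptions, not DBSprout's actual implementation:

```python
# Illustrative sketch: derive a stable cache key from the schema so the
# cached spec is reused whenever the schema is unchanged.
import hashlib
import json
from pathlib import Path

def schema_cache_key(schema: dict) -> str:
    # sort_keys makes the dump canonical: the same schema always hashes
    # to the same digest regardless of dict insertion order.
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def cached_spec_path(schema: dict, cache_dir: str = ".dbsprout/cache") -> Path:
    return Path(cache_dir) / f"{schema_cache_key(schema)}.json"
```

Any change to the schema produces a different digest, so a stale cached spec can never be picked up by accident.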

```sh
# Regenerate the spec, bypassing the cache
dbsprout generate --engine spec --no-cache
```