Backends

raggity supports three LLM generation backends and two vector store backends.

LLM backends

Controlled by generation.backend in raggity.toml.

Claude (default)

Uses the Claude Agent SDK. No extra dependencies. Requires a Claude subscription or API key.

[generation]
backend = "claude"
model = "claude-opus-4-8"
auth = "auto"

Auth modes:

| `auth` value | Behaviour |
| --- | --- |
| `"auto"` (default) | Uses `ANTHROPIC_API_KEY` if set, otherwise falls back to `claude login` subscription session |
| `"subscription"` | Always uses `claude login` session; `ANTHROPIC_API_KEY` is ignored |
| `"api_key"` | Requires `ANTHROPIC_API_KEY`; raises at startup if missing |
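The three modes amount to a small decision procedure. The sketch below restates the table in Python; resolve_auth is a hypothetical helper written for illustration, not raggity's actual implementation, and the exception types are assumptions.

```python
import os
from typing import Mapping, Optional


def resolve_auth(mode: str = "auto", env: Optional[Mapping[str, str]] = None) -> str:
    """Return "api_key" or "subscription" for a given auth setting.

    Hypothetical helper: it mirrors the table above, not raggity's code.
    """
    env = os.environ if env is None else env
    has_key = bool(env.get("ANTHROPIC_API_KEY"))

    if mode == "auto":
        # Prefer the API key when present, otherwise the `claude login` session.
        return "api_key" if has_key else "subscription"
    if mode == "subscription":
        # The key is ignored even if it is set.
        return "subscription"
    if mode == "api_key":
        if not has_key:
            raise RuntimeError('auth = "api_key" requires ANTHROPIC_API_KEY')
        return "api_key"
    raise ValueError(f"unknown auth mode: {mode!r}")
```

Passing env explicitly makes the behaviour easy to check without touching the real environment, for example resolve_auth("auto", {}) falls back to the subscription session.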

Set up once with the Claude CLI:

claude login

OpenAI-compatible

Any OpenAI-compatible API endpoint — OpenAI, Azure OpenAI, Together, Groq, etc. Requires the openai extra:

pip install raggity[openai]

[generation]
backend = "openai"
model = "gpt-4o-mini"
base_url = "https://api.openai.com/v1"   # default; any OpenAI-compatible URL works
api_key_env = "OPENAI_API_KEY"           # env var name holding the API key

Set the API key:

export OPENAI_API_KEY=sk-...

The auth field is ignored for this backend.
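Note that api_key_env names the environment variable; it does not contain the key itself. A minimal sketch of how a [generation] table could be turned into arguments for the OpenAI Python client follows. The helper name openai_client_kwargs, the fallback to OPENAI_API_KEY when api_key_env is omitted, and the example URL are assumptions made for illustration.

```python
import os
from typing import Any, Mapping, Optional


def openai_client_kwargs(generation: Mapping[str, Any],
                         env: Optional[Mapping[str, str]] = None) -> dict:
    """Build keyword arguments for openai.OpenAI(...) from a [generation] table.

    Hypothetical helper, not part of raggity.
    """
    env = os.environ if env is None else env
    # api_key_env holds the *name* of the variable, not the key itself.
    var_name = generation.get("api_key_env", "OPENAI_API_KEY")  # assumed default
    api_key = env.get(var_name)
    if not api_key:
        raise RuntimeError(f"environment variable {var_name} is not set")
    return {
        "base_url": generation.get("base_url", "https://api.openai.com/v1"),
        "api_key": api_key,
    }


# Example: a provider reached through a custom endpoint (placeholder URL).
cfg = {
    "backend": "openai",
    "model": "gpt-4o-mini",
    "base_url": "https://llm.example.com/v1",
    "api_key_env": "MY_PROVIDER_API_KEY",
}
kwargs = openai_client_kwargs(cfg, env={"MY_PROVIDER_API_KEY": "secret"})
```

With the openai extra installed, the result could be passed on as OpenAI(**kwargs), since the client accepts base_url and api_key as keyword arguments.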

Ollama (offline)

Runs against a local Ollama server, so no API key is required. Ollama is reached through its OpenAI-compatible API, so this backend reuses the openai extra:

pip install raggity[openai]
ollama pull llama3.1

[generation]
backend = "ollama"
model = "llama3.1"
# base_url defaults to http://localhost:11434/v1; set it only if Ollama runs on a different host or port

The auth field is ignored for this backend.
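Because Ollama exposes an OpenAI-compatible endpoint under /v1, the same client can talk to it with only the base URL changed. The sketch below shows one way the defaults could be applied; ollama_client_kwargs is a hypothetical helper, and the placeholder key is an assumption about how the client is configured rather than documented raggity behaviour.

```python
from typing import Any, Mapping

OLLAMA_DEFAULT_BASE_URL = "http://localhost:11434/v1"


def ollama_client_kwargs(generation: Mapping[str, Any]) -> dict:
    """Build keyword arguments for openai.OpenAI(...) targeting Ollama.

    Hypothetical helper, not part of raggity.
    """
    return {
        # An explicit base_url wins; otherwise use the local default.
        "base_url": generation.get("base_url", OLLAMA_DEFAULT_BASE_URL),
        # Ollama does not check the key, but the OpenAI client expects a
        # non-empty value, so a placeholder is passed.
        "api_key": "ollama",
    }


default_kwargs = ollama_client_kwargs({"backend": "ollama", "model": "llama3.1"})
custom_kwargs = ollama_client_kwargs({"base_url": "http://localhost:11500/v1"})
```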

Vector store backends

Controlled by index.backend in raggity.toml.

LanceDB (default)

LanceDB is the default and requires no extra install. Data is stored locally in the directory specified by index.path.

[index]
backend = "lancedb"
path = ".raggity/index"   # relative to cwd (default)

Recommended for: single-user local deployments.
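Since the default path is relative to the working directory, starting raggity from a different directory points it at a different index location. The sketch below illustrates that resolution with pathlib; resolve_index_path is a hypothetical helper, and treating an absolute index.path as-is is an assumption.

```python
from pathlib import PurePosixPath


def resolve_index_path(path: str, cwd: str) -> PurePosixPath:
    """Resolve index.path against the working directory.

    Hypothetical helper. An absolute `path` replaces `cwd` entirely,
    which is how pathlib joins behave.
    """
    return PurePosixPath(cwd) / path


from_project = resolve_index_path(".raggity/index", "/srv/project")
from_home = resolve_index_path(".raggity/index", "/home/alice")
absolute = resolve_index_path("/data/raggity-index", "/srv/project")
```

Here from_project and from_home are two different directories, so an index built in one will not be found from the other unless index.path is set to an absolute location.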

Qdrant

Qdrant is recommended for large-scale or multi-user deployments. Install the extra:

pip install raggity[qdrant]

Configure in raggity.toml:

[index]
backend = "qdrant"
qdrant_location = "http://localhost:6333"   # remote Qdrant server
# qdrant_location = ":memory:"             # ephemeral in-process (testing)
# qdrant_location = "/path/to/local"       # persistent local storage
qdrant_collection = "raggity"
# qdrant_api_key = "..."                   # or set QDRANT_API_KEY env var

Start a local Qdrant instance with Docker:

docker run -p 6333:6333 qdrant/qdrant

Recommended for: large corpora, multi-user server deployments.
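The three forms of qdrant_location shown in the configuration correspond to three ways of constructing a client in the qdrant-client library: location=":memory:", url=..., and path=.... The following sketch shows one plausible mapping; qdrant_client_kwargs is a hypothetical helper, and the rule of detecting a server by its http(s) prefix is an assumption rather than raggity's documented logic.

```python
from typing import Optional


def qdrant_client_kwargs(location: str, api_key: Optional[str] = None) -> dict:
    """Map a qdrant_location value onto QdrantClient keyword arguments.

    Hypothetical helper, not part of raggity.
    """
    if location == ":memory:":
        # Ephemeral in-process store, useful for tests.
        return {"location": ":memory:"}
    if location.startswith(("http://", "https://")):
        # Remote Qdrant server; the API key only applies here.
        kwargs = {"url": location}
        if api_key:
            kwargs["api_key"] = api_key
        return kwargs
    # Anything else is treated as a directory for persistent local storage.
    return {"path": location}
```

With the qdrant extra installed, the result could be used as QdrantClient(**qdrant_client_kwargs("http://localhost:6333")).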

Embedding models

raggity uses fastembed with ONNX Runtime — CPU by default, no GPU required.

| Model | Dims | Notes |
| --- | --- | --- |
| `BAAI/bge-small-en-v1.5` | 384 | Default: lightweight, portable |
| `nomic-embed-text-v1.5-Q` | 768 | Higher quality: Matryoshka scaling, 8k context |

[embedding]
model = "BAAI/bge-small-en-v1.5"   # or nomic-embed-text-v1.5-Q
provider = "cpu"                    # cpu / cuda / directml / rocm
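Each provider value corresponds to an ONNX Runtime execution provider, and fastembed's TextEmbedding accepts a providers list. The sketch below shows one way the setting could be translated; the ONNX_PROVIDERS mapping and the onnx_providers helper are illustrative assumptions, not raggity's code. Non-CPU providers also need an ONNX Runtime build that includes them.

```python
# ONNX Runtime execution provider names for each `provider` value.
ONNX_PROVIDERS = {
    "cpu": "CPUExecutionProvider",
    "cuda": "CUDAExecutionProvider",
    "directml": "DmlExecutionProvider",
    "rocm": "ROCMExecutionProvider",
}


def onnx_providers(provider: str = "cpu") -> list:
    """Return an ONNX Runtime provider list, keeping CPU as a fallback.

    Hypothetical helper, not part of raggity.
    """
    name = ONNX_PROVIDERS[provider.lower()]
    if name == "CPUExecutionProvider":
        return [name]
    # Listing CPU last lets ONNX Runtime fall back for unsupported operators.
    return [name, "CPUExecutionProvider"]
```

The resulting list could then be passed as TextEmbedding(model_name=..., providers=onnx_providers("cuda")).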

Warning

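Changing embedding.model changes the index fingerprint, which triggers an automatic full index rebuild. Vectors produced by different models are not comparable, so the existing index cannot be reused.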
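The idea behind a fingerprint check can be sketched in a few lines: hash the settings that determine the stored vectors and compare the result with the value saved alongside the index. The exact fields raggity hashes are not documented here, so the function names and the choice of model name plus dimensions below are assumptions.

```python
import hashlib
import json


def index_fingerprint(embedding_model: str, dims: int) -> str:
    """Hash the settings that decide whether stored vectors are still valid.

    Hypothetical sketch: the fields raggity actually hashes may differ.
    """
    payload = json.dumps({"model": embedding_model, "dims": dims}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def needs_rebuild(stored: str, embedding_model: str, dims: int) -> bool:
    """True when the configured model no longer matches the indexed one."""
    return stored != index_fingerprint(embedding_model, dims)


# Fingerprint saved when the index was built with the default model.
stored = index_fingerprint("BAAI/bge-small-en-v1.5", 384)
same_model = needs_rebuild(stored, "BAAI/bge-small-en-v1.5", 384)
switched_model = needs_rebuild(stored, "nomic-embed-text-v1.5-Q", 768)
```

In this sketch same_model is False and switched_model is True, which is the condition that would trigger a rebuild.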

Reranking models

| Model | Size | Notes |
| --- | --- | --- |
| `Xenova/ms-marco-MiniLM-L-6-v2` | ~25 MB | Default: fast and portable |
| `BAAI/bge-reranker-v2-m3` | ~1 GB | Higher quality |

[retrieval]
rerank_model = "Xenova/ms-marco-MiniLM-L-6-v2"
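For readers new to reranking: a reranker scores each retrieved passage against the query and reorders the candidates before they reach the LLM. The sketch below shows only that reordering step. The overlap_score function is a toy stand-in for a cross-encoder such as the models listed above, and all names here are hypothetical.

```python
from typing import Callable, List, Tuple


def rerank(query: str, candidates: List[str],
           score: Callable[[str, str], float],
           top_k: int = 3) -> List[Tuple[str, float]]:
    """Score every candidate against the query and keep the best top_k."""
    scored = [(doc, score(query, doc)) for doc in candidates]
    # Highest score first; Python's sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def overlap_score(query: str, doc: str) -> float:
    """Toy scorer: fraction of query words that also appear in the document."""
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & set(doc.lower().split())) / len(query_words)


docs = [
    "qdrant runs in docker",
    "lancedb stores data locally",
    "ollama runs models locally",
]
top = rerank("where does lancedb store data", docs, overlap_score, top_k=2)
```

A real cross-encoder replaces overlap_score with a model call; the sorting and truncation stay the same.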