Backends
raggity supports three LLM generation backends and two vector store backends.
LLM backends
Controlled by generation.backend in raggity.toml.
Claude (default)
Uses the Claude Agent SDK. No extra dependencies. Requires a Claude subscription or API key.
Auth modes:
| auth value | Behaviour |
|---|---|
| "auto" (default) | Uses ANTHROPIC_API_KEY if set; otherwise falls back to the claude login subscription session |
| "subscription" | Always uses the claude login session; ANTHROPIC_API_KEY is ignored |
| "api_key" | Requires ANTHROPIC_API_KEY; raises at startup if missing |
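The three auth modes can be read as a small decision procedure. The following is a minimal sketch of that logic, not raggity's actual implementation: `resolve_auth` and its return values are hypothetical names chosen for illustration, and the environment is passed in as a plain mapping.

```python
from typing import Mapping


def resolve_auth(auth: str, env: Mapping[str, str]) -> str:
    """Return the credential source to use: "api_key" or "subscription".

    Hypothetical helper that mirrors the documented auth modes.
    """
    has_key = bool(env.get("ANTHROPIC_API_KEY"))

    if auth == "auto":
        # Prefer the API key when present, else fall back to the login session.
        return "api_key" if has_key else "subscription"
    if auth == "subscription":
        # The API key is ignored even if it is set.
        return "subscription"
    if auth == "api_key":
        if not has_key:
            # Fail early, as the documented behaviour raises at startup.
            raise RuntimeError("auth = 'api_key' requires ANTHROPIC_API_KEY")
        return "api_key"
    raise ValueError(f"unknown auth mode: {auth!r}")
```

Under these assumptions, `resolve_auth("auto", os.environ)` would pick the API key only when `ANTHROPIC_API_KEY` is set and non-empty.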
The subscription session is set up once with the Claude CLI, by running claude login.
OpenAI-compatible
Works with any OpenAI-compatible API endpoint, such as OpenAI, Azure OpenAI, Together, or Groq. Requires the openai extra. Example configuration:
[generation]
backend = "openai"
model = "gpt-4o-mini"
base_url = "https://api.openai.com/v1" # default; any OpenAI-compatible URL works
api_key_env = "OPENAI_API_KEY" # env var name holding the API key
Set the API key in the environment variable named by api_key_env (OPENAI_API_KEY in the example above).
The auth field is ignored for this backend.
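To illustrate how `base_url` and `api_key_env` fit together, here is a minimal sketch that turns a parsed `[generation]` table into client settings. The helper name `openai_client_kwargs` is hypothetical, and the fallback to `OPENAI_API_KEY` when `api_key_env` is omitted is an assumption rather than documented behaviour.

```python
import os
from typing import Mapping, Optional

# Default taken from the configuration example for this backend.
DEFAULT_OPENAI_BASE_URL = "https://api.openai.com/v1"


def openai_client_kwargs(
    generation: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Build keyword arguments for an OpenAI-compatible client (hypothetical helper)."""
    env = os.environ if env is None else env

    # api_key_env holds the NAME of the variable, not the key itself.
    # Assumption: fall back to OPENAI_API_KEY when the field is omitted.
    key_var = generation.get("api_key_env", "OPENAI_API_KEY")
    api_key = env.get(key_var)
    if not api_key:
        raise RuntimeError(f"environment variable {key_var} is not set")

    return {
        "base_url": generation.get("base_url", DEFAULT_OPENAI_BASE_URL),
        "api_key": api_key,
    }
```

The returned dictionary matches the `base_url` and `api_key` parameters of the `openai.OpenAI` constructor, so it could be passed as `OpenAI(**kwargs)` once the openai package is installed.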
Ollama (offline)
Runs against a local Ollama server, so no API key is required. Reuses the openai extra (the OpenAI client):
[generation]
backend = "ollama"
model = "llama3.1"
# base_url defaults to http://localhost:11434/v1 — omit unless Ollama is on a different port
The auth field is ignored for this backend.
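Because Ollama is reached through the OpenAI client, the only backend-specific piece is the endpoint. The sketch below shows one way the default could be applied; `ollama_client_kwargs` is a hypothetical helper. The placeholder API key is an assumption: the OpenAI client insists on a non-empty key, and Ollama's OpenAI-compatible endpoint does not check it.

```python
from typing import Mapping

# Default endpoint for a local Ollama server, as noted in the config comment.
OLLAMA_DEFAULT_BASE_URL = "http://localhost:11434/v1"


def ollama_client_kwargs(generation: Mapping[str, str]) -> dict:
    """Build OpenAI-client keyword arguments for a local Ollama server.

    Hypothetical helper; raggity's internals may differ.
    """
    return {
        # Only override the URL when Ollama runs on another host or port.
        "base_url": generation.get("base_url", OLLAMA_DEFAULT_BASE_URL),
        # Placeholder value: required by the client, not validated by Ollama.
        "api_key": "ollama",
    }
```

For example, a config that sets `base_url = "http://localhost:8080/v1"` would yield that URL, while an omitted `base_url` resolves to port 11434.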
Vector store backends
Controlled by index.backend in raggity.toml.
LanceDB (default)
LanceDB is the default and requires no extra install. Data is stored locally in the directory specified by index.path.
Recommended for: single-user local deployments.
Qdrant
Qdrant is recommended for large-scale or multi-user deployments. Unlike LanceDB, it requires installing an extra.
Configure in raggity.toml:
[index]
backend = "qdrant"
qdrant_location = "http://localhost:6333" # remote Qdrant server
# qdrant_location = ":memory:" # ephemeral in-process (testing)
# qdrant_location = "/path/to/local" # persistent local storage
qdrant_collection = "raggity"
# qdrant_api_key = "..." # or set QDRANT_API_KEY env var
A local Qdrant instance can be started with Docker; the configuration above expects it at http://localhost:6333.
Recommended for: large corpora, multi-user server deployments.
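The three forms of `qdrant_location` correspond to the three ways a `qdrant_client.QdrantClient` can be constructed: `location=":memory:"`, `url=...`, and `path=...`. The sketch below shows one plausible mapping. `qdrant_client_kwargs` is a hypothetical helper, and how raggity actually interprets the setting is an assumption.

```python
import os
from typing import Mapping, Optional


def qdrant_client_kwargs(
    location: str,
    api_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Map a qdrant_location string to QdrantClient keyword arguments."""
    env = os.environ if env is None else env

    if location == ":memory:":
        # Ephemeral in-process store, useful for tests.
        return {"location": ":memory:"}

    if location.startswith(("http://", "https://")):
        # Remote server; the key may come from config or QDRANT_API_KEY.
        kwargs = {"url": location}
        key = api_key or env.get("QDRANT_API_KEY")
        if key:
            kwargs["api_key"] = key
        return kwargs

    # Anything else is treated as a directory for persistent local storage.
    return {"path": location}
```

With qdrant-client installed, the result could be used as `QdrantClient(**qdrant_client_kwargs("http://localhost:6333"))`.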
Embedding models
raggity uses fastembed with ONNX Runtime — CPU by default, no GPU required.
| Model | Dims | Notes |
|---|---|---|
| BAAI/bge-small-en-v1.5 | 384 | Default; lightweight, portable |
| nomic-embed-text-v1.5-Q | 768 | Higher quality; Matryoshka scaling, 8k context |
[embedding]
model = "BAAI/bge-small-en-v1.5" # or nomic-embed-text-v1.5-Q
provider = "cpu" # cpu / cuda / directml / rocm
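The `provider` values correspond to ONNX Runtime execution providers. As a rough sketch, the mapping could look like the following. The provider strings are ONNX Runtime's own names, but `onnx_providers` and the CPU-fallback ordering are assumptions for illustration, not raggity's documented behaviour.

```python
from typing import List

# ONNX Runtime execution provider names for each config value.
ONNX_PROVIDERS = {
    "cpu": "CPUExecutionProvider",
    "cuda": "CUDAExecutionProvider",
    "directml": "DmlExecutionProvider",
    "rocm": "ROCMExecutionProvider",
}


def onnx_providers(provider: str) -> List[str]:
    """Return an ordered provider list for ONNX Runtime (hypothetical helper)."""
    try:
        name = ONNX_PROVIDERS[provider]
    except KeyError:
        raise ValueError(f"unsupported embedding provider: {provider!r}") from None

    if name == "CPUExecutionProvider":
        return [name]
    # Assumption: keep CPU as a fallback when the accelerator is unavailable.
    return [name, "CPUExecutionProvider"]
```

A list like this is the shape accepted by the `providers` argument of fastembed's `TextEmbedding`, so `onnx_providers("cuda")` would request CUDA first and CPU second.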
Warning
Changing embedding.model triggers an automatic full index rebuild via the index fingerprint.
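The rebuild is needed because vectors from different models are not comparable, and the two listed models do not even share a dimension (384 versus 768). One way such a fingerprint could work is sketched below. The fields hashed and the function names are hypothetical, since the actual fingerprint format is not described here.

```python
import hashlib
import json


def index_fingerprint(embedding_model: str, dims: int) -> str:
    """Hash the settings that determine the stored vectors (illustrative only)."""
    payload = json.dumps({"model": embedding_model, "dims": dims}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def needs_rebuild(stored_fingerprint: str, current_fingerprint: str) -> bool:
    """A mismatch means the index was built with different embedding settings."""
    return stored_fingerprint != current_fingerprint


# Switching models changes the fingerprint, which would trigger a rebuild.
old = index_fingerprint("BAAI/bge-small-en-v1.5", 384)
new = index_fingerprint("nomic-embed-text-v1.5-Q", 768)
print(needs_rebuild(old, new))
```

Re-running with an unchanged model produces the same fingerprint, so no rebuild would be triggered in that case.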
Reranking models
| Model | Size | Notes |
|---|---|---|
| Xenova/ms-marco-MiniLM-L-6-v2 | ~25 MB | Default; fast and portable |
| BAAI/bge-reranker-v2-m3 | ~1 GB | Higher quality |