
Backends

raggity supports three LLM generation backends and two vector store backends.


LLM backends

Controlled by generation.backend in raggity.toml.

Claude (default)

Uses the Claude Agent SDK and needs no extra dependencies. Requires either a Claude subscription or an Anthropic API key.

[generation]
backend = "claude"
model = "claude-opus-4-8"
auth = "auto"

Auth modes:

auth value          Behaviour
"auto" (default)    Uses ANTHROPIC_API_KEY if set; otherwise falls back to the subscription session created by claude login
"subscription"      Always uses the claude login session; ANTHROPIC_API_KEY is ignored
"api_key"           Requires ANTHROPIC_API_KEY; raises an error at startup if it is missing

To use a subscription, log in once with the Claude CLI:

claude login
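The three auth modes reduce to a small decision rule. The following sketch encodes that rule as a Python function. resolve_auth is a hypothetical name, and treating an empty ANTHROPIC_API_KEY as unset is an assumption rather than documented behaviour.

```python
import os
from typing import Mapping


def resolve_auth(mode: str, env: Mapping[str, str] = os.environ) -> str:
    """Hypothetical helper: return "api_key" or "subscription" for an auth mode."""
    # Assumption: an empty ANTHROPIC_API_KEY counts as unset.
    has_key = bool(env.get("ANTHROPIC_API_KEY"))
    if mode == "auto":
        # Prefer the API key when present, else the `claude login` session.
        return "api_key" if has_key else "subscription"
    if mode == "subscription":
        # The API key is ignored even if it is set.
        return "subscription"
    if mode == "api_key":
        if not has_key:
            raise RuntimeError("auth = 'api_key' requires ANTHROPIC_API_KEY")
        return "api_key"
    raise ValueError(f"unknown auth mode: {mode!r}")
```

Passing an explicit mapping instead of relying on os.environ keeps the rule easy to check in isolation.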

OpenAI-compatible

Works with any OpenAI-compatible API endpoint, such as OpenAI, Azure OpenAI, Together, or Groq. Requires the openai extra:

pip install "raggity[openai]"

Configure in raggity.toml:

[generation]
backend = "openai"
model = "gpt-4o-mini"
base_url = "https://api.openai.com/v1"   # default; any OpenAI-compatible URL works
api_key_env = "OPENAI_API_KEY"           # env var name holding the API key

Set the API key:

export OPENAI_API_KEY=sk-...

The auth field is ignored for this backend.

Ollama (offline)

Runs against a local Ollama server, so no API key is required. It reuses the openai extra, because raggity talks to Ollama through the OpenAI client:

pip install "raggity[openai]"
ollama pull llama3.1

Then configure raggity.toml to use the pulled model:

[generation]
backend = "ollama"
model = "llama3.1"
# base_url defaults to http://localhost:11434/v1; set it only if Ollama runs on a different host or port

The auth field is ignored for this backend.


Vector store backends

Controlled by index.backend in raggity.toml.

LanceDB (default)

LanceDB is the default and requires no extra install. Data is stored locally in the directory specified by index.path.

[index]
backend = "lancedb"
path = ".raggity/index"   # relative to cwd (default)

Recommended for: single-user local deployments.

Qdrant

Qdrant is recommended for large-scale or multi-user deployments. Install the extra:

pip install "raggity[qdrant]"

Configure in raggity.toml:

[index]
backend = "qdrant"
qdrant_location = "http://localhost:6333"   # remote Qdrant server
# qdrant_location = ":memory:"             # ephemeral in-process (testing)
# qdrant_location = "/path/to/local"       # persistent local storage
qdrant_collection = "raggity"
# qdrant_api_key = "..."                   # or set QDRANT_API_KEY env var
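Each of the three forms of qdrant_location selects a different storage mode. The following sketch classifies a location string according to the comments in this config. The helper names are hypothetical. Giving an explicit qdrant_api_key precedence over QDRANT_API_KEY is also an assumption, because this page does not specify the order.

```python
import os
from typing import Mapping, Optional


def classify_qdrant_location(location: str) -> str:
    """Hypothetical helper: name the storage mode implied by qdrant_location."""
    if location == ":memory:":
        return "in-memory"  # ephemeral, lost when the process exits
    if location.startswith(("http://", "https://")):
        return "remote"  # a running Qdrant server
    return "local"  # anything else is treated as a filesystem path


def qdrant_api_key(configured: Optional[str], env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Assumed precedence: qdrant_api_key from the config, then QDRANT_API_KEY."""
    return configured or env.get("QDRANT_API_KEY")
```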

Start a local Qdrant instance with Docker:

docker run -p 6333:6333 qdrant/qdrant

Recommended for: large corpora, multi-user server deployments.


Embedding models

raggity computes embeddings with fastembed on ONNX Runtime. It runs on the CPU by default, so no GPU is required.

Model                      Dimensions  Notes
BAAI/bge-small-en-v1.5     384         Default; lightweight and portable
nomic-embed-text-v1.5-Q    768         Higher quality; Matryoshka scaling, 8k-token context

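Matryoshka scaling means the model is trained so that a leading slice of its 768-dimensional vector is itself a usable, smaller embedding. This page does not say whether raggity truncates vectors. The following plain-Python sketch only illustrates the idea: it keeps the first dims components and re-normalises the result to unit length. It is simplified, because Nomic's own reference usage also applies a layer normalisation before truncating.

```python
import math


def truncate_matryoshka(vector: list[float], dims: int) -> list[float]:
    """Keep the leading `dims` components and re-normalise to unit length."""
    if not 0 < dims <= len(vector):
        raise ValueError("dims must be between 1 and len(vector)")
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    # Guard against an all-zero prefix, which cannot be normalised.
    return [x / norm for x in head] if norm else head
```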
[embedding]
model = "BAAI/bge-small-en-v1.5"   # or nomic-embed-text-v1.5-Q
provider = "cpu"                    # cpu / cuda / directml / rocm
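Each provider value presumably corresponds to an ONNX Runtime execution provider. The following mapping uses ONNX Runtime's real provider names. The helper itself is an illustrative assumption, and so is listing the CPU provider as a fallback after each GPU provider. GPU providers also need a matching ONNX Runtime build, for example onnxruntime-gpu for CUDA.

```python
# ONNX Runtime execution-provider names for each documented `provider` value.
# Assumption: GPU entries keep CPUExecutionProvider as a fallback.
ORT_PROVIDERS = {
    "cpu": ["CPUExecutionProvider"],
    "cuda": ["CUDAExecutionProvider", "CPUExecutionProvider"],
    "directml": ["DmlExecutionProvider", "CPUExecutionProvider"],
    "rocm": ["ROCMExecutionProvider", "CPUExecutionProvider"],
}


def onnx_providers(provider: str) -> list[str]:
    """Hypothetical helper: map embedding.provider to an ONNX Runtime provider list."""
    try:
        return ORT_PROVIDERS[provider.lower()]
    except KeyError:
        raise ValueError(f"unsupported embedding.provider: {provider!r}") from None
```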

Warning

Changing embedding.model changes the index fingerprint, which triggers an automatic full index rebuild.
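The rebuild is necessary because vectors from different models are not comparable and may not share a dimension (384 versus 768 for the two models listed). One way a fingerprint check can work is to hash the settings that determine vector contents and compare the result with the stored hash. The helper names and the exact set of hashed settings are assumptions. This page only states that changing embedding.model triggers the rebuild.

```python
import hashlib
import json
from typing import Optional


def index_fingerprint(settings: dict) -> str:
    """Hypothetical: hash the settings that determine vector contents."""
    # sort_keys makes the hash independent of dict insertion order.
    payload = json.dumps(settings, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def needs_rebuild(stored: Optional[str], settings: dict) -> bool:
    """Rebuild when no fingerprint was stored or the settings changed."""
    return stored != index_fingerprint(settings)
```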

Reranking models

Controlled by retrieval.rerank_model in raggity.toml.

Model                            Size     Notes
Xenova/ms-marco-MiniLM-L-6-v2    ~25 MB   Default; fast and portable
BAAI/bge-reranker-v2-m3          ~1 GB    Higher quality

[retrieval]
rerank_model = "Xenova/ms-marco-MiniLM-L-6-v2"
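A cross-encoder reranker reads the query and each candidate passage together and outputs a relevance score. The candidates are then reordered by that score. The following sketch shows only the reordering step. toy_score is a deliberately crude word-overlap stand-in for the actual model, and the top_k parameter is hypothetical.

```python
from typing import Callable, Sequence


def rerank(
    query: str,
    passages: Sequence[str],
    score: Callable[[str, str], float],
    top_k: int = 3,
) -> list[tuple[str, float]]:
    """Score every (query, passage) pair and keep the top_k highest-scoring ones."""
    scored = [(p, score(query, p)) for p in passages]
    # Higher scores mean more relevant; sort is stable for ties.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


def toy_score(query: str, passage: str) -> float:
    """Stand-in for a cross-encoder: fraction of query words found in the passage."""
    q = set(query.lower().split())
    return len(q & set(passage.lower().split())) / len(q) if q else 0.0
```

A real reranker would replace toy_score with model inference. The sorting and truncation logic stays the same.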