
Retrieval & Memory

AgentZero’s retrieval layer is built around two independent indexes that share a single source of truth. Each index can be enabled independently; when an index is missing or rebuilding, retrieval gracefully falls back to a brute-force scan.

| Data | Source of truth | Index | Used for |
| --- | --- | --- | --- |
| Conversation memory | SQLite (memory table), encrypted via SQLCipher | HNSW (optional) | Semantic recall in agent loop |
| RAG documents | Encrypted JSON store ({data_dir}/rag/index.jsonl) | Tantivy (when rag feature enabled) | BM25 keyword search via agentzero rag query |

The source-of-truth stores never go away. Indexes are sidecar artifacts that can be deleted and rebuilt without losing data.

Every agent interaction is appended to the memory table in the encrypted SQLite database. Rows carry the standard fields you’d expect — role, content, created_at, conversation_id, org_id, agent_id, expires_at — plus an optional embedding BLOB.

When you call MemoryStore::append_with_embedding(entry, vec), the embedding is stored alongside the row.

The default MemoryStore::semantic_recall() implementation does a full table scan: it loads every row that has an embedding, computes cosine similarity in process, sorts, and takes the top limit rows. That’s O(n), which is fine for hundreds or low thousands of rows.
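
The scan itself is easy to picture. Here is a minimal, self-contained sketch of the brute-force path; the `Row` type is a hypothetical stand-in for a memory row with an embedding, not the actual agentzero-storage code:

```rust
// Hypothetical stand-in for an embedded memory row.
struct Row {
    id: u64,
    embedding: Vec<f32>,
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// O(n) recall: score every embedded row, sort descending, keep `limit`.
fn brute_force_recall(rows: &[Row], query: &[f32], limit: usize) -> Vec<u64> {
    let mut scored: Vec<(u64, f32)> = rows
        .iter()
        .map(|r| (r.id, cosine(&r.embedding, query)))
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.into_iter().take(limit).map(|(id, _)| id).collect()
}
```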

For larger memory stores, opt into the HNSW approximate nearest neighbor index:

```rust
use agentzero_storage::SqliteMemoryStore;

// Open the encrypted store, then attach the sidecar HNSW index
// (index directory, embedding dimensionality).
let mut store = SqliteMemoryStore::open("memory.db", Some(&key))?;
store.enable_hnsw_index("/var/lib/agentzero/hnsw", 384)?;
```

After this call:

  • append_with_embedding() writes to both SQLite (durable) and HNSW (fast lookup)
  • The HNSW index checkpoints to disk every 100 inserts
  • semantic_recall(query, k) queries HNSW for (k * 3) candidate IDs, then resolves the full MemoryEntry rows from SQLite preserving HNSW ranking
  • Expired rows (expires_at < now) are filtered after the HNSW lookup

If the HNSW directory is missing on startup (cold start, fresh deployment, accidental delete), enable_hnsw_index() rebuilds it by scanning every embedded row from SQLite. The index is never authoritative — it’s always derivable from the source of truth.
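
The over-fetch-then-filter behavior described above can be sketched as follows (hypothetical types, not the actual implementation): HNSW returns k * 3 candidate IDs best-first, rows are resolved in that order, expired rows are dropped, and the result is truncated to k.

```rust
use std::collections::HashMap;

struct MemoryEntry {
    id: u64,
    expires_at: Option<u64>, // unix seconds; None means the row never expires
}

/// Resolve HNSW candidate IDs back to rows, preserving HNSW ranking,
/// dropping expired rows, and truncating to `k`.
fn resolve_candidates(
    candidate_ids: &[u64], // k * 3 IDs, best first
    rows: &HashMap<u64, MemoryEntry>,
    now: u64,
    k: usize,
) -> Vec<u64> {
    candidate_ids
        .iter()
        .filter_map(|id| rows.get(id))
        .filter(|e| e.expires_at.map_or(true, |t| t >= now))
        .map(|e| e.id)
        .take(k)
        .collect()
}
```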

The hnsw_rs dependency is always linked into agentzero-storage regardless of feature flags, so the API surface is uniform across all builds.

For combined keyword + semantic queries, use MemoryStore::hybrid_recall(query_text, query_embedding, limit). The default implementation:

  1. Runs semantic_recall() over (limit * 4) candidates to get the semantic ranking
  2. Runs a substring match over the same recent() window to get a keyword ranking
  3. Fuses both rankings via reciprocal rank fusion with k = 60
  4. Deduplicates on a content fingerprint and returns the top limit
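
Step 3, reciprocal rank fusion, scores each item by summing 1 / (k + rank) across every ranking it appears in. A minimal sketch with k = 60 (a hypothetical helper, not the actual hybrid_recall code):

```rust
use std::collections::HashMap;

/// Fuse two best-first rankings via reciprocal rank fusion.
/// Each list contributes 1 / (k + rank) to an item's score, with rank
/// starting at 1; ties break on the smaller ID for determinism.
fn rrf(semantic: &[u64], keyword: &[u64], k: f64, limit: usize) -> Vec<u64> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for ranking in [semantic, keyword] {
        for (i, id) in ranking.iter().enumerate() {
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then(a.0.cmp(&b.0)));
    fused.into_iter().take(limit).map(|(id, _)| id).collect()
}
```

Items that appear in both rankings accumulate two contributions, which is why a document ranked second in each list can outscore one ranked first in only one of them.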

The SemanticRecallTool supports this directly:

```json
{ "query": "what did we decide about the database?", "limit": 5, "mode": "hybrid" }
```

mode defaults to "semantic" for backward compatibility; pass "hybrid" explicitly to opt into the fused ranking.

The RAG index is a separate store for explicitly ingested documents (notes, knowledge base articles, code snippets) that you want the agent to retrieve at query time. It is independent from conversation memory.

Documents are persisted twice:

  1. Encrypted JSON store at {data_dir}/rag/index.jsonl — durable source of truth, AES-256-GCM encrypted
  2. Tantivy inverted index in a sibling {data_dir}/rag/index.jsonl.tantivy/ directory — fast BM25 query path

When you call agentzero rag ingest --id <id> --text "...", the document is written to both. When you call agentzero rag query "search terms", the Tantivy index is consulted; results are returned with a score: f32 BM25 relevance value.

If the Tantivy directory is missing or corrupt (fresh install, accidental delete, schema drift across versions), the next query rebuilds it from the encrypted store. Migration from the legacy plaintext JSONL format also happens transparently.

The Tantivy dependency is gated behind the rag feature flag; builds without rag skip both Tantivy and the multimodal/document chunking surface.

Both semantic recall and hybrid retrieval need an embedding model. AgentZero ships two implementations:

| Provider | When to use |
| --- | --- |
| CandleEmbeddingProvider | Local, in-process. Uses sentence-transformers/all-MiniLM-L6-v2 (384 dims, ~23 MB). Downloads from HuggingFace on first use, then runs entirely offline. |
| ApiEmbeddingProvider | Calls an OpenAI-compatible /v1/embeddings endpoint. Use when you want a hosted model or already have credentials configured. |

Both implement the same EmbeddingProvider trait. The SemanticRecallTool accepts whichever you wire it with.
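
As a rough illustration of the trait's shape, here is a simplified stand-in (the real EmbeddingProvider is presumably async and fallible; the toy provider below exists only to exercise the trait and is not a real embedding model):

```rust
/// Simplified stand-in for the EmbeddingProvider trait: a provider maps
/// text to a fixed-dimension vector.
trait EmbeddingProvider {
    fn dimensions(&self) -> usize;
    fn embed(&self, text: &str) -> Vec<f32>;
}

/// Toy provider for tests: hashes bytes into a fixed-size vector.
struct HashEmbedder {
    dims: usize,
}

impl EmbeddingProvider for HashEmbedder {
    fn dimensions(&self) -> usize {
        self.dims
    }

    fn embed(&self, text: &str) -> Vec<f32> {
        let mut v = vec![0.0f32; self.dims];
        for (i, b) in text.bytes().enumerate() {
            v[(i + b as usize) % self.dims] += 1.0;
        }
        v
    }
}
```

Anything that consumes the trait, such as SemanticRecallTool, only depends on this shape, which is why the local Candle model and the hosted API provider are interchangeable.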

We deliberately keep the source-of-truth stores separate from the indexes, even though it means writing the same data twice on ingest:

  • Recovery is simple. If an index is corrupt, delete the directory and the next call rebuilds it from SQLite or the encrypted JSON store. There is no “split brain” recovery path.
  • Indexes are optional. A small deployment can ignore HNSW and Tantivy entirely; the brute-force fallbacks still work and the API surface is identical.
  • Encryption stays at the storage layer. The Tantivy index lives on disk in plaintext (Tantivy doesn’t encrypt natively), but the content it indexes is always reachable from the encrypted store. Compromising the Tantivy directory leaks the keyword index, not the conversation history.