
Document Querying

AgentZero can index a directory of text files — code, markdown, config, prose — and answer questions about them using semantic search. During chat, the LLM can call a query tool to retrieve the most relevant chunks from the index.

This works by:

  1. Chunking files into overlapping pieces using sentence/paragraph boundaries
  2. Embedding each chunk via Ollama’s /api/embed endpoint
  3. Storing the vectors to disk (.agentzero/index/)
  4. Querying with cosine similarity at chat time
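
The chunking step can be illustrated with a minimal sketch. This is not AgentZero's implementation: the function names `split_sentences` and `chunk_text` are hypothetical, the sentence splitter is a naive regular expression, and the one-sentence overlap is an assumption chosen for illustration.

```python
import re

def split_sentences(text):
    """Split on blank lines (paragraphs), then on sentence-ending punctuation."""
    sentences = []
    for paragraph in re.split(r"\n\s*\n", text):
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph.strip()):
            if sentence:
                sentences.append(sentence)
    return sentences

def chunk_text(text, max_chars=1000, overlap=1):
    """Greedily pack sentences into chunks of at most max_chars characters.

    Each new chunk starts with the last `overlap` sentences of the previous
    chunk when they fit, so neighbouring chunks share some context. A single
    sentence longer than max_chars is emitted as its own oversized chunk.
    """
    chunks, current = [], []
    for sentence in split_sentences(text):
        if current and len(" ".join(current + [sentence])) > max_chars:
            chunks.append(" ".join(current))
            current = current[-overlap:] if overlap > 0 else []
            if len(" ".join(current + [sentence])) > max_chars:
                current = []  # carried overlap would not fit; start fresh
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

With a small limit the overlap is easy to see: `chunk_text("One. Two. Three. Four.", max_chars=12)` yields three chunks, each sharing one sentence with its neighbour.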

Pull an embedding model in Ollama:

ollama pull nomic-embed-text

Other supported models: mxbai-embed-large, snowflake-arctic-embed, all-minilm.
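
To sanity-check that a pulled model responds, you can call Ollama's /api/embed endpoint directly. The sketch below uses only the Python standard library; the helper names are hypothetical, and the base URL assumes Ollama is listening on its default local port (adjust it if you use a custom server).

```python
import json
import urllib.request

def build_embed_request(texts, model="nomic-embed-text",
                        base_url="http://localhost:11434"):
    """Build (but do not send) a POST request for Ollama's /api/embed."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def embed(texts, **kwargs):
    """Send the request and return one embedding vector per input text."""
    request = build_embed_request(texts, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["embeddings"]
```

Calling `embed(["hello world"])` against a running Ollama instance should return a list containing a single vector of floats; its length depends on the model.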

az index build

This walks the current directory, skipping .agentzero/, .git/, target/, node_modules/, and other build artifacts. It indexes all text-based files: source code, markdown, TOML, YAML, JSON, and more.
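
As a rough illustration of this kind of directory walk (not AgentZero's actual code), the sketch below prunes skipped directories and keeps files by extension. The names `SKIP_DIRS`, `TEXT_EXTENSIONS`, and `iter_indexable_files` are hypothetical, and both sets are deliberately small subsets.

```python
import os

# Illustrative subsets only; the real skip list and extension list are longer.
SKIP_DIRS = {".agentzero", ".git", "target", "node_modules"}
TEXT_EXTENSIONS = {".rs", ".py", ".md", ".toml", ".yaml", ".yml", ".json"}

def iter_indexable_files(root, skip_dirs=SKIP_DIRS, extensions=TEXT_EXTENSIONS):
    """Yield paths under root whose extension is in `extensions`."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = sorted(d for d in dirnames if d not in skip_dirs)
        for name in sorted(filenames):
            if os.path.splitext(name)[1].lower() in extensions:
                yield os.path.join(dirpath, name)
```

Pruning `dirnames` in place matters for large trees: a directory such as node_modules/ is never entered, rather than being walked and filtered afterwards.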

# Index a specific directory
az index build --path /path/to/documents
# Use a different embedding model
az index build --model mxbai-embed-large
# Custom Ollama server
az index build --url http://gpu-box:11434
# Adjust chunk size (default: 1000 characters)
az index build --chunk-size 500

Check the index status:

az index status
Index status:
Model: nomic-embed-text
Files: 47
Chunks: 312
Created at: 1715184000
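
The Created at value in the sample output looks like a Unix timestamp in seconds. Assuming that is the case (the source does not state the unit), it can be converted to a readable UTC date with a small helper; the function name is hypothetical.

```python
from datetime import datetime, timezone

def format_created_at(epoch_seconds):
    """Render a Unix timestamp (seconds, assumed) as an ISO 8601 UTC string."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()

print(format_created_at(1715184000))  # 2024-05-08T16:00:00+00:00
```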

Once the index is built, the query tool is automatically available during chat:

az chat
you> what error handling patterns does this project use?
[tool: query] ok (2048 bytes)
agentzero> Based on the indexed documents, the project uses thiserror
for error types with a consistent pattern of...

The LLM decides whether to call query, read, or search based on the question. query is best for semantic or conceptual questions, while search is better for exact string matches.

Clear the index:

az index clear

| File Type | Extensions |
|---|---|
| Prose | .txt, .md, .rst, .org, .adoc |
| Code | .rs, .py, .js, .ts, .go, .java, .c, .cpp, .rb, .php, .swift, .kt, .scala, .lua, .sh, .zig, .hs, .ex, .clj, .nim, .v |
| Config | .toml, .yaml, .yml, .json, .xml, .csv, .ini, .cfg |
| Data/Schema | .sql, .graphql, .proto |
| Build | Dockerfile, Makefile, Justfile |
| Web | .html, .css, .scss, .less |

Binary files, images, and PDFs are skipped in Phase 1. PDF/HTML/DOCX parsing is planned for a future release.

The index lives at .agentzero/index/:

| File | Format | Purpose |
|---|---|---|
| default.idx | bincode | Serialized chunk embeddings |
| metadata.json | JSON | Human-readable stats |

The index is local to your project and should not be committed to git; add .agentzero/ to your .gitignore.

In more detail:

  1. Chunking: text-splitter splits files at semantic boundaries (sentences, paragraphs) with a configurable maximum size
  2. Embedding: chunks are sent in batches of 32 to Ollama’s /api/embed endpoint
  3. Storage: embedded chunks are serialized to disk with bincode
  4. Query: the question is embedded with the same model, then all chunks are ranked against it by cosine similarity. The top 5 results are returned.
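
The batching and ranking steps can be sketched in a few lines. This is an illustrative stand-in written in plain Python, not the project's code; the names `batches`, `cosine_similarity`, and `top_k` are hypothetical, and the index is modelled as a simple list of (text, vector) pairs.

```python
import math

def batches(items, size=32):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vector, indexed_chunks, k=5):
    """Return the k (score, text) pairs most similar to the query vector."""
    scored = [(cosine_similarity(query_vector, vector), text)
              for text, vector in indexed_chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Scoring every chunk is a linear scan, which is usually fine for a few hundred or a few thousand chunks. Cosine similarity ignores vector length, so only the direction of each embedding affects the ranking.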

All of this runs locally through your existing Ollama instance — no external API calls, no data leaves your machine.