
Chat with Local Models

Start an interactive chat session:
az chat
Choose a provider and model with --provider and --model:
# Ollama (default)
az chat --model llama3.2
# llama.cpp
az chat --provider llama-cpp --model codellama
# vLLM
az chat --provider vllm --model meta-llama/Llama-3.2-3B
# LM Studio
az chat --provider lm-studio
Add --stream to print the response incrementally:
az chat --stream

Tokens appear as they’re generated instead of waiting for the full response.

The model can request tools during conversation. Available tools:

Tool           Description                              Policy
read           Read file contents                       Allowed for private data
list           List directory contents                  Allowed
search         Search file contents                     Allowed
write          Write to a file                          Requires approval
edit           Edit file contents (search-and-replace)  Requires approval
shell          Execute shell command                    Requires approval
generate_tool  Generate a new WASM tool                 Requires approval

Tools marked "Requires approval" ask for confirmation before they run. Respond with y to approve once, yes-all (or a) to approve for the rest of this session, or n to deny:

you> compile this project
[APPROVE shell: `cargo build`?] (y/yes-all/n) y
[tool: shell] ok (156 bytes)
agentzero> Build completed successfully.
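The approval flow can be modeled as a small policy check that runs before each tool call. The sketch below is a hypothetical illustration, not AgentZero's implementation: the `ApprovalGate` class, the `ask` callback, treating any reply other than y / yes-all / a as a denial, and scoping yes-all to the same tool (rather than to every tool) are all assumptions. Only the tool names, their policies, and the reply options come from the table and prompt above.

```python
# Hypothetical sketch of the approval policy; not AgentZero's actual code.
# Assumption: "yes-all" / "a" approves the *same tool* for the rest of the session.

REQUIRES_APPROVAL = {"write", "edit", "shell", "generate_tool"}


class ApprovalGate:
    def __init__(self, ask):
        # `ask` shows a prompt to the user and returns their reply as a string.
        self.ask = ask
        self.session_approved = set()  # tools approved with yes-all / a

    def allow(self, tool, detail):
        """Return True if this tool call may run."""
        if tool not in REQUIRES_APPROVAL:
            return True  # read, list, search run without prompting
        if tool in self.session_approved:
            return True  # already approved for this session
        reply = self.ask(f"[APPROVE {tool}: `{detail}`?] (y/yes-all/n) ").strip().lower()
        if reply in ("yes-all", "a"):
            self.session_approved.add(tool)
            return True
        # "y" approves this call only; "n" (or anything else) denies it.
        return reply == "y"


# Usage: the first shell call prompts; after "a", later shell calls do not.
replies = iter(["a"])
gate = ApprovalGate(lambda prompt: next(replies))
gate.allow("shell", "cargo build")
gate.allow("shell", "cargo test")
```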

Slash commands available during a chat session:

Command         Description
/help           List all available commands
/quit           Exit the session
/tools          List available tools
/session        Show session info
/tree           Display conversation tree
/branch <id>    Branch from a prior tree node
/label <text>   Label the current tree node
/reload         Reload dynamic tools from registry
/model <name>   Switch to a different model mid-session
/models         Show current model info
/skills         List available skills with trigger keywords
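A slash command is a line starting with /, followed by a command name and, for some commands, an argument such as /branch <id> or /model <name>. The sketch below shows one way such input could be split before dispatch. The `parse_command` helper, the error messages, and the rule that /branch, /label, and /model require an argument are illustrative assumptions, not AgentZero's parser.

```python
# Illustrative slash-command parser; not AgentZero's implementation.
COMMANDS_WITH_ARG = {"branch", "label", "model"}
COMMANDS_NO_ARG = {"help", "quit", "tools", "session", "tree", "reload", "models", "skills"}


def parse_command(line):
    """Return (name, arg) for a slash command, or None for ordinary chat input."""
    if not line.startswith("/"):
        return None  # regular message for the model
    name, _, arg = line[1:].partition(" ")
    arg = arg.strip()
    if name in COMMANDS_WITH_ARG:
        if not arg:
            raise ValueError(f"/{name} requires an argument")
        return name, arg  # e.g. ("model", "codellama")
    if name in COMMANDS_NO_ARG:
        return name, None
    raise ValueError(f"unknown command: /{name}")
```

Splitting on the first space keeps multi-word arguments intact, so /label before refactor yields the label "before refactor".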

You can send input while the agent is executing tools:

  • Steering: prefix a message with ! to interrupt the agent and redirect it between tool rounds:

    you> analyze all the test files
    [tool: search] ok (1200 bytes)
    !stop, just look at the auth tests
    [tool: search] ok (340 bytes)
    agentzero> Here are the auth-related tests...
  • Follow-up: type normally while the agent is working to queue a message that is handled after the agent finishes:

    you> refactor the database module
    also check if there are any unused imports

Steering messages are delivered between tool rounds. Follow-up messages are processed sequentially after the current response completes.
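The two kinds of mid-execution input differ only in when they are consumed. The sketch below models that scheduling with two queues. It is a hypothetical illustration: the `InputRouter` class, its method names, and stripping the ! prefix before delivery are assumptions; only the ! marker and the two delivery points (between tool rounds, after the current response) come from the description above.

```python
from collections import deque


# Hypothetical model of mid-execution input routing; not AgentZero's code.
class InputRouter:
    def __init__(self):
        self.steering = deque()    # delivered at the next gap between tool rounds
        self.follow_ups = deque()  # processed one by one after the response completes

    def submit(self, text):
        # Assumption: a leading "!" marks steering and is stripped before delivery.
        if text.startswith("!"):
            self.steering.append(text[1:].strip())
        else:
            self.follow_ups.append(text)

    def between_tool_rounds(self):
        """Drain all pending steering messages so the next round can see them."""
        drained = list(self.steering)
        self.steering.clear()
        return drained

    def next_follow_up(self):
        """Return the next queued follow-up, or None when the queue is empty."""
        return self.follow_ups.popleft() if self.follow_ups else None
```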

Create .agentzero/agents.md to add project-specific instructions that are appended to the system prompt:

# Project Instructions
- This is a Rust workspace with 14 crates.
- Always run clippy before suggesting code is done.
- Prefer returning Result over panicking.

Instructions are loaded from the directory hierarchy: global instructions from ~/.config/agentzero/agents.md first, then project-local instructions. They are added alongside a custom system prompt in .agentzero/prompts/system.md rather than replacing it.

az init generates a template agents.md automatically.

List past sessions and resume one by its ID:
# List past sessions
az history
# Resume a session
az chat --resume <session-id>

Create .agentzero/prompts/system.md:

You are a Rust expert focused on safety and performance.
Always suggest using `expect()` with descriptive messages instead of `unwrap()`.
Prefer zero-copy operations where possible.

The chat will use this instead of the default system prompt.
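Taken together, the prompt files combine in a fixed order: a custom .agentzero/prompts/system.md replaces the default prompt, and agents.md instructions are appended, global first and then project-local. The function below sketches that order under stated assumptions: `build_system_prompt`, the placeholder default prompt, and the blank-line separator are hypothetical, and it reads only the two agents.md locations named in this page instead of walking every parent directory.

```python
from pathlib import Path

# Placeholder only; AgentZero's real default prompt is not shown in this doc.
DEFAULT_PROMPT = "You are a helpful assistant."


def read_if_exists(path):
    """Return the file's stripped text, or None if the file is missing."""
    return path.read_text(encoding="utf-8").strip() if path.is_file() else None


def build_system_prompt(project_dir, home_dir):
    project_dir, home_dir = Path(project_dir), Path(home_dir)
    # 1. A custom system.md replaces the default prompt (an empty file falls back).
    custom = read_if_exists(project_dir / ".agentzero" / "prompts" / "system.md")
    parts = [custom or DEFAULT_PROMPT]
    # 2. agents.md instructions are appended: global first, then project-local.
    for path in (home_dir / ".config" / "agentzero" / "agents.md",
                 project_dir / ".agentzero" / "agents.md"):
        text = read_if_exists(path)
        if text:
            parts.append(text)
    return "\n\n".join(parts)
```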

Use -P (or --print) to send a single message and exit after the response. Useful for scripting and automation.

# Plain text output (default)
az chat -P "what language is this project written in?"
# Pretty JSON output
az chat -P "list all public functions in src/lib.rs" --mode json
# Compact JSONL for piping
az chat -P "summarize this crate" --mode jsonl | jq .content

Available modes:

Mode   Description
text   Plain text (default)
json   Pretty-printed JSON
jsonl  Compact single-line JSON
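For scripting outside the shell, the jsonl mode can be consumed the same way the jq example does. The sketch below assumes only what that example shows: each output line is a JSON object with a `content` field. Whether a run emits one line or several, and which other fields exist, isn't documented here, so the parser accepts any number of lines and ignores everything except `content`. The helper names `contents_from_jsonl` and `ask` are hypothetical.

```python
import json
import subprocess


def contents_from_jsonl(jsonl_text):
    """Extract `content` from each non-empty JSON line, like `jq .content`."""
    return [json.loads(line).get("content")
            for line in jsonl_text.splitlines() if line.strip()]


def ask(prompt):
    # Runs a single-shot query; requires the `az` CLI on PATH.
    result = subprocess.run(
        ["az", "chat", "-P", prompt, "--mode", "jsonl"],
        capture_output=True, text=True, check=True,
    )
    return contents_from_jsonl(result.stdout)
```

A missing `content` key yields None, matching jq's null for an absent field.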

Long conversations are automatically compacted when they exceed model context limits. Choose a compaction strategy with --compaction:

# Default — fixed-size previews of each role
az chat --compaction simple
# Preserve code blocks verbatim, summarize prose
az chat --compaction code-aware
# Per-role character budgets (tool output smallest, assistant largest)
az chat --compaction role-budget

See Session History & Resume for details on each strategy.
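To make the three strategies concrete, here is a toy version of each, operating on (role, text) pairs. It is not AgentZero's algorithm: the preview length, the per-role budget numbers, the " [...]" marker, and the regex for fenced code are hypothetical. The source only fixes the ordering that tool output gets the smallest budget and assistant messages the largest. Where code-aware compaction would summarize prose, this sketch simply truncates it.

```python
import re

# A fenced code block: three backticks, any content, three backticks.
FENCE = re.compile(r"`{3}.*?`{3}", re.DOTALL)
# Hypothetical budgets following the documented order: tool < user < assistant.
ROLE_BUDGETS = {"tool": 200, "user": 500, "assistant": 1000}


def truncate(text, limit):
    return text if len(text) <= limit else text[:limit] + " [...]"


def simple(messages, preview=300):
    """Fixed-size preview of every message, regardless of role."""
    return [(role, truncate(text, preview)) for role, text in messages]


def role_budget(messages):
    """Truncate each message to its role's character budget."""
    return [(role, truncate(text, ROLE_BUDGETS.get(role, 500)))
            for role, text in messages]


def code_aware(messages, prose_limit=200):
    """Keep fenced code verbatim; shorten (here: truncate) the prose around it."""
    out = []
    for role, text in messages:
        pieces, last = [], 0
        for m in FENCE.finditer(text):
            pieces.append(truncate(text[last:m.start()], prose_limit))
            pieces.append(m.group(0))  # code block kept exactly as written
            last = m.end()
        pieces.append(truncate(text[last:], prose_limit))
        out.append((role, "".join(pieces)))
    return out
```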

Switch models without restarting the session:

you> /model codellama
Switched to model: codellama (provider: ollama)
you> now optimize this function for performance

A health check runs before switching. If the provider is unreachable, you’ll see a warning but the switch proceeds.
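The switch behavior (check first, warn on failure, switch anyway) can be sketched as follows. The `ChatSession` class and the injected `health_check` callable are hypothetical; this page doesn't say how AgentZero probes a provider, so the check is left abstract. The confirmation string mirrors the transcript above.

```python
import warnings


# Hypothetical sketch of /model switching; not AgentZero's code.
class ChatSession:
    def __init__(self, model, provider, health_check):
        self.model = model
        self.provider = provider
        # health_check(provider) -> bool; e.g. a ping to the local server.
        self.health_check = health_check

    def switch_model(self, name):
        """Run a health check, warn if it fails, and switch regardless."""
        try:
            healthy = self.health_check(self.provider)
        except Exception:  # an unreachable provider may raise instead of returning False
            healthy = False
        if not healthy:
            warnings.warn(f"provider {self.provider} is unreachable; switching anyway")
        self.model = name  # the switch always proceeds
        return f"Switched to model: {name} (provider: {self.provider})"
```

Warning instead of refusing keeps the session usable when the provider is only briefly down, at the cost of a possibly failing next request.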