Chat with Local Models

Basic Usage

az chat

Model Selection

# Ollama (default)
az chat --model llama3.2

# llama.cpp
az chat --provider llama-cpp --model codellama

# vLLM
az chat --provider vllm --model meta-llama/Llama-3.2-3B

# LM Studio
az chat --provider lm-studio

Streaming

az chat --stream

Tokens are printed as they’re generated, so you don’t have to wait for the full response.

Tool Calling

The model can request tools during conversation. Available tools:

Tool	Description	Policy
`read`	Read file contents	Allowed for private data
`list`	List directory contents	Allowed
`search`	Search file contents	Allowed
`write`	Write to a file	Requires approval
`edit`	Edit file contents (search-and-replace)	Requires approval
`shell`	Execute shell command	Requires approval
`generate_tool`	Generate a new WASM tool	Requires approval

Tools marked "Requires approval" prompt before they run. Respond with y (approve once), yes-all or a (approve for the rest of this session), or n (deny):

you> compile this project
  [APPROVE shell: `cargo build`?] (y/yes-all/n) y
  [tool: shell] ok (156 bytes)

agentzero> Build completed successfully.
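
The approval flow above can be modeled as a small decision function. This is a minimal sketch, not agentzero's actual implementation: `REQUIRES_APPROVAL`, `ApprovalState`, and `should_run` are hypothetical names, and only the tool names and replies come from the table and prompt above. The text does not say whether yes-all covers every tool or only the one being prompted; the sketch assumes every tool.

```python
from dataclasses import dataclass

# Tools listed as "Requires approval" in the table above.
REQUIRES_APPROVAL = {"write", "edit", "shell", "generate_tool"}


@dataclass
class ApprovalState:
    """Hypothetical per-session flag set by a yes-all / a reply."""
    approve_all: bool = False


def should_run(tool: str, reply: str, state: ApprovalState) -> bool:
    """Decide whether a tool call may run, given the user's reply to the prompt."""
    if tool not in REQUIRES_APPROVAL or state.approve_all:
        return True  # no prompt is needed, so the reply is ignored
    answer = reply.strip().lower()
    if answer in ("yes-all", "a"):
        state.approve_all = True  # assumption: applies to all tools this session
        return True
    return answer == "y"  # approve once; any other reply counts as a denial
```

In this model, a `y` reply approves a single call and leaves `state` unchanged, so the next `shell` call prompts again.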

Slash Commands

During chat:

Command	Description
`/help`	List all available commands
`/quit`	Exit the session
`/tools`	List available tools
`/session`	Show session info
`/tree`	Display conversation tree
`/branch <id>`	Branch from a prior tree node
`/label <text>`	Label the current tree node
`/reload`	Reload dynamic tools from registry
`/model <name>`	Switch to a different model mid-session
`/models`	Show current model info
`/skills`	List available skills with trigger keywords

Steering & Follow-Up

You can send input while the agent is executing tools:

Steering — prefix with ! to interrupt and redirect between tool rounds:

you> analyze all the test files
  [tool: search] ok (1200 bytes)
!stop, just look at the auth tests
  [tool: search] ok (340 bytes)

agentzero> Here are the auth-related tests...

Follow-up — type normally during execution to queue a message for after the agent finishes:
```
you> refactor the database module
also check if there are any unused imports
```

Steering messages are delivered between tool rounds. Follow-up messages are processed sequentially after the current response completes.
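
One way to picture the two delivery paths is a pair of queues. The sketch below is illustrative only: `InputRouter` and its method names are hypothetical, and it assumes that every steering message typed during a tool round is delivered at the next gap between rounds.

```python
from collections import deque


class InputRouter:
    """Hypothetical router for lines typed while the agent is busy."""

    def __init__(self):
        self.steering = deque()    # delivered between tool rounds
        self.follow_ups = deque()  # processed after the current response

    def submit(self, line: str) -> str:
        """Queue a line and report which path it took."""
        text = line.strip()
        if text.startswith("!"):
            self.steering.append(text[1:].strip())  # drop the ! prefix
            return "steering"
        self.follow_ups.append(text)
        return "follow-up"

    def between_tool_rounds(self) -> list:
        """Drain all pending steering messages (called after each tool round)."""
        drained = list(self.steering)
        self.steering.clear()
        return drained

    def after_response(self):
        """Return the next queued follow-up, or None if the queue is empty."""
        return self.follow_ups.popleft() if self.follow_ups else None
```

Follow-ups leave the queue one at a time, which matches the sequential processing described above.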

Project Instructions

Create .agentzero/agents.md to add project-specific instructions that are appended to the system prompt:

# Project Instructions

- This is a Rust workspace with 14 crates.
- Always run clippy before suggesting code is done.
- Prefer returning Result over panicking.

Instructions are loaded from the directory hierarchy in order: global instructions from ~/.config/agentzero/agents.md first, then project-local instructions. They coexist with a custom system prompt in .agentzero/prompts/system.md rather than replacing it.

az init generates a template agents.md automatically.
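
The loading order can be sketched as follows. This is an assumption-laden illustration, not the tool's code: `build_system_prompt` is a hypothetical name, the blank-line separator and the silent skipping of missing files are guesses, and only the two locations named above are covered (any intermediate directories in the hierarchy are ignored).

```python
from pathlib import Path


def build_system_prompt(base_prompt: str, home: Path, project: Path) -> str:
    """Append global, then project-local, agents.md content to the base prompt."""
    candidates = [
        home / ".config" / "agentzero" / "agents.md",  # global, loaded first
        project / ".agentzero" / "agents.md",          # project-local, loaded second
    ]
    parts = [base_prompt]
    for path in candidates:
        if path.is_file():  # assumption: a missing file is skipped without error
            parts.append(path.read_text(encoding="utf-8").strip())
    return "\n\n".join(parts)
```

Because project-local instructions come last, they are the final text appended to the system prompt.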

Resume Sessions

# List past sessions
az history

# Resume a session
az chat --resume <session-id>

Custom System Prompt

Create .agentzero/prompts/system.md:

You are a Rust expert focused on safety and performance.
Always suggest using `expect()` with descriptive messages instead of `unwrap()`.
Prefer zero-copy operations where possible.

When this file exists, az chat uses it in place of the default system prompt.

Single-Shot Mode

Use -P (or --print) to send a single message and exit after the response. This is useful for scripting and automation.

# Plain text output (default)
az chat -P "what language is this project written in?"

# Pretty JSON output
az chat -P "list all public functions in src/lib.rs" --mode json

# Compact JSONL for piping
az chat -P "summarize this crate" --mode jsonl | jq .content

Available modes:

Mode	Description
`text`	Plain text (default)
`json`	Pretty-printed JSON
`jsonl`	Compact single-line JSON
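
For automation beyond a shell pipeline, the jsonl mode can be consumed from a script. The sketch below uses only the flags shown above; `parse_jsonl` and `ask` are hypothetical helper names. It assumes az is on the PATH, that it exits with a non-zero status on failure, and that each output line is a JSON object with a `content` field (as the jq example suggests).

```python
import json
import subprocess


def parse_jsonl(output: str) -> list:
    """Parse JSONL text: one JSON value per non-empty line."""
    return [json.loads(line) for line in output.splitlines() if line.strip()]


def ask(prompt: str) -> list:
    """Run `az chat -P <prompt> --mode jsonl` and return the parsed records."""
    result = subprocess.run(
        ["az", "chat", "-P", prompt, "--mode", "jsonl"],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError on a non-zero exit status
    )
    return parse_jsonl(result.stdout)
```

A caller could then collect the text with `[r.get("content") for r in ask("summarize this crate")]`.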

Context Management

Long conversations are automatically compacted when they exceed the model's context limit. Choose a compaction strategy with --compaction:

# Default — fixed-size previews of each role
az chat --compaction simple

# Preserve code blocks verbatim, summarize prose
az chat --compaction code-aware

# Per-role character budgets (tool output smallest, assistant largest)
az chat --compaction role-budget

See Session History & Resume for details on each strategy.
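
To make the role-budget idea concrete, here is a rough sketch of per-role truncation. It is not the actual strategy: the budget values, the per-message (rather than per-role total) accounting, the `...` marker, and the name `compact_role_budget` are all assumptions. Only the ordering (tool output smallest, assistant largest) comes from the description above.

```python
# Hypothetical character budgets; the real values are not documented here.
ROLE_BUDGETS = {"tool": 200, "user": 500, "assistant": 1000}


def compact_role_budget(messages, budgets=ROLE_BUDGETS):
    """Return copies of the messages, each cut to its role's character budget."""
    compacted = []
    for msg in messages:
        # Unknown roles fall back to the smallest budget (an assumption).
        limit = budgets.get(msg["role"], min(budgets.values()))
        content = msg["content"]
        if len(content) > limit:
            content = content[:limit] + "..."  # mark the cut
        compacted.append({"role": msg["role"], "content": content})
    return compacted
```

With these numbers, a 300-character tool result is cut to 200 characters plus the marker, while a 300-character assistant reply passes through unchanged.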

Mid-Session Model Switching

Switch models without restarting the session:

you> /model codellama
Switched to model: codellama (provider: ollama)

you> now optimize this function for performance

A health check runs before switching. If the provider is unreachable, you’ll see a warning, but the switch still proceeds.