Chat with Local Models
Basic Usage
    az chat

Model Selection

    # Ollama (default)
    az chat --model llama3.2

    # llama.cpp
    az chat --provider llama-cpp --model codellama

    # vLLM
    az chat --provider vllm --model meta-llama/Llama-3.2-3B

    # LM Studio
    az chat --provider lm-studio

Streaming

    az chat --stream

Tokens appear as they’re generated instead of waiting for the full response.
Tool Calling
The model can request tools during a conversation. Available tools:
| Tool | Description | Policy |
|---|---|---|
| `read` | Read file contents | Allowed for private data |
| `list` | List directory contents | Allowed |
| `search` | Search file contents | Allowed |
| `write` | Write to a file | Requires approval |
| `edit` | Edit file contents (search-and-replace) | Requires approval |
| `shell` | Execute shell command | Requires approval |
| `generate_tool` | Generate a new WASM tool | Requires approval |
Dangerous tools (those marked "Requires approval") prompt before they run. Respond with `y` (approve once), `yes-all` or `a` (approve for this session), or `n` (deny):

    you> compile this project
    [APPROVE shell: `cargo build`?] (y/yes-all/n) y
    [tool: shell] ok (156 bytes)
    agentzero> Build completed successfully.

Slash Commands

During chat:
| Command | Description |
|---|---|
| `/help` | List all available commands |
| `/quit` | Exit the session |
| `/tools` | List available tools |
| `/session` | Show session info |
| `/tree` | Display conversation tree |
| `/branch <id>` | Branch from a prior tree node |
| `/label <text>` | Label the current tree node |
| `/reload` | Reload dynamic tools from registry |
| `/model <name>` | Switch to a different model mid-session |
| `/models` | Show current model info |
| `/skills` | List available skills with trigger keywords |
Steering & Follow-Up
You can send input while the agent is executing tools:

- Steering: prefix with `!` to interrupt and redirect between tool rounds:

      you> analyze all the test files
      [tool: search] ok (1200 bytes)
      !stop, just look at the auth tests
      [tool: search] ok (340 bytes)
      agentzero> Here are the auth-related tests...

- Follow-up: type normally during execution to queue a message for after the agent finishes:

      you> refactor the database module
      also check if there are any unused imports

Steering messages are delivered between tool rounds. Follow-up messages are processed sequentially after the current response completes.
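
The routing rule above can be sketched as a small shell function. This is an illustration only, not AgentZero's implementation: `classify_input` and its `steer:` / `queue:` labels are hypothetical names chosen for the example.

```shell
# Hypothetical helper: classify a line typed while the agent is busy.
# Lines starting with "!" are steering (delivered between tool rounds);
# anything else is a follow-up (queued until the response completes).
classify_input() {
  case "$1" in
    '!'*) printf 'steer:%s\n' "${1#\!}" ;;
    *)    printf 'queue:%s\n' "$1" ;;
  esac
}
```

For example, `classify_input '!stop, just look at the auth tests'` prints `steer:stop, just look at the auth tests`, while a line without the prefix is labeled `queue:`.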
Project Instructions
Create `.agentzero/agents.md` to add project-specific instructions that are appended to the system prompt:

    # Project Instructions

    - This is a Rust workspace with 14 crates.
    - Always run clippy before suggesting code is done.
    - Prefer returning Result over panicking.

Instructions are loaded from the directory hierarchy: global instructions from `~/.config/agentzero/agents.md` first, then project-local instructions. This coexists with the custom system prompt in `.agentzero/prompts/system.md`.

`az init` generates a template `agents.md` automatically.
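
To preview what the combined instructions might look like, the load order can be approximated with a short shell function. This is a sketch under assumptions: `compose_instructions` is a hypothetical helper, it only considers the global file and one project directory, and it does not reproduce how AgentZero actually walks the hierarchy or merges the files.

```shell
# Hypothetical helper: print instruction files in the documented order,
# global first, then project-local. Missing files are skipped.
#   $1 = project directory (default: current directory)
#   $2 = global file (default: ~/.config/agentzero/agents.md)
compose_instructions() {
  local project_dir="${1:-.}"
  local global_file="${2:-$HOME/.config/agentzero/agents.md}"
  local file
  for file in "$global_file" "$project_dir/.agentzero/agents.md"; do
    if [ -f "$file" ]; then
      cat "$file"
      printf '\n'
    fi
  done
}
```

Running `compose_instructions .` from a project root prints the global instructions followed by the project-local ones, which is a quick way to check for contradictory rules between the two files.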
Resume Sessions
    # List past sessions
    az history

    # Resume a session
    az chat --resume <session-id>

Custom System Prompt

Create `.agentzero/prompts/system.md`:

    You are a Rust expert focused on safety and performance.
    Always suggest using `expect()` with descriptive messages instead of `unwrap()`.
    Prefer zero-copy operations where possible.

Chat sessions use this file instead of the default system prompt.
Single-Shot Mode
Use `-P` (or `--print`) to send a single message and exit after the response. This is useful for scripting and automation.

    # Plain text output (default)
    az chat -P "what language is this project written in?"

    # Pretty JSON output
    az chat -P "list all public functions in src/lib.rs" --mode json

    # Compact JSONL for piping
    az chat -P "summarize this crate" --mode jsonl | jq .content

Available modes:
| Mode | Description |
|---|---|
| `text` | Plain text (default) |
| `json` | Pretty-printed JSON |
| `jsonl` | Compact single-line JSON |
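
As an example of the scripting use case, the sketch below asks the same question about several files, making one single-shot call per file in the default text mode. `ask_each` is a hypothetical helper, and the error handling assumes that `az chat -P` exits with a non-zero status on failure, which this page does not confirm.

```shell
# Hypothetical helper: ask the same question about several files, one
# single-shot call per file. Assumes `az chat -P` exits non-zero on failure.
#   $1 = prompt prefix; remaining arguments = file paths
ask_each() {
  local prompt="$1"
  shift
  local file answer
  for file in "$@"; do
    if ! answer="$(az chat -P "$prompt: $file")"; then
      printf 'az chat failed for %s\n' "$file" >&2
      return 1
    fi
    printf '== %s ==\n%s\n' "$file" "$answer"
  done
}
```

A call such as `ask_each "summarize" src/lib.rs src/main.rs` prints each answer under a header naming the file. Stopping at the first failure keeps a broken provider from producing a long run of empty sections.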
Context Management
Long conversations are automatically compacted when they exceed the model's context limit. Choose a compaction strategy with `--compaction`:

    # Default: fixed-size previews of each role
    az chat --compaction simple

    # Preserve code blocks verbatim, summarize prose
    az chat --compaction code-aware

    # Per-role character budgets (tool output smallest, assistant largest)
    az chat --compaction role-budget

See Session History & Resume for details on each strategy.
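
To give a feel for the `role-budget` idea, here is a minimal sketch that trims a message to a per-role character budget. Everything in it is an assumption made for illustration: the function names, the budget numbers, and the use of plain truncation are not taken from AgentZero, and only the ordering (tool smallest, assistant largest) comes from the description above.

```shell
# Hypothetical budgets in characters; the real values are not documented here.
role_budget() {
  case "$1" in
    tool)      echo 40 ;;
    assistant) echo 160 ;;
    *)         echo 80 ;;   # user and any other role
  esac
}

# Trim one message to its role's budget, marking the cut with "...".
# ASCII text is assumed so that characters and bytes coincide.
compact_message() {
  local role="$1" text="$2" budget
  budget="$(role_budget "$role")"
  if [ "${#text}" -le "$budget" ]; then
    printf '%s\n' "$text"
  else
    printf "%.${budget}s...\n" "$text"
  fi
}
```

With these numbers, a 50-character tool result is cut to 40 characters plus the marker, while the same text in an assistant message passes through unchanged.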
Mid-Session Model Switching
Switch models without restarting the session:

    you> /model codellama
    Switched to model: codellama (provider: ollama)

    you> now optimize this function for performance

A health check runs before switching. If the provider is unreachable, you’ll see a warning, but the switch still proceeds.
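
The "warn but proceed" behavior can be illustrated with a small shell sketch. `switch_model` is a hypothetical function, and the health check is whatever command the caller passes in; neither reflects how AgentZero performs its check internally.

```shell
# Hypothetical sketch of "warn but proceed": run a health check command,
# warn on stderr if it fails, and report the switch either way.
#   $1 = model name; remaining arguments = the health check command
switch_model() {
  local model="$1"
  shift
  if ! "$@" >/dev/null 2>&1; then
    printf 'warning: provider unreachable, switching anyway\n' >&2
  fi
  printf 'Switched to model: %s\n' "$model"
}
```

For instance, `switch_model codellama curl -fsS http://localhost:11434/` would probe a local Ollama server on its default port, assuming that is where the provider is listening. The switch message is printed whether or not the probe succeeds.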