Model Configuration
Overview
AgentZero loads provider configuration from .agentzero/models.json. Each entry defines a provider with its type, URL, default model, and routing metadata. The ProviderRouter instantiates providers dynamically at startup.
Schema
```json
{
  "providers": [
    {
      "name": "string",
      "type": "ollama | openai-compatible | anthropic",
      "url": "string",
      "default_model": "string",
      "is_local": true,
      "api_key": "string (optional)"
    }
  ]
}
```

| Field | Type | Description |
|---|---|---|
| name | string | Human-readable identifier for this provider |
| type | string | ollama, openai-compatible, or anthropic |
| url | string | Server URL (including port) |
| default_model | string | Model name to use when none is specified |
| is_local | bool | Whether the provider runs on the local machine (default: true) |
| api_key | string? | API key for authenticated providers. Supports vault references. |
| cost_per_million_input_tokens | float | Cost per million input tokens in USD. Default: 0.0 (local). |
| cost_per_million_output_tokens | float | Cost per million output tokens in USD. Default: 0.0 (local). |

Cost fields enable the token usage tracking system. Local providers default to $0.00. Set pricing for remote providers to see cost estimates in az usage reports.
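
To make the cost fields concrete, here is a minimal Rust sketch of how a per-request estimate could be derived from them. The `ProviderPricing` struct and `estimated_cost_usd` method are hypothetical names chosen for illustration, not AgentZero's actual types, and the sketch assumes cost scales linearly with token count.

```rust
/// Hypothetical holder for the two cost fields of one provider entry.
/// The real AgentZero type may look different.
#[derive(Debug, Clone, PartialEq)]
pub struct ProviderPricing {
    /// Mirrors `cost_per_million_input_tokens` (USD).
    pub cost_per_million_input_tokens: f64,
    /// Mirrors `cost_per_million_output_tokens` (USD).
    pub cost_per_million_output_tokens: f64,
}

impl ProviderPricing {
    /// Estimated cost in USD for a request with the given token counts.
    pub fn estimated_cost_usd(&self, input_tokens: u64, output_tokens: u64) -> f64 {
        let input = input_tokens as f64 / 1_000_000.0 * self.cost_per_million_input_tokens;
        let output = output_tokens as f64 / 1_000_000.0 * self.cost_per_million_output_tokens;
        input + output
    }
}
```

With made-up prices of 3.0 and 15.0 USD per million tokens, 2,000,000 input tokens and 500,000 output tokens come to 6.00 + 7.50 = 13.50 USD. A local provider left at the default 0.0 prices always yields 0.00.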
Provider Types
ollama
Uses the native Ollama API at /api/chat. Supports tool calling with Ollama’s native format.

```json
{
  "name": "ollama-local",
  "type": "ollama",
  "url": "http://localhost:11434",
  "default_model": "llama3.2",
  "is_local": true
}
```

openai-compatible
Uses the OpenAI /v1/chat/completions endpoint. Works with llama.cpp, vLLM, LM Studio, LocalAI, Groq, Together, DeepSeek, and any other server that implements the OpenAI chat completions API.

```json
{
  "name": "lm-studio",
  "type": "openai-compatible",
  "url": "http://localhost:1234",
  "default_model": "gemma-4-12b",
  "is_local": true
}
```

anthropic
Uses the Anthropic Messages API at /v1/messages. Tool calling uses content blocks. System prompts are extracted to the top-level system parameter. Sends x-api-key and anthropic-version headers.
Anthropic is always remote (is_local: false), so PII redaction is applied automatically.
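
The system-prompt extraction mentioned above can be illustrated with a short Rust sketch. The `Message` struct and `extract_system` function are hypothetical, and joining multiple system messages with a blank line is an assumption; AgentZero's own handling may differ.

```rust
/// Hypothetical chat message; `role` is "system", "user", or "assistant".
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Splits system messages out of a chat history.
/// Returns the joined system text (None if there was none) and the
/// remaining messages in their original order.
pub fn extract_system(messages: &[Message]) -> (Option<String>, Vec<Message>) {
    let mut system_parts: Vec<&str> = Vec::new();
    let mut rest: Vec<Message> = Vec::new();
    for m in messages {
        if m.role == "system" {
            system_parts.push(m.content.as_str());
        } else {
            rest.push(m.clone());
        }
    }
    // Assumption: several system messages are joined with a blank line.
    let system = if system_parts.is_empty() {
        None
    } else {
        Some(system_parts.join("\n\n"))
    };
    (system, rest)
}
```

The first element of the returned tuple would go into the request's top-level system parameter, and the second into its messages array.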
```json
{
  "name": "claude",
  "type": "anthropic",
  "url": "https://api.anthropic.com",
  "default_model": "claude-sonnet-4-20250514",
  "is_local": false,
  "api_key": "vault://anthropic/api-key"
}
```

Examples
LM Studio with Gemma 4
```json
{
  "providers": [
    {
      "name": "lm-studio-gemma",
      "type": "openai-compatible",
      "url": "http://localhost:1234",
      "default_model": "gemma-4-12b",
      "is_local": true
    }
  ]
}
```

```bash
az chat
# Uses gemma-4-12b via LM Studio automatically
```

oMLX (Apple Silicon Multi-Model)
```json
{
  "providers": [
    {
      "name": "omlx",
      "type": "openai-compatible",
      "url": "http://localhost:5100",
      "default_model": "mlx-community/Qwen2.5-Coder-7B-Instruct-4bit",
      "is_local": true
    }
  ]
}
```

See oMLX Setup for multi-model configuration.
MLX (mlx-lm Server)
```json
{
  "providers": [
    {
      "name": "mlx-local",
      "type": "openai-compatible",
      "url": "http://localhost:8080",
      "default_model": "mlx-community/Qwen2.5-Coder-7B-Instruct-4bit",
      "is_local": true
    }
  ]
}
```

See MLX Setup for details.
Remote Provider with API Key
```json
{
  "providers": [
    {
      "name": "ollama-local",
      "type": "ollama",
      "url": "http://localhost:11434",
      "default_model": "llama3.2",
      "is_local": true
    },
    {
      "name": "gpu-cluster",
      "type": "openai-compatible",
      "url": "https://gpu-box.internal:8000",
      "default_model": "meta-llama/Llama-3.2-70B",
      "is_local": false,
      "api_key": "vault://vllm/api-key"
    }
  ]
}
```

Multiple Local Providers

```json
{
  "providers": [
    {
      "name": "ollama",
      "type": "ollama",
      "url": "http://localhost:11434",
      "default_model": "llama3.2",
      "is_local": true
    },
    {
      "name": "llama-cpp",
      "type": "openai-compatible",
      "url": "http://localhost:8080",
      "default_model": "codellama",
      "is_local": true
    },
    {
      "name": "lm-studio",
      "type": "openai-compatible",
      "url": "http://localhost:1234",
      "default_model": "gemma-4-12b",
      "is_local": true
    }
  ]
}
```

is_local and Data Classification
The is_local field controls how the provider interacts with data classification routing:

| is_local | Effect |
|---|---|
| true | All data classifications allowed. No redaction applied. |
| false | Subject to classification-based routing. Private/PII data is redacted before sending. Secret and Credential data is denied. |

Remote providers like Anthropic Claude should always set is_local: false. This ensures PII redaction is applied before any data leaves your machine.
See Provider Routing for the full classification matrix.
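
The table above can be read as a small decision function. The following Rust sketch is illustrative only: the `Classification` and `Decision` enums and the `decide` function are hypothetical names, treating Private and PII as separate variants is an assumption, and so is the rule that data outside the listed classes (here `Public`) passes through unchanged. The Provider Routing page remains the authority on the full matrix.

```rust
/// Hypothetical data classes; the real set lives in the routing docs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Classification {
    Public,
    Private,
    Pii,
    Secret,
    Credential,
}

/// What happens to a piece of data before it reaches a provider.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Allow,
    Redact,
    Deny,
}

/// Maps the `is_local` flag and a data class to a routing decision.
pub fn decide(is_local: bool, class: Classification) -> Decision {
    if is_local {
        // Local providers: every classification allowed, nothing redacted.
        return Decision::Allow;
    }
    match class {
        Classification::Private | Classification::Pii => Decision::Redact,
        Classification::Secret | Classification::Credential => Decision::Deny,
        // Assumption: unlisted classes are sent to remote providers as-is.
        Classification::Public => Decision::Allow,
    }
}
```

Under these assumptions, `decide(false, Classification::Pii)` yields `Decision::Redact`, while the same data sent to a local provider yields `Decision::Allow`.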
API Keys
The api_key field accepts:
- Plain string: used directly (not recommended for shared configs)
- Vault reference: vault://provider-name/secret-name resolves from the encrypted vault at runtime
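
As a rough illustration of the two accepted forms, the Rust sketch below distinguishes a plain key from a vault reference. The `ApiKey` enum and `parse_api_key` function are hypothetical, and rejecting a malformed `vault://` value with an error is an assumption about how such input might be handled. Actual vault lookup and decryption are out of scope here.

```rust
/// Hypothetical parsed form of the `api_key` field.
#[derive(Debug, PartialEq, Eq)]
pub enum ApiKey<'a> {
    /// A literal key, used directly.
    Plain(&'a str),
    /// A `vault://provider-name/secret-name` reference, resolved later.
    Vault { provider: &'a str, secret: &'a str },
}

/// Classifies an `api_key` value without resolving it.
pub fn parse_api_key(raw: &str) -> Result<ApiKey<'_>, String> {
    match raw.strip_prefix("vault://") {
        // No vault prefix: treat the value as a plain key.
        None => Ok(ApiKey::Plain(raw)),
        Some(rest) => match rest.split_once('/') {
            Some((provider, secret)) if !provider.is_empty() && !secret.is_empty() => {
                Ok(ApiKey::Vault { provider, secret })
            }
            // Assumption: a vault reference missing either part is an error.
            _ => Err(format!("malformed vault reference: {}", raw)),
        },
    }
}
```

For example, `parse_api_key("vault://vllm/api-key")` splits into provider `vllm` and secret `api-key`, matching the arguments passed to az vault add.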
```bash
# Store the key in the vault first
az vault add vllm api-key

# Then reference it in models.json
"api_key": "vault://vllm/api-key"
```

Dynamic Loading
The router instantiates providers from config at startup:

```rust
let config = ModelsConfig::load(".agentzero/models.json")?;
let router = ProviderRouter::from_config(&config)?;
```

The router tries providers in priority order (local first), with automatic fallback and retry on transient failures. Switch the active provider at runtime via az serve with the switch_model ACP method, or via CLI flags:

```bash
az chat --provider lm-studio --model gemma-4-12b
```
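
The local-first ordering with fallback described above can be sketched in a few lines of Rust. This is not the ProviderRouter implementation: `Provider`, `priority_order`, and `first_success` are hypothetical names, the sketch assumes config order is preserved within the local and remote groups, and for simplicity it treats every error as transient and retries without any delay.

```rust
/// Hypothetical minimal provider record.
#[derive(Debug, Clone, PartialEq)]
pub struct Provider {
    pub name: String,
    pub is_local: bool,
}

/// Orders providers local first. The sort is stable, so the original
/// config order is kept within the local and remote groups.
pub fn priority_order(mut providers: Vec<Provider>) -> Vec<Provider> {
    // `false` sorts before `true`, so local providers (key = false) come first.
    providers.sort_by_key(|p| !p.is_local);
    providers
}

/// Tries each provider in order, making up to `max_attempts` calls per
/// provider before falling back to the next one.
/// Returns the name of the provider that succeeded together with its result.
pub fn first_success<T, E>(
    providers: &[Provider],
    max_attempts: u32,
    mut call: impl FnMut(&Provider) -> Result<T, E>,
) -> Option<(String, T)> {
    for p in providers {
        for _ in 0..max_attempts {
            if let Ok(value) = call(p) {
                return Some((p.name.clone(), value));
            }
        }
    }
    None
}
```

A production router would likely distinguish transient errors from permanent ones and back off between attempts; both are left out to keep the control flow visible.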