Model Configuration

AgentZero loads provider configuration from .agentzero/models.json. Each entry defines a provider with its type, URL, default model, and routing metadata. The ProviderRouter instantiates providers dynamically at startup.

{
"providers": [
{
"name": "string",
"type": "ollama | openai-compatible | anthropic",
"url": "string",
"default_model": "string",
"is_local": true,
"api_key": "string (optional)"
}
]
}
| Field | Type | Description |
|---|---|---|
| name | string | Human-readable identifier for this provider |
| type | string | ollama, openai-compatible, or anthropic |
| url | string | Server URL (including port) |
| default_model | string | Model name to use when none is specified |
| is_local | bool | Whether the provider runs on the local machine (default: true) |
| api_key | string? | API key for authenticated providers. Supports vault references. |
| cost_per_million_input_tokens | float | Cost per million input tokens in USD. Default: 0.0 (local). |
| cost_per_million_output_tokens | float | Cost per million output tokens in USD. Default: 0.0 (local). |

Cost fields enable the token usage tracking system. Local providers default to $0.00. Set pricing for remote providers to see cost estimates in az usage reports.
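As a rough illustration of how the two cost fields turn into an estimate, the sketch below multiplies token counts by the per-million rates. The Pricing struct and estimate_cost_usd function are hypothetical names, not AgentZero's actual usage-tracking API; only the two field names come from the field list above.

```rust
/// Hypothetical pricing record; only the two field names mirror models.json.
#[derive(Debug, Clone, Copy, Default)]
pub struct Pricing {
    /// USD per million input tokens (0.0 for local providers).
    pub cost_per_million_input_tokens: f64,
    /// USD per million output tokens (0.0 for local providers).
    pub cost_per_million_output_tokens: f64,
}

/// Estimates the USD cost of one request from its token counts.
pub fn estimate_cost_usd(pricing: &Pricing, input_tokens: u64, output_tokens: u64) -> f64 {
    let input = input_tokens as f64 / 1_000_000.0 * pricing.cost_per_million_input_tokens;
    let output = output_tokens as f64 / 1_000_000.0 * pricing.cost_per_million_output_tokens;
    input + output
}
```

With illustrative rates of 3.0 and 15.0 USD per million tokens (made-up numbers, not real pricing), a request with 2,000 input tokens and 500 output tokens comes to 0.006 + 0.0075 = 0.0135 USD. With the default rates of 0.0, every request costs nothing.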

The ollama type uses the native Ollama API at /api/chat. It supports tool calling in Ollama’s native format.

{
"name": "ollama-local",
"type": "ollama",
"url": "http://localhost:11434",
"default_model": "llama3.2",
"is_local": true
}

The openai-compatible type uses the OpenAI /v1/chat/completions endpoint. It works with llama.cpp, vLLM, LM Studio, LocalAI, Groq, Together, DeepSeek, and any other server that implements the OpenAI chat completions API.

{
"name": "lm-studio",
"type": "openai-compatible",
"url": "http://localhost:1234",
"default_model": "gemma-4-12b",
"is_local": true
}

The anthropic type uses the Anthropic Messages API at /v1/messages. Tool calling uses content blocks, system prompts are moved to the top-level system parameter, and each request sends the x-api-key and anthropic-version headers.

Anthropic is always remote (is_local: false), so PII redaction is applied automatically.

{
"name": "claude",
"type": "anthropic",
"url": "https://api.anthropic.com",
"default_model": "claude-sonnet-4-20250514",
"is_local": false,
"api_key": "vault://anthropic/api-key"
}
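To make the mapping from the type field to an endpoint concrete, here is a minimal sketch. The ProviderType enum and chat_endpoint helper are hypothetical names; the three paths are the ones listed above, and the assumption that the path is simply appended to the configured url is ours, not something the AgentZero source confirms.

```rust
use std::str::FromStr;

/// The three values accepted by the `type` field.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProviderType {
    Ollama,
    OpenAiCompatible,
    Anthropic,
}

impl FromStr for ProviderType {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "ollama" => Ok(Self::Ollama),
            "openai-compatible" => Ok(Self::OpenAiCompatible),
            "anthropic" => Ok(Self::Anthropic),
            other => Err(format!("unknown provider type: {other}")),
        }
    }
}

impl ProviderType {
    /// Chat endpoint path documented for each provider type.
    pub fn chat_path(self) -> &'static str {
        match self {
            Self::Ollama => "/api/chat",
            Self::OpenAiCompatible => "/v1/chat/completions",
            Self::Anthropic => "/v1/messages",
        }
    }
}

/// Joins a configured `url` with the chat path, tolerating a trailing slash.
/// Assumption: the path is appended directly to the base URL.
pub fn chat_endpoint(base_url: &str, kind: ProviderType) -> String {
    format!("{}{}", base_url.trim_end_matches('/'), kind.chat_path())
}
```

Under these assumptions, chat_endpoint("http://localhost:11434", ProviderType::Ollama) yields http://localhost:11434/api/chat, and an unrecognized type string is rejected when parsed rather than at request time.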
{
"providers": [
{
"name": "lm-studio-gemma",
"type": "openai-compatible",
"url": "http://localhost:1234",
"default_model": "gemma-4-12b",
"is_local": true
}
]
}

az chat
# Uses gemma-4-12b via LM Studio automatically

{
"providers": [
{
"name": "omlx",
"type": "openai-compatible",
"url": "http://localhost:5100",
"default_model": "mlx-community/Qwen2.5-Coder-7B-Instruct-4bit",
"is_local": true
}
]
}

See oMLX Setup for multi-model configuration.

{
"providers": [
{
"name": "mlx-local",
"type": "openai-compatible",
"url": "http://localhost:8080",
"default_model": "mlx-community/Qwen2.5-Coder-7B-Instruct-4bit",
"is_local": true
}
]
}

See MLX Setup for details.

{
"providers": [
{
"name": "ollama-local",
"type": "ollama",
"url": "http://localhost:11434",
"default_model": "llama3.2",
"is_local": true
},
{
"name": "gpu-cluster",
"type": "openai-compatible",
"url": "https://gpu-box.internal:8000",
"default_model": "meta-llama/Llama-3.2-70B",
"is_local": false,
"api_key": "vault://vllm/api-key"
}
]
}

{
"providers": [
{
"name": "ollama",
"type": "ollama",
"url": "http://localhost:11434",
"default_model": "llama3.2",
"is_local": true
},
{
"name": "llama-cpp",
"type": "openai-compatible",
"url": "http://localhost:8080",
"default_model": "codellama",
"is_local": true
},
{
"name": "lm-studio",
"type": "openai-compatible",
"url": "http://localhost:1234",
"default_model": "gemma-4-12b",
"is_local": true
}
]
}

The is_local field controls how the provider interacts with data classification routing:

| is_local | Effect |
|---|---|
| true | All data classifications allowed. No redaction applied. |
| false | Subject to classification-based routing. Private/PII data is redacted before sending. Secret and Credential data is denied. |

Remote providers like Anthropic Claude should always set is_local: false. This ensures PII redaction is applied before any data leaves your machine.
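The effect of is_local can be sketched as a small decision function. The enum and function names below are hypothetical, and two details are assumptions: Private and PII are treated as separate classes, and Public is a catch-all for data that needs no protection. The real classification matrix may contain more levels.

```rust
/// Data classifications named in this section. `Public` is an assumed
/// catch-all; the real matrix may define other levels.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataClass {
    Public,
    Private,
    Pii,
    Secret,
    Credential,
}

/// What happens to a payload before it reaches a provider.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RoutingDecision {
    /// Send the payload unchanged.
    Allow,
    /// Redact sensitive spans, then send.
    Redact,
    /// Refuse to send the payload to this provider.
    Deny,
}

/// Applies the is_local rules described above.
pub fn route(is_local: bool, class: DataClass) -> RoutingDecision {
    if is_local {
        // Local providers accept every classification without redaction.
        return RoutingDecision::Allow;
    }
    match class {
        DataClass::Public => RoutingDecision::Allow,
        DataClass::Private | DataClass::Pii => RoutingDecision::Redact,
        DataClass::Secret | DataClass::Credential => RoutingDecision::Deny,
    }
}
```

For example, route(false, DataClass::Pii) returns Redact, while route(true, DataClass::Credential) returns Allow because the data never leaves the machine.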

See Provider Routing for the full classification matrix.

The api_key field accepts:

  • Plain string: used directly (not recommended for shared configs)
  • Vault reference: vault://provider-name/secret-name resolves from the encrypted vault at runtime

# Store the key in the vault first
az vault add vllm api-key
# Then reference it in models.json
"api_key": "vault://vllm/api-key"

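The two accepted api_key forms can be told apart with a small parser. This is a minimal sketch under the assumption that a vault reference is always vault:// followed by a provider name, a slash, and a secret name; ApiKeySource and parse_api_key are hypothetical names, and the actual vault lookup is out of scope here.

```rust
/// How an `api_key` value should be interpreted.
#[derive(Debug, PartialEq, Eq)]
pub enum ApiKeySource<'a> {
    /// Use the string as the key itself.
    Plain(&'a str),
    /// Look up `secret` under `provider` in the vault at runtime.
    Vault { provider: &'a str, secret: &'a str },
}

/// Classifies an `api_key` value without resolving it.
pub fn parse_api_key(value: &str) -> Result<ApiKeySource<'_>, String> {
    // Anything without the vault scheme is treated as a literal key.
    let Some(rest) = value.strip_prefix("vault://") else {
        return Ok(ApiKeySource::Plain(value));
    };
    // Split at the first slash: provider before it, secret after it.
    match rest.split_once('/') {
        Some((provider, secret)) if !provider.is_empty() && !secret.is_empty() => {
            Ok(ApiKeySource::Vault { provider, secret })
        }
        _ => Err(format!("malformed vault reference: {value}")),
    }
}
```

Rejecting a malformed reference such as vault://vllm at load time, instead of sending it as a literal key, avoids leaking a half-written reference to a remote server.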
The router instantiates providers from config at startup:

let config = ModelsConfig::load(".agentzero/models.json")?;
let router = ProviderRouter::from_config(&config)?;

The router tries providers in priority order (local first), with automatic fallback and retry on transient failures. Switch the active provider at runtime via az serve with the switch_model ACP method, or via CLI flags:

az chat --provider lm-studio --model gemma-4-12b
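The local-first ordering with fallback and retry described above might look roughly like the following. This is a sketch, not ProviderRouter's actual implementation: Provider, CallError, priority_order, and call_with_fallback are hypothetical names, the retry loop has no backoff, and the provider call is passed in as a closure so the example stays self-contained.

```rust
/// Minimal stand-in for a configured provider (hypothetical type).
#[derive(Debug, Clone)]
pub struct Provider {
    pub name: String,
    pub is_local: bool,
}

/// Errors a provider call might return in this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum CallError {
    /// Worth retrying on the same provider (for example, a timeout).
    Transient,
    /// Not worth retrying; move on to the next provider.
    Fatal,
}

/// Orders providers local-first, keeping config order within each group.
pub fn priority_order(providers: &[Provider]) -> Vec<&Provider> {
    let mut ordered: Vec<&Provider> = providers.iter().collect();
    // `sort_by_key` is stable and `false < true`, so negate `is_local`.
    ordered.sort_by_key(|p| !p.is_local);
    ordered
}

/// Tries each provider in priority order, retrying transient failures
/// up to `max_retries` extra times before falling back to the next one.
/// Returns the name of the provider that answered and its reply.
pub fn call_with_fallback<F>(
    providers: &[Provider],
    max_retries: u32,
    mut call: F,
) -> Option<(String, String)>
where
    F: FnMut(&Provider) -> Result<String, CallError>,
{
    for provider in priority_order(providers) {
        for _attempt in 0..=max_retries {
            match call(provider) {
                Ok(reply) => return Some((provider.name.clone(), reply)),
                Err(CallError::Transient) => continue,
                Err(CallError::Fatal) => break,
            }
        }
    }
    None
}
```

Given a remote provider listed before a local one, priority_order still places the local provider first. If the local provider keeps failing transiently, the call moves on to the remote provider once the retries are exhausted.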