Model Configuration

Overview

AgentZero loads provider configuration from .agentzero/models.json. Each entry defines a provider with its type, URL, default model, and routing metadata. The ProviderRouter instantiates providers dynamically at startup.

Schema

{
  "providers": [
    {
      "name": "string",
      "type": "ollama | openai-compatible | anthropic",
      "url": "string",
      "default_model": "string",
      "is_local": true,
      "api_key": "string (optional)",
      "cost_per_million_input_tokens": 0.0,
      "cost_per_million_output_tokens": 0.0
    }
  ]
}
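To make the schema concrete, here is a minimal Rust sketch of one `providers` entry as a typed struct. It applies the defaults listed in the field table (`is_local` true, costs 0.0, no API key). The names `ProviderType`, `ProviderConfig`, and `ProviderConfig::new` are hypothetical and are not AgentZero's actual `ModelsConfig` types. JSON deserialization is left out so the example needs only the standard library.

```rust
use std::str::FromStr;

/// The three values accepted in the `type` field.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProviderType {
    Ollama,
    OpenAiCompatible,
    Anthropic,
}

impl FromStr for ProviderType {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "ollama" => Ok(ProviderType::Ollama),
            "openai-compatible" => Ok(ProviderType::OpenAiCompatible),
            "anthropic" => Ok(ProviderType::Anthropic),
            other => Err(format!("unknown provider type: {other}")),
        }
    }
}

/// One entry of the `providers` array (hypothetical mirror of the schema).
#[derive(Debug, Clone, PartialEq)]
pub struct ProviderConfig {
    pub name: String,
    pub provider_type: ProviderType, // `type` is a reserved word in Rust
    pub url: String,
    pub default_model: String,
    pub is_local: bool,
    pub api_key: Option<String>,
    pub cost_per_million_input_tokens: f64,
    pub cost_per_million_output_tokens: f64,
}

impl ProviderConfig {
    /// Builds an entry from the required fields and fills in the documented defaults.
    pub fn new(name: &str, provider_type: &str, url: &str, default_model: &str) -> Result<Self, String> {
        Ok(Self {
            name: name.to_string(),
            provider_type: provider_type.parse()?, // rejects unknown `type` values
            url: url.to_string(),
            default_model: default_model.to_string(),
            is_local: true,      // default: true
            api_key: None,       // optional
            cost_per_million_input_tokens: 0.0,  // default for local providers
            cost_per_million_output_tokens: 0.0, // default for local providers
        })
    }
}
```

Validating `type` at load time, as sketched here, surfaces a typo such as `"openai"` as an error instead of a silently skipped provider.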

Field	Type	Description
`name`	string	Human-readable identifier for this provider
`type`	string	`ollama`, `openai-compatible`, or `anthropic`
`url`	string	Server URL (including port)
`default_model`	string	Model name to use when none is specified
`is_local`	bool	Whether the provider runs on the local machine (default: true)
`api_key`	string?	API key for authenticated providers. Supports vault references.
`cost_per_million_input_tokens`	float	Cost per million input tokens in USD. Default: 0.0 (local).
`cost_per_million_output_tokens`	float	Cost per million output tokens in USD. Default: 0.0 (local).

Cost fields enable the token usage tracking system. Local providers default to $0.00. Set pricing for remote providers to see cost estimates in az usage reports.
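The cost estimate follows directly from the two per-million fields: each token count is divided by one million and multiplied by its rate. A minimal sketch, where `Pricing` and `estimate_cost_usd` are hypothetical names rather than part of AgentZero's API:

```rust
/// Per-provider pricing, mirroring the two cost fields (USD per million tokens).
#[derive(Debug, Clone, Copy)]
pub struct Pricing {
    pub cost_per_million_input_tokens: f64,
    pub cost_per_million_output_tokens: f64,
}

/// Estimated USD cost of one request, given its input and output token counts.
pub fn estimate_cost_usd(pricing: Pricing, input_tokens: u64, output_tokens: u64) -> f64 {
    let input = input_tokens as f64 / 1_000_000.0 * pricing.cost_per_million_input_tokens;
    let output = output_tokens as f64 / 1_000_000.0 * pricing.cost_per_million_output_tokens;
    input + output
}
```

With illustrative rates of $3.00 (input) and $15.00 (output) per million tokens, a request with 20,000 input tokens and 1,000 output tokens costs 0.02 × 3.00 + 0.001 × 15.00 = $0.075. A local provider left at the default 0.0 rates always reports $0.00.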

Provider Types

`ollama`

Uses the native Ollama API at /api/chat. Supports tool calling with Ollama’s native format.

{
  "name": "ollama-local",
  "type": "ollama",
  "url": "http://localhost:11434",
  "default_model": "llama3.2",
  "is_local": true
}

`openai-compatible`

Uses the OpenAI /v1/chat/completions endpoint. Works with llama.cpp, vLLM, LM Studio, LocalAI, Groq, Together, DeepSeek, and any other server that implements the OpenAI chat completions API.

{
  "name": "lm-studio",
  "type": "openai-compatible",
  "url": "http://localhost:1234",
  "default_model": "gemma-4-12b",
  "is_local": true
}
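The `url` values in these examples stop at the port, and each provider type speaks to a fixed path (`/api/chat` for Ollama, `/v1/chat/completions` for OpenAI-compatible servers, `/v1/messages` for Anthropic). The sketch below shows one way to build the request URL under the assumption that the router appends that path to `url`. `chat_endpoint` is a hypothetical helper, and under this assumption a `url` that already ends in `/v1` would produce a doubled path.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProviderType {
    Ollama,
    OpenAiCompatible,
    Anthropic,
}

/// Joins the configured base `url` with the chat path used by each provider type.
pub fn chat_endpoint(provider_type: ProviderType, base_url: &str) -> String {
    let path = match provider_type {
        ProviderType::Ollama => "/api/chat",
        ProviderType::OpenAiCompatible => "/v1/chat/completions",
        ProviderType::Anthropic => "/v1/messages",
    };
    // Tolerate a trailing slash in the config, e.g. "http://localhost:1234/".
    format!("{}{}", base_url.trim_end_matches('/'), path)
}
```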

`anthropic`

Uses the Anthropic Messages API at /v1/messages. Tool calling uses content blocks. System prompts are extracted to the top-level system parameter. Sends x-api-key and anthropic-version headers.

Anthropic is always a remote provider, so set is_local: false; with that setting, PII redaction is applied automatically.

{
  "name": "claude",
  "type": "anthropic",
  "url": "https://api.anthropic.com",
  "default_model": "claude-sonnet-4-20250514",
  "is_local": false,
  "api_key": "vault://anthropic/api-key"
}
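The Messages API takes the system prompt as a top-level `system` parameter rather than as a message with role `system`, so any system messages in the chat history have to be pulled out before the request is sent. A minimal sketch of that step, plus the two headers, follows. `Message`, `extract_system`, and `anthropic_headers` are hypothetical names. Joining several system messages with a blank line is an assumption, and `2023-06-01` is the `anthropic-version` value published in Anthropic's API documentation.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Splits system messages out of a chat history. Returns the joined system
/// prompt (if any) and the remaining turns in their original order.
pub fn extract_system(messages: &[Message]) -> (Option<String>, Vec<Message>) {
    let mut system_parts: Vec<&str> = Vec::new();
    let mut rest = Vec::new();
    for m in messages {
        if m.role == "system" {
            system_parts.push(m.content.as_str());
        } else {
            rest.push(m.clone());
        }
    }
    // Assumption: multiple system messages are concatenated with a blank line.
    let system = if system_parts.is_empty() { None } else { Some(system_parts.join("\n\n")) };
    (system, rest)
}

/// Authentication and versioning headers sent with every Anthropic request.
pub fn anthropic_headers(api_key: &str) -> Vec<(&'static str, String)> {
    vec![
        ("x-api-key", api_key.to_string()),
        ("anthropic-version", "2023-06-01".to_string()),
    ]
}
```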

Examples

LM Studio with Gemma 4

{
  "providers": [
    {
      "name": "lm-studio-gemma",
      "type": "openai-compatible",
      "url": "http://localhost:1234",
      "default_model": "gemma-4-12b",
      "is_local": true
    }
  ],
}

az chat
# Uses gemma-4-12b via LM Studio automatically
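With no flags, `az chat` falls back to the provider's `default_model`, as described for that field in the schema table. A minimal sketch of selecting a provider and model follows. `Provider` and `select` are hypothetical names. When no provider is named, this sketch simply takes the first entry, whereas the real router uses priority order with fallback.

```rust
#[derive(Debug)]
pub struct Provider {
    pub name: String,
    pub default_model: String,
}

/// Chooses a provider (by name if given, otherwise the first entry) and a model
/// (the explicit override if given, otherwise the provider's `default_model`).
pub fn select<'a>(
    providers: &'a [Provider],
    provider_name: Option<&str>,
    model: Option<&'a str>,
) -> Option<(&'a Provider, &'a str)> {
    let provider = match provider_name {
        Some(name) => providers.iter().find(|p| p.name == name)?,
        None => providers.first()?, // simplification; see lead-in
    };
    let model = model.unwrap_or(provider.default_model.as_str());
    Some((provider, model))
}
```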

oMLX (Apple Silicon Multi-Model)

{
  "providers": [
    {
      "name": "omlx",
      "type": "openai-compatible",
      "url": "http://localhost:5100",
      "default_model": "mlx-community/Qwen2.5-Coder-7B-Instruct-4bit",
      "is_local": true
    }
  ]
}

See oMLX Setup for multi-model configuration.

MLX (mlx-lm Server)

{
  "providers": [
    {
      "name": "mlx-local",
      "type": "openai-compatible",
      "url": "http://localhost:8080",
      "default_model": "mlx-community/Qwen2.5-Coder-7B-Instruct-4bit",
      "is_local": true
    }
  ]
}

See MLX Setup for details.

Remote Provider with API Key

{
  "providers": [
    {
      "name": "ollama-local",
      "type": "ollama",
      "url": "http://localhost:11434",
      "default_model": "llama3.2",
      "is_local": true
    },
    {
      "name": "gpu-cluster",
      "type": "openai-compatible",
      "url": "https://gpu-box.internal:8000",
      "default_model": "meta-llama/Llama-3.3-70B-Instruct",
      "is_local": false,
      "api_key": "vault://vllm/api-key"
    }
  ]
}

Multiple Local Providers

{
  "providers": [
    {
      "name": "ollama",
      "type": "ollama",
      "url": "http://localhost:11434",
      "default_model": "llama3.2",
      "is_local": true
    },
    {
      "name": "llama-cpp",
      "type": "openai-compatible",
      "url": "http://localhost:8080",
      "default_model": "codellama",
      "is_local": true
    },
    {
      "name": "lm-studio",
      "type": "openai-compatible",
      "url": "http://localhost:1234",
      "default_model": "gemma-4-12b",
      "is_local": true
    }
  ]
}

is_local and Data Classification

The is_local field controls how the provider interacts with data classification routing:

`is_local`	Effect
`true`	All data classifications allowed. No redaction applied.
`false`	Subject to classification-based routing: Private and PII data is redacted before sending, and requests containing Secret or Credential data are denied.
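The table reduces to a small decision function. The sketch below encodes it. `Classification`, `Decision`, and `route` are hypothetical names, and the `Public` class (non-sensitive data that a remote provider may receive unchanged) is an assumption not spelled out in this table. The full matrix is documented under Provider Routing.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Classification {
    Public, // hypothetical: non-sensitive data
    Private,
    Pii,
    Secret,
    Credential,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Allow,
    Redact,
    Deny,
}

/// Decides what happens to data of a given class sent to a provider.
pub fn route(is_local: bool, class: Classification) -> Decision {
    if is_local {
        // Local providers: every classification allowed, no redaction.
        return Decision::Allow;
    }
    match class {
        Classification::Public => Decision::Allow,
        Classification::Private | Classification::Pii => Decision::Redact,
        Classification::Secret | Classification::Credential => Decision::Deny,
    }
}
```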

Remote providers like Anthropic Claude should always set is_local: false. This ensures PII redaction is applied before any data leaves your machine.

See Provider Routing for the full classification matrix.

API Keys

The api_key field accepts:

Plain string — used directly (not recommended for shared configs)
Vault reference — vault://provider-name/secret-name resolves from the encrypted vault at runtime

# Store the key in the vault first
az vault add vllm api-key

# Then reference it in models.json
"api_key": "vault://vllm/api-key"
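Telling the two accepted forms apart is a prefix check followed by a split on the first `/`. A minimal sketch, where `ApiKeySource` and `parse_api_key` are hypothetical names and the actual lookup in the encrypted vault is not shown:

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ApiKeySource<'a> {
    /// A literal key, used as-is.
    Plain(&'a str),
    /// A `vault://provider-name/secret-name` reference, resolved at runtime.
    Vault { provider: &'a str, secret: &'a str },
}

pub fn parse_api_key(value: &str) -> Result<ApiKeySource<'_>, String> {
    match value.strip_prefix("vault://") {
        None => Ok(ApiKeySource::Plain(value)),
        Some(rest) => match rest.split_once('/') {
            // Both the provider and the secret name must be non-empty.
            Some((provider, secret)) if !provider.is_empty() && !secret.is_empty() => {
                Ok(ApiKeySource::Vault { provider, secret })
            }
            _ => Err(format!("malformed vault reference: {value}")),
        },
    }
}
```

Rejecting a malformed reference such as `vault://vllm` at load time avoids sending the literal string to a provider as if it were a key.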

Dynamic Loading

The router instantiates providers from config at startup:

let config = ModelsConfig::load(".agentzero/models.json")?;
let router = ProviderRouter::from_config(&config)?;

The router tries providers in priority order (local providers first), retrying on transient failures and falling back to the next provider when one fails. To switch the active provider at runtime, use the switch_model ACP method exposed by az serve; to choose a provider and model for a CLI session, pass flags:

az chat --provider lm-studio --model gemma-4-12b
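The local-first ordering with retry and fallback can be sketched as follows. This is a simplified model under stated assumptions, not the `ProviderRouter` implementation: `Provider`, `CallError`, and `call_with_fallback` are hypothetical names, `max_retries` counts extra attempts after the first, and real retry logic would typically add backoff between attempts.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CallError {
    Transient, // e.g. timeout or connection reset: worth retrying
    Fatal,     // e.g. malformed request: move on without retrying
}

#[derive(Debug)]
pub struct Provider {
    pub name: String,
    pub is_local: bool,
}

/// Orders providers local-first (a stable sort keeps config order within each
/// group), then tries each one, retrying transient failures up to `max_retries`
/// times before falling back to the next provider.
pub fn call_with_fallback<F>(
    providers: &[Provider],
    max_retries: u32,
    mut call: F,
) -> Result<(String, String), Vec<(String, CallError)>>
where
    F: FnMut(&Provider) -> Result<String, CallError>,
{
    let mut ordered: Vec<&Provider> = providers.iter().collect();
    ordered.sort_by_key(|p| !p.is_local); // local (key false) sorts before remote (true)

    let mut errors = Vec::new();
    for provider in ordered {
        let mut attempt = 0;
        loop {
            match call(provider) {
                Ok(reply) => return Ok((provider.name.clone(), reply)),
                Err(CallError::Transient) if attempt < max_retries => attempt += 1,
                Err(e) => {
                    errors.push((provider.name.clone(), e)); // give up on this provider
                    break;
                }
            }
        }
    }
    Err(errors)
}
```

Returning the per-provider errors when every provider fails lets the caller report why each fallback was skipped.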