Model Providers
Supported Providers
| Provider | API Type | Default Port | Flag |
|---|---|---|---|
| Ollama | Native /api/chat | 11434 | --provider ollama |
| llama.cpp | OpenAI /v1/chat/completions | 8080 | --provider llama-cpp |
| vLLM | OpenAI /v1/chat/completions | 8000 | --provider vllm |
| LM Studio | OpenAI /v1/chat/completions | 1234 | --provider lm-studio |
| Anthropic Claude | Messages /v1/messages | N/A (remote) | --provider anthropic |
| Custom | Any (configurable via driver) | Any | "type": "custom" in models.json |
The OpenAI-compatible provider has been verified against Groq, Together, and DeepSeek in addition to the servers listed above. Any service implementing /v1/chat/completions should work.
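As a rough illustration of the table above, the defaults can be expressed as a lookup from flag value to port and chat path. This is only a sketch: the function names `local_default` and `chat_url` are hypothetical, and plain HTTP on a caller-supplied host is an assumption, not something AgentZero is documented to do.

```rust
/// Default port and chat path for each `--provider` flag value in the table.
/// Returns `None` for values without a local default (e.g. `anthropic`).
/// Hypothetical helper, not part of AgentZero's API.
pub fn local_default(flag: &str) -> Option<(u16, &'static str)> {
    match flag {
        "ollama" => Some((11434, "/api/chat")),
        "llama-cpp" => Some((8080, "/v1/chat/completions")),
        "vllm" => Some((8000, "/v1/chat/completions")),
        "lm-studio" => Some((1234, "/v1/chat/completions")),
        _ => None,
    }
}

/// Builds the full chat URL, assuming the server listens on `host` over plain HTTP.
pub fn chat_url(flag: &str, host: &str) -> Option<String> {
    local_default(flag).map(|(port, path)| format!("http://{host}:{port}{path}"))
}
```

For example, `chat_url("ollama", "localhost")` yields the Ollama chat endpoint on its default port, while remote-only providers return `None`.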
Provider Routing
AgentZero tries providers in priority order:
- Local providers are always attempted first.
- Remote providers are used only if the data classification allows it.
- If the primary provider is unavailable, the router falls back automatically.
- Transient failures are retried with backoff.
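The routing rules above can be sketched with a simplified, synchronous model. Everything here is an assumption made for illustration: `Candidate`, `CallError`, and `route` are hypothetical names, the real router is async, and the exponential backoff schedule is one plausible choice rather than AgentZero's documented behavior.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical error type: transient failures are retried, others trigger fallback.
#[derive(Debug, Clone, PartialEq)]
pub enum CallError {
    Transient(String),
    Unavailable(String),
}

/// Hypothetical, simplified stand-in for a configured provider.
pub struct Candidate<'a> {
    pub name: &'a str,
    pub is_local: bool,
    /// Performs one request attempt; a real provider would call its HTTP API here.
    pub call: Box<dyn FnMut() -> Result<String, CallError> + 'a>,
}

/// Tries candidates local-first, retrying transient errors with exponential backoff.
/// Returns the name of the provider that answered together with its reply.
pub fn route(
    mut candidates: Vec<Candidate<'_>>,
    remote_allowed: bool,
    max_retries: u32,
    base_delay: Duration,
) -> Result<(String, String), CallError> {
    // Stable sort: local providers move to the front, configured order is otherwise kept.
    candidates.sort_by_key(|c| !c.is_local);
    let mut last_err = CallError::Unavailable("no providers configured".into());
    for c in candidates.iter_mut() {
        if !c.is_local && !remote_allowed {
            continue; // classification forbids remote providers
        }
        for attempt in 0..=max_retries {
            match (c.call)() {
                Ok(reply) => return Ok((c.name.to_string(), reply)),
                Err(CallError::Transient(msg)) if attempt < max_retries => {
                    last_err = CallError::Transient(msg);
                    // Delay doubles on each retry: 1x, 2x, 4x, ...
                    sleep(base_delay * 2u32.pow(attempt));
                }
                Err(e) => {
                    last_err = e;
                    break; // fall back to the next provider
                }
            }
        }
    }
    Err(last_err)
}
```

Keeping the sort stable means the configured order still decides priority among providers of the same locality.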
Classification-Based Routing
| Classification | Local | Remote |
|---|---|---|
| Public | Allowed | Allowed |
| Internal | Allowed | Policy check |
| Private | Allowed | Requires redaction |
| PII | Allowed | Redacted then allowed |
| Secret | Allowed | Denied |
| Credential | Allowed | Denied |
| Unknown | Allowed | Denied |
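The table can be read as a function from classification to a remote-routing decision. The sketch below mirrors the table's wording one-to-one; the type and function names are hypothetical, and since this page does not define how "Requires redaction" differs from "Redacted then allowed", the two are kept as separate variants rather than merged.

```rust
/// Data classifications from the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Classification {
    Public,
    Internal,
    Private,
    Pii,
    Secret,
    Credential,
    Unknown,
}

/// What the table says about sending data to a remote provider.
/// Hypothetical type; variant names follow the table's wording.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RemoteDecision {
    Allowed,
    PolicyCheck,
    RequiresRedaction,
    RedactedThenAllowed,
    Denied,
}

/// Maps a classification to the "Remote" column of the table.
pub fn remote_decision(c: Classification) -> RemoteDecision {
    match c {
        Classification::Public => RemoteDecision::Allowed,
        Classification::Internal => RemoteDecision::PolicyCheck,
        Classification::Private => RemoteDecision::RequiresRedaction,
        Classification::Pii => RemoteDecision::RedactedThenAllowed,
        // Secret, Credential, and Unknown data never leaves the machine.
        Classification::Secret | Classification::Credential | Classification::Unknown => {
            RemoteDecision::Denied
        }
    }
}

/// The "Local" column: every classification may use a local provider.
pub fn local_allowed(_c: Classification) -> bool {
    true
}
```

Treating `Unknown` the same as `Secret` follows the table: data that has not been classified is denied remote access by default.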
Custom Server URL
```sh
az chat --provider llama-cpp --url http://gpu-box:8080
```
models.json Configuration
Providers are configured in .agentzero/models.json. The ProviderRouter loads this file at startup and instantiates providers dynamically.
```json
{
  "providers": [
    {
      "name": "ollama-local",
      "provider_type": "ollama",
      "base_url": "http://localhost:11434",
      "default_model": "llama3.2",
      "is_local": true
    },
    {
      "name": "lm-studio",
      "provider_type": "openai-compatible",
      "base_url": "http://localhost:1234",
      "default_model": "gemma-4-12b",
      "is_local": true
    },
    {
      "name": "remote-vllm",
      "provider_type": "openai-compatible",
      "base_url": "https://gpu-box.internal:8000",
      "default_model": "meta-llama/Llama-3.2-70B",
      "is_local": false,
      "api_key": "vault://vllm/api-key"
    },
    {
      "name": "claude",
      "provider_type": "anthropic",
      "base_url": "https://api.anthropic.com",
      "default_model": "claude-sonnet-4-20250514",
      "is_local": false,
      "api_key": "vault://anthropic/api-key"
    }
  ],
  "default_provider": "ollama-local"
}
```
See the Model Configuration guide for the full schema and examples.
Custom Providers
The custom provider type lets you add any compatible endpoint via models.json without code changes. Set the driver field to select which client implementation to use:
| Driver | Protocol | Use For |
|---|---|---|
| openai-compatible (default) | /v1/chat/completions | Together AI, Groq, Fireworks, DeepSeek, any OpenAI-compatible API |
| ollama | /api/chat | Self-hosted Ollama instances on custom ports |
| anthropic | /v1/messages | Anthropic-compatible endpoints |
Example (adding Together AI with no code changes):
```json
{
  "providers": [
    {
      "name": "together-ai",
      "type": "custom",
      "driver": "openai-compatible",
      "url": "https://api.together.xyz/v1",
      "default_model": "meta-llama/Llama-3-70b",
      "is_local": false,
      "api_key": "vault:together/key"
    }
  ]
}
```
If driver is omitted, it defaults to openai-compatible.
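One way to picture the driver selection, including the default, is a small resolver over the optional driver field. This is a sketch under assumptions: the `Driver` enum and its methods are hypothetical names, and rejecting unknown driver strings with an error is a guess at reasonable behavior, not documented AgentZero behavior.

```rust
/// Client implementations listed in the driver table above.
/// Hypothetical type, not AgentZero's actual API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Driver {
    OpenAiCompatible,
    Ollama,
    Anthropic,
}

impl Driver {
    /// Resolves the optional `driver` field; a missing field means `openai-compatible`.
    pub fn from_config(driver: Option<&str>) -> Result<Driver, String> {
        match driver {
            None | Some("openai-compatible") => Ok(Driver::OpenAiCompatible),
            Some("ollama") => Ok(Driver::Ollama),
            Some("anthropic") => Ok(Driver::Anthropic),
            // Assumption: unknown values are rejected rather than silently defaulted.
            Some(other) => Err(format!("unknown driver: {other}")),
        }
    }

    /// Protocol path from the "Protocol" column of the table.
    pub fn chat_path(self) -> &'static str {
        match self {
            Driver::OpenAiCompatible => "/v1/chat/completions",
            Driver::Ollama => "/api/chat",
            Driver::Anthropic => "/v1/messages",
        }
    }
}
```

With this shape, `Driver::from_config(None)` and `Driver::from_config(Some("openai-compatible"))` resolve to the same driver, which matches the stated default.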
ModelProvider Trait
All providers implement the ModelProvider trait:
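```rust
#[async_trait]
pub trait ModelProvider: Send + Sync {
    /// Sends the conversation plus the available tool definitions and returns the full response.
    async fn chat_with_tools(
        &self,
        messages: &[ChatMessage],
        tools: &[ToolDef],
    ) -> Result<ChatResponse>;

    /// Same inputs as `chat_with_tools`, but delivers the response as `StreamEvent`s over `tx`.
    async fn chat_streaming(
        &self,
        messages: &[ChatMessage],
        tools: &[ToolDef],
        tx: mpsc::Sender<StreamEvent>,
    ) -> Result<()>;

    /// Reports whether the provider is currently healthy.
    async fn health_check(&self) -> Result<bool>;

    /// Name of the model this provider is configured to use.
    fn model_name(&self) -> &str;
}
```
OllamaProvider, OpenAICompatProvider, and AnthropicProvider all implement this trait. For standard OpenAI-compatible endpoints, use "type": "custom" in models.json; no code changes are needed. For providers with non-standard APIs, implement the ModelProvider trait in Rust.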
Dynamic Loading
ProviderRouter::from_config() reads models.json and instantiates the correct provider for each entry:
```rust
let config = ModelsConfig::load(".agentzero/models.json")?;
let router = ProviderRouter::from_config(&config)?;

// Router tries providers in priority order with fallback
let response = router.chat_with_tools(&messages, &tools).await?;
```
Settings Defaults
Configure defaults in .agentzero/settings.toml:
```toml
[general]
default_provider = "ollama"
default_model = "llama3.2"
```
These values apply when the corresponding CLI flags are left at their defaults.
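The precedence between CLI flags and settings.toml can be sketched as a simple fallback. This models "flag left at its default" as `None`, which is an assumption about how the CLI represents unset flags; the `Defaults` struct, the `resolve` function, and the idea of a separate model flag are hypothetical.

```rust
/// Values from the `[general]` table of settings.toml.
/// Hypothetical type used only for this sketch.
pub struct Defaults<'a> {
    pub provider: &'a str,
    pub model: &'a str,
}

/// Picks the effective (provider, model) pair.
/// An explicit CLI value wins; `None` (flag not set) falls back to settings.toml.
pub fn resolve<'a>(
    cli_provider: Option<&'a str>,
    cli_model: Option<&'a str>,
    settings: &Defaults<'a>,
) -> (&'a str, &'a str) {
    (
        cli_provider.unwrap_or(settings.provider),
        cli_model.unwrap_or(settings.model),
    )
}
```

Under this model, passing `--provider vllm` without a model selects vLLM but still uses the `default_model` from settings.toml.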