
Model Providers

| Provider | API Type | Default Port | Flag |
| --- | --- | --- | --- |
| Ollama | Native /api/chat | 11434 | --provider ollama |
| llama.cpp | OpenAI /v1/chat/completions | 8080 | --provider llama-cpp |
| vLLM | OpenAI /v1/chat/completions | 8000 | --provider vllm |
| LM Studio | OpenAI /v1/chat/completions | 1234 | --provider lm-studio |
| Anthropic Claude | Messages /v1/messages | N/A (remote) | --provider anthropic |
| Custom | Any (configurable via driver) | Any | "type": "custom" in models.json |

The OpenAI-compatible provider has been verified against Groq, Together, and DeepSeek in addition to the servers listed above. Any service implementing /v1/chat/completions should work.
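The defaults in the provider table can be summarized as a small lookup. This is only an illustrative sketch: `Endpoint`, `default_endpoint`, and `default_chat_url` are hypothetical names, not part of AgentZero, and the `http://` scheme is assumed for local servers.

```rust
/// Default port and chat path for a built-in `--provider` value.
/// Hypothetical type for illustration; values come from the provider table.
#[derive(Debug, PartialEq)]
pub struct Endpoint {
    pub port: u16,
    pub path: &'static str,
}

/// Returns the documented default for a local provider flag, or `None`
/// when the table lists no default port (Anthropic, custom, unknown).
pub fn default_endpoint(provider: &str) -> Option<Endpoint> {
    let (port, path) = match provider {
        "ollama" => (11434, "/api/chat"),
        "llama-cpp" => (8080, "/v1/chat/completions"),
        "vllm" => (8000, "/v1/chat/completions"),
        "lm-studio" => (1234, "/v1/chat/completions"),
        _ => return None,
    };
    Some(Endpoint { port, path })
}

/// Builds the full chat URL for a local provider running on `host`.
/// Assumes plain HTTP, which is typical for local servers.
pub fn default_chat_url(provider: &str, host: &str) -> Option<String> {
    let ep = default_endpoint(provider)?;
    Some(format!("http://{}:{}{}", host, ep.port, ep.path))
}
```

Under these assumptions, `default_chat_url("llama-cpp", "gpu-box")` yields the llama.cpp chat completions URL on port 8080, while `default_endpoint("anthropic")` returns `None` because the remote API has no local port.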

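AgentZero tries providers in priority order:

  1. Local providers are always attempted first.
  2. Remote providers are used only if the data classification allows it.
  3. If the primary provider is unavailable, the router falls back automatically.
  4. Transient failures are retried with backoff.

The data classification determines whether a remote provider may be used:

| Classification | Local | Remote |
| --- | --- | --- |
| Public | Allowed | Allowed |
| Internal | Allowed | Policy check |
| Private | Allowed | Requires redaction |
| PII | Allowed | Redacted then allowed |
| Secret | Allowed | Denied |
| Credential | Allowed | Denied |
| Unknown | Allowed | Denied |

To point a provider at a non-default host, pass --url: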
az chat --provider llama-cpp --url http://gpu-box:8080
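The classification rules and the local-first ordering described above can be sketched as a pure function. All type and function names here (`Classification`, `RemoteDecision`, `Provider`, `candidates`) are hypothetical and chosen for illustration; the sketch also assumes that the policy-check and redaction cases leave a remote provider eligible once that step has been performed.

```rust
/// Data classifications from the table above (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Classification { Public, Internal, Private, Pii, Secret, Credential, Unknown }

/// What the table says about sending data to a remote provider.
#[derive(Debug, PartialEq)]
pub enum RemoteDecision { Allowed, PolicyCheck, RequiresRedaction, RedactThenAllow, Denied }

/// Maps a classification to the "Remote" column of the table.
pub fn remote_decision(c: Classification) -> RemoteDecision {
    match c {
        Classification::Public => RemoteDecision::Allowed,
        Classification::Internal => RemoteDecision::PolicyCheck,
        Classification::Private => RemoteDecision::RequiresRedaction,
        Classification::Pii => RemoteDecision::RedactThenAllow,
        Classification::Secret
        | Classification::Credential
        | Classification::Unknown => RemoteDecision::Denied,
    }
}

/// Minimal stand-in for a configured provider.
pub struct Provider {
    pub name: &'static str,
    pub is_local: bool,
}

/// Returns provider names in the order they would be tried:
/// local providers first, then remote ones unless the table denies them.
/// Assumption: policy checks and redaction happen elsewhere before the call.
pub fn candidates(providers: &[Provider], c: Classification) -> Vec<&'static str> {
    let mut order: Vec<&'static str> = providers
        .iter()
        .filter(|p| p.is_local)
        .map(|p| p.name)
        .collect();
    if remote_decision(c) != RemoteDecision::Denied {
        order.extend(providers.iter().filter(|p| !p.is_local).map(|p| p.name));
    }
    order
}
```

With one local and one remote provider configured, a Public request would consider both (local first), while a Secret request would consider only the local one.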

Providers are configured in .agentzero/models.json. The ProviderRouter loads this file at startup and instantiates providers dynamically.

{
  "providers": [
    {
      "name": "ollama-local",
      "provider_type": "ollama",
      "base_url": "http://localhost:11434",
      "default_model": "llama3.2",
      "is_local": true
    },
    {
      "name": "lm-studio",
      "provider_type": "openai-compatible",
      "base_url": "http://localhost:1234",
      "default_model": "gemma-4-12b",
      "is_local": true
    },
    {
      "name": "remote-vllm",
      "provider_type": "openai-compatible",
      "base_url": "https://gpu-box.internal:8000",
      "default_model": "meta-llama/Llama-3.2-70B",
      "is_local": false,
      "api_key": "vault://vllm/api-key"
    },
    {
      "name": "claude",
      "provider_type": "anthropic",
      "base_url": "https://api.anthropic.com",
      "default_model": "claude-sonnet-4-20250514",
      "is_local": false,
      "api_key": "vault://anthropic/api-key"
    }
  ],
  "default_provider": "ollama-local"
}

See the Model Configuration guide for the full schema and examples.
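As a rough mental model of the file above, the entries can be pictured as plain structs. The sketch below is not the real `ModelsConfig`; `ModelsFile`, `ProviderEntry`, `default_entry`, and `vault_path` are hypothetical names, and treating `vault://` values as references to a secret store is an assumption based only on how the example keys look.

```rust
/// One entry of the "providers" array (hypothetical mirror of the JSON fields).
#[derive(Debug, Clone, PartialEq)]
pub struct ProviderEntry {
    pub name: String,
    pub provider_type: String,
    pub base_url: String,
    pub default_model: String,
    pub is_local: bool,
    /// Absent for local providers in the example file.
    pub api_key: Option<String>,
}

/// Hypothetical in-memory form of models.json.
pub struct ModelsFile {
    pub providers: Vec<ProviderEntry>,
    pub default_provider: String,
}

impl ModelsFile {
    /// Looks up the entry named by "default_provider", or reports
    /// a configuration error if no such entry exists.
    pub fn default_entry(&self) -> Result<&ProviderEntry, String> {
        self.providers
            .iter()
            .find(|p| p.name == self.default_provider)
            .ok_or_else(|| {
                format!("default_provider '{}' is not defined", self.default_provider)
            })
    }
}

/// Returns the path portion of a `vault://` style key reference, if any.
/// Assumption: such values point into a secret store rather than holding the key.
pub fn vault_path(api_key: &str) -> Option<&str> {
    api_key.strip_prefix("vault://")
}
```

A check like `default_entry` is one way a loader could catch a `default_provider` that names a missing entry before any request is made.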

The custom provider type lets you add any compatible endpoint via models.json without code changes. Set the driver field to select which client implementation to use:

| Driver | Protocol | Use For |
| --- | --- | --- |
| openai-compatible (default) | /v1/chat/completions | Together AI, Groq, Fireworks, DeepSeek, any OpenAI-compatible API |
| ollama | /api/chat | Self-hosted Ollama instances on custom ports |
| anthropic | /v1/messages | Anthropic-compatible endpoints |

For example, the following entry adds Together AI with no code changes:

{
  "providers": [
    {
      "name": "together-ai",
      "type": "custom",
      "driver": "openai-compatible",
      "url": "https://api.together.xyz/v1",
      "default_model": "meta-llama/Llama-3-70b",
      "is_local": false,
      "api_key": "vault:together/key"
    }
  ]
}

If driver is omitted, it defaults to openai-compatible.
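The driver selection and its default can be sketched as an enum with a fallible parser. `Driver`, `from_field`, and `chat_path` are hypothetical names for illustration; only the three driver strings and their protocol paths come from the table above.

```rust
/// The three documented driver values (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Default)]
pub enum Driver {
    /// Used when the "driver" field is omitted.
    #[default]
    OpenAiCompatible,
    Ollama,
    Anthropic,
}

impl Driver {
    /// Parses the optional "driver" field of a custom provider entry.
    /// A missing field selects the default; an unrecognized value is an error.
    pub fn from_field(driver: Option<&str>) -> Result<Driver, String> {
        match driver {
            None => Ok(Driver::default()),
            Some("openai-compatible") => Ok(Driver::OpenAiCompatible),
            Some("ollama") => Ok(Driver::Ollama),
            Some("anthropic") => Ok(Driver::Anthropic),
            Some(other) => Err(format!("unknown driver: {other}")),
        }
    }

    /// Protocol path each driver talks to, per the driver table.
    pub fn chat_path(self) -> &'static str {
        match self {
            Driver::OpenAiCompatible => "/v1/chat/completions",
            Driver::Ollama => "/api/chat",
            Driver::Anthropic => "/v1/messages",
        }
    }
}
```

Rejecting unknown driver names, rather than silently falling back to the default, is a design choice of this sketch; the source does not say how AgentZero handles that case.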

All providers implement the ModelProvider trait:

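use async_trait::async_trait;

#[async_trait]
pub trait ModelProvider: Send + Sync {
    /// Sends the conversation with the available tool definitions
    /// and returns the complete response.
    async fn chat_with_tools(
        &self,
        messages: &[ChatMessage],
        tools: &[ToolDef],
    ) -> Result<ChatResponse>;

    /// Same request as above, but delivers the response as events over `tx`.
    async fn chat_streaming(
        &self,
        messages: &[ChatMessage],
        tools: &[ToolDef],
        tx: mpsc::Sender<StreamEvent>,
    ) -> Result<()>;

    /// Reports whether the provider is currently healthy.
    async fn health_check(&self) -> Result<bool>;

    /// Returns the name of the model this provider uses.
    fn model_name(&self) -> &str;
}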

OllamaProvider, OpenAICompatProvider, and AnthropicProvider all implement this trait. For standard OpenAI-compatible endpoints, use "type": "custom" in models.json; no code changes are needed. For providers with non-standard APIs, implement the ModelProvider trait in Rust.

ProviderRouter::from_config() reads models.json and instantiates the correct provider for each entry:

let config = ModelsConfig::load(".agentzero/models.json")?;
let router = ProviderRouter::from_config(&config)?;
// Router tries providers in priority order with fallback
let response = router.chat_with_tools(&messages, &tools).await?;
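The fallback and retry behavior mentioned in the comment above can be illustrated with a synchronous stand-in. This is a sketch under stated assumptions, not the router's actual code: the real `ProviderRouter` is async, and `CallError`, `call_with_fallback`, the retry count, and the doubling backoff schedule are all hypothetical.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical error split: transient failures are retried,
/// anything else moves on to the next provider.
#[derive(Debug, PartialEq)]
pub enum CallError {
    Transient(String),
    Unavailable(String),
}

/// Tries each provider in order. A transient failure is retried up to
/// `max_retries` times with a delay that doubles on each retry; any other
/// failure, or running out of retries, falls back to the next provider.
pub fn call_with_fallback<F>(
    providers: &[&str],
    max_retries: u32,
    base_delay: Duration,
    mut call: F,
) -> Result<String, CallError>
where
    F: FnMut(&str) -> Result<String, CallError>,
{
    let mut last_err = CallError::Unavailable("no providers configured".to_string());
    for &name in providers {
        let mut attempt = 0;
        loop {
            match call(name) {
                Ok(response) => return Ok(response),
                Err(CallError::Transient(msg)) if attempt < max_retries => {
                    // Backoff schedule: base, 2 * base, 4 * base, ...
                    sleep(base_delay * 2u32.pow(attempt));
                    attempt += 1;
                    last_err = CallError::Transient(msg);
                }
                Err(err) => {
                    // Non-transient error or retries exhausted: try the next provider.
                    last_err = err;
                    break;
                }
            }
        }
    }
    Err(last_err)
}
```

In this sketch the caller supplies the provider order (for example, local providers first) and a closure that performs the actual request, so the retry logic stays independent of any particular client.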

Configure the default provider and model in .agentzero/settings.toml:

[general]
default_provider = "ollama"
default_model = "llama3.2"

These values are used when the corresponding CLI flags are left at their defaults.
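The precedence between CLI flags and settings.toml can be pictured as a simple override. This is a minimal sketch: `resolve` is a hypothetical helper, and it models a flag left at its default as `None`, which may not match how AgentZero represents CLI arguments internally.

```rust
/// Picks the explicit CLI value when one was given, otherwise the
/// value from settings.toml. `None` stands for "flag left at its default".
pub fn resolve<'a>(cli_value: Option<&'a str>, setting: &'a str) -> &'a str {
    cli_value.unwrap_or(setting)
}
```

For instance, with `default_provider = "ollama"` in settings.toml, passing `--provider vllm` would select vllm, while omitting the flag would select ollama.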