Model Servers

A model server is a named host that serves models via an inference API. It’s orthogonal to the LLM tool that runs your coding agent:

A delegator pairs an LLM tool with a model (and, optionally, a model server).

The three-layer hierarchy

┌─ llm_tools ─────────┐   ┌─ model_servers ──────┐
│ claude  (detected)  │   │ anthropic-api (impl.)│
│ codex   (detected)  │   │ openai-api    (impl.)│
│ gemini  (detected)  │   │ google-api    (impl.)│
│                     │   │ ollama-local  (user) │
└─────────────────────┘   └──────────────────────┘
            ▲                        ▲
            │                        │
            └───── delegators ───────┘
   name, llm_tool, model, model_server (optional)

Implicit builtins

You don’t need to declare a model server for the vendor-default path. Every detected LLM tool has an implicit builtin:

llm_tool   implicit model_server
claude     anthropic-api
codex      openai-api
gemini     google-api

Delegators that omit model_server resolve to these builtins automatically. Existing configs keep working unchanged.
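
For example, a delegator like the one below has no model_server key; because its llm_tool is codex, it resolves to the implicit openai-api builtin. The name and model string here are illustrative, not shipped defaults:

[[delegators]]
name = "codex-default"   # illustrative name
llm_tool = "codex"
model = "gpt-5-codex"    # example model string; use whatever the vendor path serves
# no model_server key: resolves to the implicit "openai-api" builtin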

Kinds

kind            Use for
anthropic-api   Anthropic Console / a compatible proxy (bridge for local models)
openai-api      OpenAI / a compatible proxy
google-api      Google Gemini API
ollama          Local ollama server (ollama serve, default http://localhost:11434)
openai-compat   Any OpenAI-API-compatible server (vllm, lmstudio, together.ai, groq, …)
lmstudio        LM Studio’s local server
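
As a sketch, an openai-compat declaration for a local vllm instance could look like the following; the name, port, and path are assumptions to adapt to your setup:

[[model_servers]]
name = "vllm-local"                # illustrative name
kind = "openai-compat"
base_url = "http://localhost:8000" # adjust host, port, and path to your server
display_name = "vLLM (local)"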

Declaring a model server

Edit operator.toml (or create one via the REST API / the VS Code status tree):

[[model_servers]]
name = "ollama-local"
kind = "ollama"
base_url = "http://localhost:11434"
display_name = "Ollama (local)"

Then reference it from a delegator:

[[delegators]]
name = "codex-local-qwen"
llm_tool = "codex"
model = "qwen2.5-coder"
model_server = "ollama-local"

Ad-hoc CLI usage

# Named delegator (recommended for repeatable runs)
operator launch --delegator codex-local-qwen

# Ad-hoc overrides (for one-off experiments)
operator launch \
  --llm-tool codex \
  --model qwen2.5-coder \
  --model-server ollama-local

--delegator and the ad-hoc trio (--llm-tool, --model, --model-server) are mutually exclusive.

Protocol compatibility

llm_tool   ollama-compatible?   Notes
codex      Yes, directly        Codex speaks the OpenAI API; ollama exposes /v1 out of the box.
claude     Only via bridge      The Claude CLI speaks the Anthropic protocol. Run claude-code-router (or similar) and point base_url at that bridge with kind = "anthropic-api".
gemini     Only via bridge      Same story as claude; use litellm-proxy or similar.
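
For the bridged case, a sketch of the claude path might look like the following. The bridge port and names are examples, not documented defaults; point base_url at wherever your claude-code-router instance listens:

[[model_servers]]
name = "claude-bridge"              # illustrative name
kind = "anthropic-api"
base_url = "http://localhost:3456"  # wherever the bridge listens
display_name = "Claude via local bridge"

[[delegators]]
name = "claude-local-qwen"
llm_tool = "claude"
model = "qwen2.5-coder"             # the model the bridge routes to
model_server = "claude-bridge"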

REST API

GET    /api/v1/model-servers         # list (declared + implicit builtins)
GET    /api/v1/model-servers/{name}  # fetch by name
POST   /api/v1/model-servers         # create
DELETE /api/v1/model-servers/{name}  # delete (implicit builtins are protected)
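
As an illustration, creating a server over the REST API might look like the request below; the JSON field names are assumed to mirror the TOML keys, and $OPERATOR_URL stands in for wherever the operator API is listening:

# Hypothetical create request; adjust $OPERATOR_URL to your operator instance.
curl -X POST "$OPERATOR_URL/api/v1/model-servers" \
  -H "Content-Type: application/json" \
  -d '{"name": "ollama-local", "kind": "ollama", "base_url": "http://localhost:11434", "display_name": "Ollama (local)"}'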

What ships in this release

This release lays down the infrastructure: the [[model_servers]] section in operator.toml (plus the implicit builtins), the optional model_server field on delegators, the --model-server launch flag, and the /api/v1/model-servers REST endpoints.

What’s explicitly deferred: automatic wiring of a declared server’s base_url into the spawned agent’s environment.
Those ship in the next release. In the meantime: declare your model server, attach it to a delegator, and set the appropriate *_BASE_URL env var in your shell before invoking operator — the spawned agent inherits it.
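
For the codex/ollama pairing above, that workaround might look like this; the exact env var your LLM tool reads is an assumption here, so check its documentation:

# Point codex's OpenAI-compatible client at the local ollama server for this run (assumed env var).
OPENAI_BASE_URL="http://localhost:11434/v1" operator launch --delegator codex-local-qwen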