# Model Servers
A model server is a named host that serves models via an inference API. It’s orthogonal to the LLM tool that runs your coding agent:
- LLM tools (claude, codex, gemini) are the agentic CLIs that drive the coding session — they use tools, edit files, resume sessions.
- Model servers are where the model weights live — Anthropic’s API, OpenAI’s API, Google’s API, or a local/alt host like ollama, lmstudio, or vllm.
A delegator pairs an LLM tool with a model (and, optionally, a model server).
## The three-layer hierarchy
```
┌─ llm_tools ─────────┐      ┌─ model_servers ──────┐
│ claude  (detected)  │      │ anthropic-api (impl.)│
│ codex   (detected)  │      │ openai-api    (impl.)│
│ gemini  (detected)  │      │ google-api    (impl.)│
│                     │      │ ollama-local  (user) │
└─────────────────────┘      └──────────────────────┘
           ▲                            ▲
           │                            │
           └──────── delegators ────────┘
      name, llm_tool, model, model_server (optional)
```
## Implicit builtins
You don’t need to declare a model server for the vendor-default path. Every detected LLM tool has an implicit builtin:
| llm_tool | implicit model_server |
|---|---|
| `claude` | `anthropic-api` |
| `codex` | `openai-api` |
| `gemini` | `google-api` |
Delegators that omit model_server resolve to these builtins automatically. Existing configs keep working unchanged.
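As a concrete sketch (the delegator name and model string here are hypothetical), a delegator that names only an `llm_tool` picks up that tool’s builtin:

```bash
# Hypothetical sketch: a delegator with no model_server entry.
# It resolves to the implicit "anthropic-api" builtin for claude.
cat >> operator.toml <<'EOF'
[[delegators]]
name = "claude-default"    # hypothetical delegator name
llm_tool = "claude"
model = "claude-sonnet-4"  # hypothetical model string
# model_server omitted -> implicit builtin "anthropic-api"
EOF
```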
## Kinds
| kind | Use for |
|---|---|
| `anthropic-api` | Anthropic Console / a compatible proxy (bridge for local models) |
| `openai-api` | OpenAI / a compatible proxy |
| `google-api` | Google Gemini API |
| `ollama` | Local ollama server (`ollama serve`, default `http://localhost:11434`) |
| `openai-compat` | Any OpenAI-API-compatible server (vllm, lmstudio, together.ai, groq, …) |
| `lmstudio` | LM Studio’s local server |
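As a sketch of the `openai-compat` kind (the server name is hypothetical; the port and `/v1` path follow vllm’s usual defaults):

```bash
# Hypothetical sketch: register a local vllm instance as an
# OpenAI-API-compatible model server.
cat >> operator.toml <<'EOF'
[[model_servers]]
name = "vllm-local"                    # hypothetical name
kind = "openai-compat"
base_url = "http://localhost:8000/v1"  # vllm's conventional default
EOF
```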
## Declaring a model server
Edit `operator.toml` (or create a model server via the REST API / VS Code status tree):
```toml
[[model_servers]]
name = "ollama-local"
kind = "ollama"
base_url = "http://localhost:11434"
display_name = "Ollama (local)"
```
Then reference it from a delegator:
```toml
[[delegators]]
name = "codex-local-qwen"
llm_tool = "codex"
model = "qwen2.5-coder"
model_server = "ollama-local"
```
## Ad-hoc CLI usage
```bash
# Named delegator (recommended for repeatable runs)
operator launch --delegator codex-local-qwen

# Ad-hoc overrides (for one-off experiments)
operator launch \
  --llm-tool codex \
  --model qwen2.5-coder \
  --model-server ollama-local
```
`--delegator` and the ad-hoc trio (`--llm-tool`, `--model`, `--model-server`) are mutually exclusive.
## Protocol compatibility
| llm_tool | ollama-compatible? | Notes |
|---|---|---|
| `codex` | Yes, directly | Codex speaks the OpenAI API; ollama exposes `/v1` out of the box. |
| `claude` | Only via bridge | The Claude CLI speaks the Anthropic protocol. Run claude-code-router (or similar) at a port and point `base_url` at that bridge with `kind = "anthropic-api"`. |
| `gemini` | Only via bridge | Same story as claude; use litellm-proxy or similar. |
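A rough sketch of the bridge path for claude follows; the bridge port and server name are assumptions, and full walkthroughs are deferred to the next release (see below):

```bash
# Hypothetical sketch: point claude at a local Anthropic-protocol bridge.
# Assumes a bridge such as claude-code-router is already listening on
# port 3456 (the port here is an assumption, not a documented default).
cat >> operator.toml <<'EOF'
[[model_servers]]
name = "claude-bridge"  # hypothetical name
kind = "anthropic-api"  # the bridge speaks the Anthropic protocol
base_url = "http://localhost:3456"
EOF
```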
## REST API
```
GET    /api/v1/model-servers          # list (declared + implicit builtins)
GET    /api/v1/model-servers/{name}   # fetch by name
POST   /api/v1/model-servers          # create
DELETE /api/v1/model-servers/{name}   # delete (implicit builtins are protected)
```
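A sketch of creating the `ollama-local` server over the API (the daemon’s host/port is a placeholder, and the JSON field names are assumed to mirror the TOML schema):

```bash
# List all model servers, declared and implicit.
curl http://localhost:8080/api/v1/model-servers

# Create a model server; fields assumed to mirror operator.toml.
curl -X POST http://localhost:8080/api/v1/model-servers \
  -H 'Content-Type: application/json' \
  -d '{"name": "ollama-local", "kind": "ollama", "base_url": "http://localhost:11434"}'
```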
## What ships in this release
This release lays down the infrastructure:
- Data model and config schema
- REST CRUD endpoints
- TUI and VS Code status tree sections
- `operator launch --model-server <name>` flag (validated, resolved through the normal delegator path)
What’s explicitly deferred:
- Automatic ollama detection during `operator setup`
- Environment-variable injection on spawn (`OPENAI_BASE_URL=…` etc.)
- Full walkthroughs for wiring up claude/gemini via a bridge
- Bundled bridge binaries
Those ship in the next release. In the meantime: declare your model server, attach it to a delegator, and set the appropriate `*_BASE_URL` env var in your shell before invoking `operator`; the spawned agent inherits it.
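Concretely, the interim workaround for the `codex-local-qwen` delegator above might look like this (`OPENAI_BASE_URL` is the variable named in the deferred-feature list; whether codex reads it directly is an assumption):

```bash
# Interim workaround until env-var injection ships: export the base
# URL yourself, then launch as usual; the spawned agent inherits it.
export OPENAI_BASE_URL=http://localhost:11434/v1  # ollama's OpenAI-compat endpoint
operator launch --delegator codex-local-qwen
```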