Ollama
Ollama runs open models (Llama, Qwen, Mistral, …) locally and serves them over an OpenAI-compatible API. Declare it as a model server to drive agents against models on your own machine — no cloud key required.
Prerequisites
- Ollama installed and running: ollama.com/download, then ollama serve (default http://localhost:11434)
- At least one model pulled, e.g. ollama pull qwen2.5-coder
- An OpenAI-protocol LLM tool: codex works directly; claude/gemini need a bridge (see Protocol compatibility)
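One way to confirm the first two prerequisites together is to ask Ollama which models it has pulled. The sketch below uses only the Python standard library and Ollama's /api/tags endpoint at the default address; the helper names model_names and pulled_models are illustrative and are not part of Ollama or Operator.

```python
import json
import urllib.request


def model_names(payload: dict) -> list[str]:
    """Extract the model names from an /api/tags response body."""
    return [m["name"] for m in payload.get("models", [])]


def pulled_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Return the pulled models; raises URLError if Ollama is not reachable."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return model_names(json.load(resp))
```

Calling pulled_models() with Ollama running should return a non-empty list once at least one model has been pulled; an empty list suggests a pull is still needed.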
Configuration
Declare the server in operator.toml (no API key needed for a local server):
[[model_servers]]
name = "ollama-local"
kind = "ollama"
base_url = "http://localhost:11434"
display_name = "Ollama (local)"
Then reference it from a delegator:
[[delegators]]
name = "codex-local-qwen"
llm_tool = "codex"
model = "qwen2.5-coder"
model_server = "ollama-local"
Listing models
Ollama enumerates its pulled models at /api/tags. Operator probes it for the
live list (which doubles as a reachability check):
# REST
GET /api/v1/model-servers/ollama-local/models # { reachable, models[], error? }
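The response shape noted above lends itself to a small status check. This Python sketch is a minimal illustration, assuming the body is JSON with the reachable, models, and optional error fields shown; the operator_url parameter stands for wherever your Operator API listens, which this page does not specify, and both function names are hypothetical.

```python
import json
import urllib.request


def summarize(probe: dict) -> str:
    """Turn a { reachable, models[], error? } body into a one-line status."""
    if not probe.get("reachable"):
        return f"unreachable: {probe.get('error', 'no error reported')}"
    return f"reachable, {len(probe.get('models', []))} model(s)"


def probe_server(operator_url: str, server: str = "ollama-local") -> str:
    """Fetch the model list for one server and summarize it."""
    # operator_url is an assumption: substitute your Operator API address.
    url = f"{operator_url}/api/v1/model-servers/{server}/models"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return summarize(json.load(resp))
```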
In the VS Code status tree, expand the ollama-local server to browse the models
you’ve pulled.
How env injection works
Ollama speaks the OpenAI protocol, so when a delegator resolves to it Operator
exports OPENAI_BASE_URL=http://localhost:11434. A local server needs no key; if
you’ve put the server behind a proxy that requires one, set api_key_env and the key is injected by
reference. See the Model Providers overview for
the full mechanism.
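To make "injected by reference" concrete, here is a hedged Python sketch of the idea, not Operator's actual implementation. It assumes api_key_env holds the name of an environment variable rather than the secret itself, and that the key is handed to the tool as OPENAI_API_KEY; both points are assumptions, as is the function name delegator_env.

```python
def delegator_env(server: dict, environ: dict) -> dict:
    """Build the extra environment for a tool launched against `server`."""
    env = {"OPENAI_BASE_URL": server["base_url"]}
    key_var = server.get("api_key_env")
    if key_var:
        # "By reference": the config names a variable, and the secret is
        # read from the caller's environment only at launch time.
        # The OPENAI_API_KEY target name is an assumption.
        env["OPENAI_API_KEY"] = environ[key_var]
    return env
```

With this design the configuration file never contains the key, so it can be committed or shared without leaking credentials.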