Ollama MCP server

Communityhyzhak (community)Config last verified Jun 1, 2026

Maintained Ollama MCP server: pull, run, and chat with local LLMs, manage models, and call an OpenAI-compatible chat API on a private Ollama instance.

Ollama runs open language models — Llama, Mistral, Gemma, Qwen, DeepSeek, and many more — entirely on your own machine, with no data leaving the host. This MCP server (the actively maintained ollama-mcp-server package on npm, a rebooted fork that tracks the modern Ollama API) gives an agent full control of that local model runtime through tool calls, so a coding assistant can pull a model, run a prompt, hold a multi-turn chat, or manage the model library without you touching the Ollama CLI.

It exposes nine tools that map directly onto Ollama's API: pull and push to move models to and from registries, run to execute a one-shot prompt against a model (with optional image input for vision models), chat_completion for an OpenAI-compatible multi-turn conversation supporting system/user/assistant roles and an optional think parameter for reasoning models, create to build a custom model from a Modelfile, copy to duplicate a model, remove to delete one, list to enumerate installed models, and show to view a model's details. The server runs locally over stdio with npx and talks to Ollama at http://127.0.0.1:11434 by default, configurable via the OLLAMA_HOST environment variable so you can point it at a remote Ollama box on your network. Streaming is not supported in stdio mode.

Quick install

Copy-paste configs are provided for all 8 supported clients. Pick your client below.

Add to ~/.claude.json

~/.claude.json
json
{
  "mcpServers": {
    "ollama": {
      "command": "npx",
      "args": [
        "-y",
        "ollama-mcp-server"
      ],
      "env": {
        "OLLAMA_HOST": "<OLLAMA_HOST>"
      }
    }
  }
}
Or via CLI
bash
claude mcp add ollama -- npx -y ollama-mcp-server

Available tools

ToolDescription
pullDownload a model from an Ollama-compatible registry to the local Ollama instance.
pushUpload a local model to a registry.
runRun a model with a prompt and return its output; supports optional image input for vision-capable models.
chat_completionOpenAI-compatible chat API supporting system, user, and assistant roles, multimodal messages, and an optional think parameter for step-by-step reasoning.
createBuild a custom model from a Modelfile definition.
copyDuplicate an existing model under a new name.
removeDelete a model from the local Ollama instance.
listList all models currently available on the Ollama instance.
showShow detailed information about a specific model, including its parameters, template, and license.

Required configuration

  • OLLAMA_HOSTOptional

    Base URL of the Ollama API. Defaults to http://127.0.0.1:11434.

What you can do with it

Drive local models privately from your agent

Ask the agent to pull a model and answer a prompt with it — pull fetches the weights, then run or chat_completion executes inference entirely on your hardware. Nothing is sent to a third-party API, which is ideal for sensitive code or offline work.

Manage your local model library

Use list and show to inspect what's installed and how each model is configured, create to bake a system prompt and parameters into a reusable custom model via a Modelfile, and copy or remove to keep the library tidy — all without leaving the conversation.

FAQ

Is it free?
Yes. The MCP server is open source (MIT) and free, and Ollama itself is free and runs models locally, so there are no per-token API charges — you only need the hardware to run the models.
Does it support remote/OAuth?
There is no OAuth. The server runs locally over stdio with npx ollama-mcp-server. It connects to Ollama at http://127.0.0.1:11434 by default, but you can set OLLAMA_HOST to reach an Ollama instance on a remote machine over your own network.
← Browse all ai-ml servers