Replicate MCP server
Replicate's official MCP server: discover, compare, and run thousands of hosted AI models — image, video, audio, and language — straight from your agent.
The Replicate MCP server is Replicate's official integration that puts its entire HTTP API behind an agent's tool calls. Replicate hosts thousands of community and official models — image generators like Flux and SDXL, video models, speech and music models, upscalers, and language models — each callable as a prediction. This server lets a coding agent search the catalog, read a model's schema, run a prediction, poll for the result, and manage trainings, deployments, files, and collections, all without leaving the editor or chat.
The recommended deployment is the hosted remote endpoint at https://mcp.replicate.com/sse, an SSE transport that runs an OAuth-style web flow: the first time you connect, you paste a Replicate API token into a browser page and the server stores it in Cloudflare KV, acting as a trusted intermediary so the token is never exposed to the model. You can also run it locally over stdio with npx replicate-mcp@latest and a REPLICATE_API_TOKEN environment variable. To keep the context window small, the server exposes a static set of common tools plus three dynamic meta-tools (list_api_endpoints, get_api_endpoint_schema, invoke_api_endpoint) that let the agent reach any endpoint in the API on demand.
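The three meta-tools suggest a discover, inspect, invoke loop. The sketch below illustrates that loop under stated assumptions: `call_tool` is a hypothetical stand-in for whatever "call a tool by name" function your MCP client exposes, and the argument names (`search_query`, `endpoint`, `endpoint_name`, `args`) are guesses rather than the server's confirmed schema. An in-memory fake is included so the flow can be run without a live session.

```python
from typing import Any, Callable

# Hypothetical signature for an MCP client's "call a tool by name" function.
ToolCaller = Callable[[str, dict[str, Any]], Any]


def run_via_meta_tools(call_tool: ToolCaller, query: str, args: dict[str, Any]) -> Any:
    """Discover an endpoint, fetch its schema, then invoke it."""
    # 1. Discover: the "search_query" argument name is an assumption.
    endpoints = call_tool("list_api_endpoints", {"search_query": query})
    if not endpoints:
        raise LookupError(f"no endpoint matches {query!r}")
    name = endpoints[0]["name"]

    # 2. Inspect: check required fields before spending an API call.
    schema = call_tool("get_api_endpoint_schema", {"endpoint": name})
    missing = [key for key in schema.get("required", []) if key not in args]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")

    # 3. Invoke the endpoint with the validated arguments.
    return call_tool("invoke_api_endpoint", {"endpoint_name": name, "args": args})


def fake_call_tool(name: str, arguments: dict[str, Any]) -> Any:
    """In-memory stand-in for a real MCP session, for illustration only."""
    if name == "list_api_endpoints":
        return [{"name": "list_hardware"}]
    if name == "get_api_endpoint_schema":
        return {"type": "object", "required": []}
    if name == "invoke_api_endpoint":
        return {"called": arguments["endpoint_name"], "with": arguments["args"]}
    raise KeyError(name)
```

Calling `run_via_meta_tools(fake_call_tool, "hardware", {})` walks all three steps against the fake. With a real client, check the actual argument names in each meta-tool's input schema first.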
Quick install
Copy-paste configs are provided for all 8 supported clients; the Claude Code config is shown here.
Add to ~/.claude.json
{
  "mcpServers": {
    "replicate": {
      "type": "http",
      "url": "https://mcp.replicate.com/sse"
    }
  }
}
Or add it with the Claude Code CLI:
claude mcp add --transport http replicate https://mcp.replicate.com/sse
Heads up
- First tool call opens a browser to authorize.
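If you already have other servers configured, pasting the snippet over the file would clobber them. A minimal sketch of merging the entry instead, assuming the config file is plain JSON with a top-level `mcpServers` object; the helper names `add_replicate_server` and `update_config_file` are hypothetical:

```python
import json
from pathlib import Path

# The entry from the config above, unchanged.
REPLICATE_ENTRY = {"type": "http", "url": "https://mcp.replicate.com/sse"}


def add_replicate_server(config: dict) -> dict:
    """Return a copy of config with the replicate entry added under mcpServers."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["replicate"] = dict(REPLICATE_ENTRY)
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Read the JSON config at path (if present), add the entry, write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_replicate_server(config), indent=2) + "\n")
```

`update_config_file(Path.home() / ".claude.json")` would apply it to the file named above. Back the file up first, since clients may store other settings there.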
Available tools
| Tool | Description |
|---|---|
| get_account | Return information about the user or organization associated with the provided API token. |
| list_collections | List the collections of models featured on Replicate, as a paginated list of collection objects. |
| get_collections | Get a single collection of models by slug, including the nested list of models in that collection. |
| list_hardware | List the available hardware SKUs (CPU and GPU types) for running models and trainings. |
| search_models | Get a list of public models matching a search query, ranked by relevance. |
| list_models | Get a paginated list of public models on Replicate. |
| get_models | Get the metadata for a public model by owner and name. |
| create_models | Create a new model on Replicate under your account or organization. |
| delete_models | Delete a model you own. The model must have no versions and no predictions. |
| get_models_readme | Get the README content (Markdown) for a model. |
| list_models_examples | List example predictions made using the model, useful for understanding typical inputs and outputs. |
| create_models_predictions | Create a prediction using an official model, passing the inputs you provide. |
| list_models_versions | List the versions of a model, sorted with the most recent version first. |
| get_models_versions | Get the metadata and input/output schema for a specific model version. |
| delete_models_versions | Delete a model version and all associated predictions, including all output files. |
| create_predictions | Create a prediction for the model version and inputs you provide. |
| get_predictions | Get the current state of a prediction, including outputs once it has completed. |
| list_predictions | Get a paginated list of all predictions created by the account associated with the API token. |
| cancel_predictions | Cancel a prediction that is currently running. |
| create_trainings | Start a new training of the model version you specify, to fine-tune a model. |
| get_trainings | Get the current state of a training. |
| list_trainings | Get a paginated list of all trainings created by the account associated with the API token. |
| cancel_trainings | Cancel a training that is currently running. |
| create_deployments | Create a new deployment with a chosen model version, hardware, and min/max instances for scaling. |
| get_deployments | Get information about a deployment by name, including its current release. |
| list_deployments | List the deployments associated with the current account, including the latest release configuration for each. |
| update_deployments | Update properties of an existing deployment, including hardware, min/max instances, and the underlying model version. |
| delete_deployments | Delete a deployment. |
| create_deployments_predictions | Create a prediction for the deployment and inputs you provide. |
| create_files | Upload a file to Replicate so it can be used as input to a prediction. |
| get_files | Get the details of a file you have uploaded. |
| list_files | Get a paginated list of all files created by the account associated with the API token. |
| download_files | Download a file by providing the file owner, access expiry, and a valid signature. |
| delete_files | Delete a file. Subsequent requests to the file resource return 404 Not Found. |
| get_default_webhooks_secret | Get the signing secret for the default webhook endpoint, used to verify webhook requests come from Replicate. |
| list_api_endpoints | Dynamic meta-tool: list or search every endpoint in the Replicate API by name, resource, operation, or tag. |
| get_api_endpoint_schema | Dynamic meta-tool: get the input schema for a named API endpoint so the agent can construct valid arguments. |
| invoke_api_endpoint | Dynamic meta-tool: invoke any Replicate API endpoint by name with the arguments matching its schema. |
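The secret returned by get_default_webhooks_secret is only useful if your webhook receiver checks signatures with it. The sketch below shows one way to do that, assuming the scheme described in Replicate's webhook documentation as I understand it: a `whsec_`-prefixed base64 key, an HMAC-SHA256 over `id.timestamp.body`, and a `webhook-signature` header holding space-separated `v1,<signature>` entries. Treat those details as assumptions and confirm them against the official docs. A timestamp freshness check is omitted for brevity.

```python
import base64
import hashlib
import hmac


def sign(secret: str, webhook_id: str, timestamp: str, body: str) -> str:
    """Compute the base64 HMAC-SHA256 signature for one webhook delivery."""
    # Assumption: the secret looks like "whsec_<base64-encoded key>".
    key = base64.b64decode(secret.removeprefix("whsec_"))
    signed_content = f"{webhook_id}.{timestamp}.{body}".encode()
    digest = hmac.new(key, signed_content, hashlib.sha256).digest()
    return base64.b64encode(digest).decode()


def is_valid(secret: str, webhook_id: str, timestamp: str, body: str,
             signature_header: str) -> bool:
    """Return True if any v1 signature in the header matches the expected one."""
    expected = sign(secret, webhook_id, timestamp, body).encode()
    for entry in signature_header.split():
        version, _, candidate = entry.partition(",")
        # compare_digest avoids leaking match position through timing.
        if version == "v1" and hmac.compare_digest(candidate.encode(), expected):
            return True
    return False
```

Pass the raw request body exactly as received; re-serializing parsed JSON can change the bytes and break the signature.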
Required configuration
- REPLICATE_API_TOKEN (required)
Replicate API token from replicate.com/account/api-tokens. Required for the local stdio server; the remote server obtains it through the browser auth flow.
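For the local stdio server, the token has to reach the `npx` process through its environment. A minimal sketch that builds such a config entry, assuming the `command` / `args` / `env` layout that many MCP clients use for stdio servers (check your client's docs for the exact shape); `stdio_config` is a hypothetical helper name:

```python
from typing import Mapping


def stdio_config(env: Mapping[str, str]) -> dict:
    """Build an mcpServers entry that launches the server locally over stdio."""
    token = env.get("REPLICATE_API_TOKEN", "")
    if not token:
        raise ValueError("REPLICATE_API_TOKEN is not set")
    return {
        "mcpServers": {
            "replicate": {
                "command": "npx",
                # -y lets npx install the package without prompting.
                "args": ["-y", "replicate-mcp@latest"],
                "env": {"REPLICATE_API_TOKEN": token},
            }
        }
    }
```

`print(json.dumps(stdio_config(os.environ), indent=2))` (with `json` and `os` imported) emits the entry. The token ends up in plain text in the config file, so keep that file out of version control.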
What you can do with it
Run image, video, and audio models from your agent
Ask the agent to generate an image with Flux or transcribe audio: it searches the catalog with search_models, reads the version schema, calls create_predictions, and polls get_predictions until the output URL is ready — no manual API wiring.
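The polling step in that flow can be sketched as a small loop. This assumes the prediction object carries a `status` field that ends in `succeeded`, `failed`, or `canceled`, as in Replicate's HTTP API; `get_prediction` is a hypothetical callable wrapping the get_predictions tool, and the interval and timeout defaults are arbitrary.

```python
import time
from typing import Any, Callable

# Statuses after which a prediction will not change again.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def wait_for_prediction(
    get_prediction: Callable[[str], dict[str, Any]],
    prediction_id: str,
    interval: float = 1.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict[str, Any]:
    """Poll until the prediction reaches a terminal status or the timeout passes."""
    deadline = clock() + timeout
    while True:
        prediction = get_prediction(prediction_id)
        if prediction["status"] in TERMINAL_STATUSES:
            return prediction
        if clock() >= deadline:
            raise TimeoutError(f"prediction {prediction_id} still {prediction['status']}")
        sleep(interval)
```

`sleep` and `clock` are injectable so the loop can be tested without waiting. The caller still has to check whether the returned status is `succeeded` before reading `output`.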
Manage fine-tunes and production deployments
Kick off a fine-tune with create_trainings, then stand up a scalable endpoint with create_deployments and run inference against it. The agent can also list and update deployments to tune hardware and instance counts.
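A deployment ties together a model version, a hardware SKU, and scaling bounds. The sketch below validates those values before they are sent, assuming create_deployments accepts the same field names as Replicate's HTTP API (`name`, `model`, `version`, `hardware`, `min_instances`, `max_instances`); `DeploymentSpec` is a hypothetical helper, and the hardware value should come from list_hardware rather than being hard-coded.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DeploymentSpec:
    name: str
    model: str          # "owner/name"
    version: str        # model version ID
    hardware: str       # SKU, as returned by list_hardware
    min_instances: int = 0
    max_instances: int = 1

    def __post_init__(self) -> None:
        if "/" not in self.model:
            raise ValueError("model must look like 'owner/name'")
        if not 0 <= self.min_instances <= self.max_instances:
            raise ValueError("need 0 <= min_instances <= max_instances")

    def to_arguments(self) -> dict:
        """Return the fields as a plain dict suitable for a tool call."""
        return asdict(self)
```

For example, `DeploymentSpec("my-flux", "acme/my-flux", "abc123", "gpu-t4", 0, 2).to_arguments()` yields a dict ready to pass as tool arguments; every value in that call is a placeholder.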
FAQ
- Is it free?
- The MCP server is free and open source, but it calls Replicate's API, which is paid and billed per second of compute (or per run for some official models) against your REPLICATE_API_TOKEN. You only pay for the predictions and trainings you run.
- Does it support remote/OAuth?
- Yes. The recommended deployment is the remote SSE endpoint at https://mcp.replicate.com/sse, which uses a browser-based authentication flow: you paste your Replicate API token into a web page and the server stores it in Cloudflare KV, never exposing it to the model. You can also run it locally over stdio with npx replicate-mcp@latest and a REPLICATE_API_TOKEN environment variable.