Best MCP servers for creative media
Generating and editing media (images, video, audio, and voice) used to mean stitching together separate APIs and SDKs. With MCP, an agent can run a model, iterate on the result, and chain steps (generate an image, upscale it, then narrate a voiceover) all in one conversation. The right server depends on what you make: a multi-model aggregator that covers many modalities, a fast image-and-video generation platform, a dedicated diffusion image toolkit, a vector-and-raster design tool, or a voice and speech engine. Because creative work is iterative, servers that support edit, upscale, and restyle operations matter as much as pure generation. The servers below cover the common creative modalities; each is a real MCP server with a verified, current install config.
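The generate, upscale, narrate chain described above can be pictured as a short pipeline in which each step feeds the next. The following is only a minimal sketch: the tool names (`generate_image`, `upscale_image`, `narrate`), their arguments, and the `Pipeline` helper are hypothetical stand-ins, not the tool names or schemas of any server listed here. A real agent would route each call to a connected MCP server rather than to local functions.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-in for an MCP tool call: takes arguments, returns a result.
ToolFn = Callable[[dict], dict]


@dataclass
class Pipeline:
    """Dispatches named tool calls and records the order they ran in."""
    tools: dict[str, ToolFn]
    log: list[str] = field(default_factory=list)

    def call(self, name: str, args: dict) -> dict:
        self.log.append(name)
        return self.tools[name](args)


# Fake tools that only build descriptive strings, so the sketch runs offline.
def fake_generate(args: dict) -> dict:
    return {"image": f"image({args['prompt']})", "width": 512}


def fake_upscale(args: dict) -> dict:
    return {
        "image": f"upscaled({args['image']})",
        "width": args["width"] * args["factor"],
    }


def fake_narrate(args: dict) -> dict:
    return {"audio": f"speech({args['text']})"}


def make_demo_pipeline() -> Pipeline:
    return Pipeline(tools={
        "generate_image": fake_generate,
        "upscale_image": fake_upscale,
        "narrate": fake_narrate,
    })


def run_chain(pipeline: Pipeline, prompt: str, script: str) -> dict:
    """Generate an image, upscale that result, then narrate a voiceover."""
    draft = pipeline.call("generate_image", {"prompt": prompt})
    final = pipeline.call(
        "upscale_image",
        {"image": draft["image"], "width": draft["width"], "factor": 2},
    )
    voice = pipeline.call("narrate", {"text": script})
    return {"image": final["image"], "width": final["width"], "audio": voice["audio"]}


if __name__ == "__main__":
    print(run_chain(make_demo_pipeline(), "a red fox", "Meet the fox."))
```

The point of the sketch is the data flow: the upscale step consumes the output of the generate step, which is why servers that expose edit and upscale tools alongside generation fit iterative work well.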
Replicate
Replicate
Replicate's official MCP server: discover, compare, and run thousands of hosted AI models — image, video, audio, and language — straight from your agent.
Replicate's official server discovers, compares, and runs thousands of hosted models across image, video, audio, and language, making it the broadest single entry point for creative generation.
fal.ai
Raveen Beemsingh
Community MCP server for fal.ai: generate and edit images, video, music, and audio with 600+ fast generative models from your agent.
A community fal.ai server, built for speed and breadth across modalities, generates and edits images, video, music, and audio with 600+ fast generative models.
Stability AI
Tadas Antanavicius
Community MCP server for Stability AI: generate, edit, upscale, outpaint, and restyle images with Stable Diffusion from your agent.
A community Stability AI server generates, edits, upscales, outpaints, and restyles images with Stable Diffusion, making it the dedicated diffusion toolkit for image work.
Recraft
Recraft
Recraft's official MCP server: generate and edit raster and vector images, build reusable styles, vectorize, upscale, and swap backgrounds from your agent.
Recraft's official server generates and edits raster and vector images, builds reusable styles, vectorizes, and upscales, making it the design-focused pick for brand-consistent visuals.
ElevenLabs
ElevenLabs
ElevenLabs' official MCP server: text-to-speech, voice cloning, speech-to-text, sound effects, and conversational AI agents from your editor.
ElevenLabs' official server handles text-to-speech, voice cloning, speech-to-text, sound effects, and conversational AI, making it the strongest option for audio and voice work.