What is MCP sampling?
Sampling is a Model Context Protocol feature that lets a server request a completion from the client's language model, so the server can use the model's reasoning without holding its own API key.
Sampling inverts the usual direction of the Model Context Protocol: instead of the model calling the server, the server asks the client to run an LLM completion on its behalf. A server that needs a model, say to summarize a document it just fetched, classify some data, or decide a next step, sends a sampling/createMessage request to the client, which controls the actual model. The client advertises sampling support during capability negotiation, and because it owns the model and the relationship with the user, it stays in charge: the spec puts a human in the loop, letting the user review or modify the prompt before it runs and inspect the result before it returns. This design means servers can embed model-powered behavior without bundling their own provider credentials, model choice, or billing; the host decides which model fulfills the request. Sampling is one of the more advanced capabilities and is unevenly implemented across clients today, so server authors typically treat it as optional and degrade gracefully when a client does not support it.