What is fine-tuning?

Fine-tuning further trains a pretrained LLM on a curated dataset to specialize its behavior, style, or domain knowledge, baking patterns into the weights instead of supplying them in every prompt.

Fine-tuning takes a model that was pretrained on broad data and continues training it on a smaller, task-specific dataset so its weights shift toward your domain, format, or tone. Unlike prompting, which steers the model at inference time with instructions and examples, fine-tuning changes the model itself, so the learned behavior applies without spending prompt tokens on it. Parameter-efficient methods like LoRA train only a small set of adapter weights rather than the whole network, making fine-tuning far cheaper and producing portable adapters. Fine-tuning shines when you have many high-quality examples, need a consistent output style, want to compress a long few-shot prompt into the weights to save tokens at scale, or must capture a niche format the base model handles poorly. It is a poor fit for injecting fresh or frequently changing facts. Retrieval-augmented generation and tool use (including MCP) are the right tools there, because they keep facts in data the model reads at inference time rather than in weights you must retrain whenever the facts change. Fine-tuning carries costs: dataset curation and labeling, a training run, evaluation to catch regressions and catastrophic forgetting, and ongoing maintenance as base models improve. A common, pragmatic ladder is to exhaust prompting, few-shot examples, and RAG first, and reach for fine-tuning only when those plateau.
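To make the LoRA point concrete, here is a minimal sketch in plain Python of how a low-rank adapter sits beside a frozen weight matrix and how few parameters it trains. The helper names (`lora_param_counts`, `lora_forward`, `matvec`), the 4096 x 4096 layer size, and the rank and alpha values are illustrative assumptions, not taken from any particular model or library; real fine-tuning would use a training framework rather than hand-written list arithmetic.

```python
import random

def lora_param_counts(d_in, d_out, rank):
    """Trainable parameters for one weight matrix: full fine-tuning vs. LoRA."""
    full = d_in * d_out              # every entry of W is updated
    lora = rank * (d_in + d_out)     # only A (rank x d_in) and B (d_out x rank)
    return full, lora

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha):
    """Compute y = W x + (alpha / rank) * B (A x), with W frozen."""
    rank = len(A)
    base = matvec(W, x)                  # output of the pretrained layer
    update = matvec(B, matvec(A, x))     # low-rank adapter path
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]

# Tiny illustrative layer (sizes are assumptions chosen for readability).
random.seed(0)
d_in, d_out, rank = 8, 6, 2
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
B = [[0.0] * rank for _ in range(d_out)]   # zero init: adapter starts as a no-op
x = [random.gauss(0, 1) for _ in range(d_in)]

# Before any training, the adapted layer reproduces the base layer exactly.
assert lora_forward(W, A, B, x, alpha=16) == matvec(W, x)

# Parameter counts for a hypothetical 4096 x 4096 projection at rank 8.
full, lora = lora_param_counts(4096, 4096, 8)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")
```

Under these assumed sizes the adapter trains 65,536 parameters against 16,777,216 for the full matrix, roughly 0.4%. Because only `A` and `B` change, they can be saved and shared separately from the base weights, which is what makes adapters portable.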