Term

Fine-tune cost

The full price of fine-tuning an LLM: GPU-hours during training, hosting or storage of the resulting weights or adapter, and any per-token premium at inference time. As of 2026, fine-tuning is rarely needed; frontier models with good system prompts cover most use cases.

Background

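For OpenAI gpt-4o-mini, training is billed per token processed (dataset tokens times epochs). At the list price of roughly $3 per million training tokens, a 100k-token dataset costs on the order of $1 for a few epochs, and inference on the fine-tuned model is billed at about twice the base model's per-token rate. For open-weights models fine-tuned with LoRA, the bottleneck is GPU rental, not provider fees: a LoRA on a 7B Llama model fits on a single A100 (~$1-2/h) and trains in 2-6 hours on most tasks, so roughly $2-12 per run. Most teams find that prompt engineering on an off-the-shelf model gets them 90 % of the way and skip fine-tuning entirely.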
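The arithmetic behind these estimates can be sketched in a few lines of Python. This is a rough back-of-the-envelope helper, not a pricing tool: the function names are hypothetical, every rate is passed in as an argument, and the only figures taken from the text are the A100 rental range (~$1-2/h) and the 2-6 hour training time. Provider prices change, so check the current pricing page before relying on any number.

```python
def api_training_cost(dataset_tokens: int, epochs: int,
                      usd_per_million_tokens: float) -> float:
    """Provider-hosted fine-tuning, assumed to be billed per trained token
    (dataset tokens multiplied by the number of epochs)."""
    return dataset_tokens * epochs * usd_per_million_tokens / 1_000_000


def gpu_training_cost(gpu_hours: float, usd_per_gpu_hour: float) -> float:
    """Self-managed fine-tuning (e.g. LoRA) on a rented GPU."""
    return gpu_hours * usd_per_gpu_hour


def inference_premium(tokens_served: int, base_usd_per_million: float,
                      surcharge: float) -> float:
    """Extra inference spend versus the base model.
    surcharge is a fraction of the base rate: 0.5 means 50 % more."""
    return tokens_served * base_usd_per_million * surcharge / 1_000_000


def total_cost(training_usd: float, storage_usd: float,
               premium_usd: float) -> float:
    """Sum of the three components named in the definition."""
    return training_usd + storage_usd + premium_usd


# LoRA on a single A100, using the ranges quoted in the text:
# 2-6 hours at ~$1-2 per hour.
lora_low = gpu_training_cost(gpu_hours=2, usd_per_gpu_hour=1.0)
lora_high = gpu_training_cost(gpu_hours=6, usd_per_gpu_hour=2.0)
print(f"LoRA training: ${lora_low:.0f}-${lora_high:.0f}")  # prints "LoRA training: $2-$12"
```

Keeping the rates as parameters is deliberate: the training component is usually small and one-off, while the inference premium scales with traffic, so the same helper can show which term dominates for a given workload.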