Term

LoRA

Low-Rank Adaptation — a parameter-efficient fine-tuning method that trains small rank-decomposed matrices instead of full model weights, cutting GPU memory and storage by 10–100×. Standard in 2026 for fine-tuning open-weight models.

Background

LoRA freezes the base model and inserts trainable rank-decomposition matrices into specific layers (usually attention projections). Training updates only ~0.1–1 % of the parameters, fitting in consumer GPU memory. The result is a small adapter file (often <100 MB) that can be merged into the base or hot-swapped at inference. QLoRA combines LoRA with 4-bit quantisation for further memory savings. Adapter hubs (Hugging Face) host thousands of community-trained LoRAs.