vibedonalds.com
Term

Rubric

A structured grading scheme, usually a list of dimensions with explicit criteria for each, used by human graders or an LLM-as-judge to score model outputs. It is the contract that makes an eval reproducible.

Background

A rubric specifies what "good" means in measurable terms. A typical rubric has 3-7 dimensions, each scored 1-5 or pass/fail, with anchored examples that show what a given score looks like. For a customer-support reply, the dimensions might be accuracy, tone, action, and length. Without a rubric, two graders often disagree because each applies their own private standard; with one, agreement rates rise to roughly 80-90% on most tasks.
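To make the idea concrete, here is a minimal sketch of the customer-support rubric as data, together with a simple exact-match agreement check between two graders. The names `Dimension`, `SUPPORT_RUBRIC`, `validate`, and `agreement_rate` are hypothetical, the 1-5 scale is one of the two options mentioned above, and the criteria and anchor texts are invented for illustration rather than taken from any real rubric.

```python
from dataclasses import dataclass, field

SCALE = range(1, 6)  # assumed 1-5 scale; pass/fail would need a different check


@dataclass(frozen=True)
class Dimension:
    name: str
    criterion: str                               # what the grader checks
    anchors: dict = field(default_factory=dict)  # score -> anchored example


# Hypothetical rubric for a customer-support reply (illustrative wording only).
SUPPORT_RUBRIC = [
    Dimension("accuracy", "Facts and policy statements are correct.",
              {1: "States a wrong refund policy.", 5: "Every claim checks out."}),
    Dimension("tone", "Polite and appropriate for the customer's situation.",
              {1: "Dismissive or curt.", 5: "Warm and professional."}),
    Dimension("action", "Gives the customer a clear next step.",
              {1: "No next step offered.", 5: "One concrete, doable step."}),
    Dimension("length", "As long as needed and no longer.",
              {1: "Rambling or far too terse.", 5: "Concise and complete."}),
]


def validate(scores: dict[str, int], rubric=SUPPORT_RUBRIC) -> dict[str, int]:
    """Reject score sheets with missing/extra dimensions or out-of-range scores."""
    expected = {d.name for d in rubric}
    if set(scores) != expected:
        raise ValueError(f"expected {sorted(expected)}, got {sorted(scores)}")
    for name, value in scores.items():
        if value not in SCALE:
            raise ValueError(f"{name}: score {value!r} is outside 1-5")
    return scores


def agreement_rate(sheets_a: list[dict[str, int]],
                   sheets_b: list[dict[str, int]],
                   rubric=SUPPORT_RUBRIC) -> float:
    """Fraction of (item, dimension) pairs on which two graders gave the same score."""
    if len(sheets_a) != len(sheets_b):
        raise ValueError("both graders must score the same items")
    matches = total = 0
    for a, b in zip(sheets_a, sheets_b):
        validate(a, rubric)
        validate(b, rubric)
        for d in rubric:
            total += 1
            matches += a[d.name] == b[d.name]
    return matches / total if total else 0.0


# Two graders scoring the same two replies (made-up scores).
grader_a = [{"accuracy": 5, "tone": 4, "action": 3, "length": 4},
            {"accuracy": 2, "tone": 4, "action": 2, "length": 5}]
grader_b = [{"accuracy": 5, "tone": 3, "action": 3, "length": 4},
            {"accuracy": 2, "tone": 4, "action": 2, "length": 5}]

print(agreement_rate(grader_a, grader_b))  # 7 of 8 scores match -> 0.875
```

Exact-match agreement is the simplest measure and can overstate consistency when most scores cluster on one value; a chance-corrected statistic such as Cohen's kappa is a common alternative if that is a concern.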