LLM-as-a-judge = LLM judge = AI judge. All three mean using a non-deterministic, pre-trained model as a proxy for human judgment. We will use these terms interchangeably in this handbook.
Evals != LLM-as-a-judge. There are many methods and tools for evaluating an AI system, and LLM judges are just one of them.
We will use the term candidate model to make it clear when we are referring to a model or system that is being evaluated, and to disambiguate it from the judge.
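To make the two roles concrete, here is a minimal sketch that puts a candidate model and a judge side by side, next to a deterministic check that involves no judge at all. Everything in it is an assumption for illustration: `stub_candidate` and `stub_judge` are hypothetical stand-ins for real model calls, and `JUDGE_TEMPLATE`, `exact_match`, and `judge_eval` are names invented here, not part of any library.

```python
from typing import Callable

# Hypothetical prompt shown to the judge; the wording is an assumption.
JUDGE_TEMPLATE = (
    "Question: {question}\n"
    "Reference answer: {reference}\n"
    "Candidate answer: {answer}\n"
    "Reply with PASS or FAIL."
)


def stub_candidate(question: str) -> str:
    """Stand-in for the candidate model, i.e. the system being evaluated."""
    return "The capital is Paris."


def stub_judge(prompt: str) -> str:
    """Stand-in for the judge. A toy rule replaces a real model's judgment:
    PASS if the reference text appears anywhere in the candidate answer."""
    fields = dict(
        line.split(": ", 1) for line in prompt.splitlines() if ": " in line
    )
    reference = fields["Reference answer"].lower()
    answer = fields["Candidate answer"].lower()
    return "PASS" if reference in answer else "FAIL"


def exact_match(answer: str, reference: str) -> bool:
    """A deterministic eval that uses no judge."""
    return answer.strip().lower() == reference.strip().lower()


def judge_eval(
    question: str, answer: str, reference: str, judge: Callable[[str], str]
) -> bool:
    """An eval that delegates the verdict to a judge."""
    prompt = JUDGE_TEMPLATE.format(
        question=question, reference=reference, answer=answer
    )
    return judge(prompt).strip().upper() == "PASS"


question = "What is the capital of France?"
answer = stub_candidate(question)  # output of the candidate model

print("exact match:", exact_match(answer, "Paris"))
print("judge:", judge_eval(question, answer, "Paris", stub_judge))
```

With these stubs the two evals disagree on the same candidate output: the exact-match check rejects the full-sentence answer, while the judge accepts it. Both are evals, and only one of them is an LLM judge.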