If you are building in AI, you have likely come across a lot of content and products about evals. However, there is not much useful information about LLM-as-a-judge, which underpins many, if not most, modern evals.
But LLM judges are not just for evals. AI models that make judgment calls, meaning decisions that would otherwise be handled by a human, unlock a massive amount of utility because they can run at a cost and scale that real people cannot match. However, judges are only useful when their decisions are as good as or better than those of the humans they stand in for.
This section covers what judges are, where they are commonly used, and some principles of task design for using them effectively.