Ensembles

Using model ensembles can be similar to parallel sampling, but strategically using each model's biases to cancel out and mitigate the overall effects of those biases. Generally speaking teams turn to ensembles when a task is really critical, and they're nervous that a single model will be too biased for that task, or want a hive-mind like intelligence.

This can no doubt be powerful, but the effect is often less increased consistency so much as simply adding more voices in the room. It can be hard to productionize and maintain the behavior of an ensemble, especially if using closed APIs that are subject to deprecation, time of day quantization, and similar.

We're not saying it's a bad approach. But it's generally not a great way to increase overall consistency so much as surface other useful properties of your task. For deployment-level tradeoffs, see routers and ensembles.