Evaluations
What are evals?
“The systematic measurement of properties in AI systems” (Apollo Research)
“AI Evaluations focus on experimentally assessing the capabilities, safety, and alignment of AI systems.” (LessWrong, AI Evaluations tag: https://www.lesswrong.com/tag/ai-evaluations)
“There is not yet consensus on precisely what the term ‘evaluation’ entails [...]” (Ada Lovelace Institute)
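As a rough illustration of "systematic measurement", here is a minimal sketch of an eval loop in Python. It assumes the simplest case: a capability eval with fixed prompts, reference answers, and exact-match scoring. `EvalTask`, `run_eval`, `exact_match` and `toy_model` are hypothetical names, and `toy_model` stands in for a real model API call. Real evals often use agentic environments, many samples per task, and graded or model-based scoring instead.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalTask:
    """One test case: a prompt and the answer we expect (hypothetical format)."""
    prompt: str
    expected: str


def exact_match(output: str, expected: str) -> bool:
    # Simplest possible scorer: case- and whitespace-insensitive string equality.
    return output.strip().lower() == expected.strip().lower()


def run_eval(model: Callable[[str], str], tasks: List[EvalTask],
             scorer: Callable[[str, str], bool] = exact_match) -> float:
    """Run the model on every task and return the fraction scored as correct."""
    if not tasks:
        raise ValueError("need at least one task")
    passed = sum(scorer(model(t.prompt), t.expected) for t in tasks)
    return passed / len(tasks)


# Hypothetical stand-in for a real model API call.
def toy_model(prompt: str) -> str:
    return "4" if "2 + 2" in prompt else "I don't know"


tasks = [
    EvalTask("What is 2 + 2?", "4"),
    EvalTask("What is the capital of France?", "Paris"),
]
score = run_eval(toy_model, tasks)
print(f"accuracy: {score:.2f}")  # accuracy: 0.50
```

The measured property here is just accuracy on two prompts; the same structure (task set, model, scorer, aggregate) carries over to safety or alignment properties by changing what the tasks and scorer check.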
Why do evals?
Examples
Self-proliferation. Jul 2023
Criteria METR used to create these tasks:
Situational Awareness. Jul 2024
Machiavelli. Behaviour in text-based games. Apr 2023
Adversarial robustness
Orgs doing evals
Risks and failure modes of evals: questions to ask
Evals is a new field, with lots of low-hanging fruit
“We need a Science of Evals” by Apollo Research
Opportunities 🎉