Two new RFPs on the real-world impacts of LLMs
Our new RFPs
Motivation (pt 1)
Motivation (pt 2)
RFP 1: LLM Agent Benchmarks
Diagram from Kinniment et al., 2023
Our priorities: The Three “C”s
Suleyman suggests the “make a million dollars” benchmark
Our priorities: The Three “C”s
Some benchmark ideas
Why prioritize LLM agent benchmarks?
RFP 2: Everything else
Out of scope for both RFPs
What about the capabilities externalities?