Quiz 4 - Agent Evaluation & Project Overview (10/6)
Question 1 (1 point): When designing a new benchmark for an AI agent, which of the following is the most critical principle to ensure the evaluation is meaningful?
Question 2 (1 point): Why would a research team prefer to use a 'dynamic benchmark' (like DynaBench or LiveCodeBench) over a 'static benchmark' (like MMLU)?
Question 3 (1 point): Evaluating an AI agent's performance on a non-verifiable task—such as creative writing or summarizing a complex article—can be difficult. Which approach is commonly used to address this challenge?
Question 4 (1 point): According to the principles of good benchmark design, a task like 'book a flight' is considered more realistic and valuable for evaluation than a task like 'solve this abstract logic puzzle'. Why is this?
Question 5 (1 point): What is the primary function of an agent evaluation framework within the AI development lifecycle?
This form was created inside of UC Berkeley.
