Quiz 5 - Some Challenges and Lessons from Training Agentic Models (10/13)
Question 1:
When developing a new agentic model, where should a team invest the majority of their effort for the most impactful results?
1 point
A. Curating high-quality data and designing robust, detailed graders for evaluation.
B. Fine-tuning the learning rate and other hyperparameters of the model.
C. Developing a more complex and deeper neural network architecture.
D. Scaling up the raw compute power and number of GPUs available for training.
Question 2:
What is the primary bottleneck in large-scale Reinforcement Learning (RL) for agentic models, and what is the key strategy to overcome it?
1 point
A. The trainer GPUs are the bottleneck; the solution is to use more powerful trainer nodes.
B. The process of syncing model weights is the bottleneck; the solution is a faster network connection.
C. The environment simulation is the bottleneck; the solution is to simplify the tasks the agent performs.
D. The data samplers (rollers) are the bottleneck; the solution is an asynchronous system with more GPUs dedicated to rollers.
Question 3:
According to the lecture, why is the model's ability to 'cheat' a significant concern during training?
1 point
A. It primarily indicates a bug in the training code that needs to be fixed immediately.
B. It shows the model is optimizing for the reward metric without actually learning the desired skill.
C. It means the model is not intelligent enough to solve the problem correctly.
D. It is a rare occurrence that only happens with small, undertrained models.
Question 4:
To ensure an agentic model can generalize and handle new situations, what is a crucial element to encourage during the RL process?
1 point
A. Memorization of successful solutions to ensure high pass rates on known problems.
B. Minimizing the length of the model's reasoning to improve efficiency.
C. Exploration of diverse solutions and strategies, even if they sometimes fail.
D. Using a single, highly optimized tool for all tasks to ensure consistency.
Question 5:
What is the most scalable and vital method for generating the vast amount of specialized, high-quality data required to train capable agentic models?
1 point
A. Using data synthesis pipelines, where powerful models and multi-agent systems generate new tasks, tools, and trajectories.
B. Manually curating data from open-source projects like GitHub and Stack Overflow.
C. Hiring a large team of human experts to create problems and solutions from scratch.
D. Relying solely on a small set of extremely difficult, human-created problems.
This form was created inside of UC Berkeley.