1. According to Oriol Vinyals, why was Imitation Learning (IL) or pre-training a necessary first step before applying Reinforcement Learning (RL) in the AlphaStar project?
2. Vinyals notes that the compute envelope for modern LLMs is heavily skewed toward pre-training (IL). What does he argue this imbalance prevents in the current LLM training paradigm?
3. What is the key difference when applying multi-agent reinforcement learning techniques (like the League) from a competitive game like StarCraft to collaborative real-world LLM agent tasks (e.g., a math tutor)?
4. How did the architecture of AlphaStar's action space relate to modern LLM function calling?