Avishkar Bhoopchand
Google DeepMind, Deep Learning Indaba
13 February 2025
MENA ML - Doha, Qatar
AI For Education
The challenges and opportunities
Talk Outline
Introduction
UN SDG 4
Ensure inclusive and equitable quality education and promote lifelong learning opportunities for all.
Bloom, B. 1984. The 2 Sigma Problem: The Search for Methods of Group Instruction as Effective as One-to-One Tutoring
One-to-one tutoring is the most effective teaching method, but it is not possible to scale
Or is it..?
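Bloom's "2 sigma" is an effect size: the average tutored student scored about two standard deviations above the average student taught in a conventional class. The Python sketch below converts an effect size into the share of the conventional class that a student outperforms. It assumes the conventional-class scores are normally distributed, and the scores in the example are hypothetical.

```python
from math import erf, sqrt


def effect_size(mean_treatment: float, mean_control: float, sd_control: float) -> float:
    """Standardised mean difference, measured in control-group standard deviations."""
    return (mean_treatment - mean_control) / sd_control


def percentile_of_control(d: float) -> float:
    """Fraction of a normally distributed control group scoring below a student
    who sits d standard deviations above the control mean (standard normal CDF)."""
    return 0.5 * (1.0 + erf(d / sqrt(2.0)))


# Hypothetical scores: conventional class mean 70 (SD 10), tutored mean 90.
d = effect_size(90.0, 70.0, 10.0)
print(f"d = {d:.1f}, outperforms {percentile_of_control(d):.1%} of the control group")
# prints: d = 2.0, outperforms 97.7% of the control group
```

With d = 2 the result is roughly 98%, which matches Bloom's observation that the average tutored student outperformed about 98% of the conventional class.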
Enter Generative AI
Students are early adopters
Survey usage stats:
Digital Education Council Global AI Student Survey 2024
So is ChatGPT enough…?
LLMs convey information
They are tuned to be helpful.
This is not the same as learning
Creates a false sense of mastery
Lehmann M et al. 2024. AI Meets the Classroom: When Does ChatGPT Harm Learning?
The Scale & Complexity of the Problem
Education for all
1.5 billion students worldwide each with different:
Not just for Amira in her maths class…
The elementary student learning to read
The high school student tackling physics
The adult learning a new language
The vocational student mastering a trade
Complexity of 1:1 Tutoring
Immediate Feedback
Infers the student's knowledge
Adjust explanations to suit the student
Builds Rapport and Trust
Adjust to emotional & cognitive state
Pedagogy Principles
Encourage Active Learning
Deepen Metacognition
Manage Cognitive Load
Motivate & Stimulate Curiosity
Adapt to learner’s goals & needs
Jurenka et al. 2024. Towards Responsible Development of Generative AI for Education: An Evaluation-Driven Approach
An AI Tutor needs to…
Adhere to pedagogical principles
Model the full space of human knowledge
Understand how knowledge builds upon itself
Adapt to individual learning trajectories
Maintain engagement over months and years
Do all this while being robust, fair, and explainable
AI is set to disrupt Education
Haven’t we heard this all before?
MOOCs were also set to disrupt education:
The Reality…
MOOCs have very low completion rates: < 10%
Those who succeed are typically already well educated and affluent, often holding postgraduate degrees
They are already skilled at self-directed learning
Sustainable business models have proved elusive
The Lessons
Technology often amplifies rather than bridges existing educational divides: we need to design explicitly for struggling learners
Learning infrastructure is as important as content: design with community, feedback and accountability in mind
Tools for independent learning work best for those who need them least: self-learning is a complex metacognitive skill that needs to be developed
Education is deeply embedded in social and cultural contexts: one-size-fits-all solutions typically fail
Technology alone is not the answer
Sociotechnical systems
Stakeholders
Community Impact
Power Dynamics
Institutional practices
Cultural Norms
Policy & Governance
Demo - LearnLM
Jurenka et al. 2024. Towards Responsible Development of Generative AI for Education: An Evaluation-Driven Approach
LearnLM Team, Google. 2024. LearnLM: Improving Gemini for Learning
Evaluation
Which outcome is better?
🧑🏽🎓
Gets perfect test scores but can’t apply concepts to a new problem
👨🏻🎓
Scores lower but develops a deep understanding
🧑🏽🎓
Learns slowly, but develops strong study habits
👨🏻🎓
Makes quick progress, but becomes dependent on ChatGPT
What to measure?
Test scores, completion rates, engagement time
But this is not the full picture
Need to consider:
We need standard protocols, longitudinal studies, metrics for understanding, and measures of secondary effects
Types of Evaluation
Extrinsic
Measures actual educational outcomes and impact
Better for strategic decisions
Intrinsic
Measures performance on specific tasks or capabilities
Good for rapid iteration
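Figure: evaluation methods placed on a timescale from 1 hour to 1 year, from fast evaluations used to iterate to slow ones used to strategise. Methods shown: automatic evaluations; side by side (conversation); (simulated) learner ratings; side by side (turn level); 2-stage pedagogy ratings; effectiveness studies; qualitative studies; user feedback; longitudinal studies.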
Automatic Evaluations
What are they?
Why are they needed?
Challenges
Automatic Evaluations
Break down the task
Carefully craft scenarios
Give the model context
Scenario-based supporting info
LearnLM Team, Google. 2024. LearnLM: Improving Gemini for Learning
Tutor CoPilot: A Randomised Controlled Trial
A “copilot” for online tutors
Size: 900 tutors; 1800 students
Metric: Exit tickets
Result: Significant increase in scores in treatment group
Larger effect for less effective and less experienced tutors
(But still far from 2-sigma!)
Wang R et al. 2024. Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise
Tutor CoPilot: A Randomised Controlled Trial
Qualitative analysis: tutors in the treatment group used higher-quality strategies
User feedback
Taxonomy of Choices (Intrinsic Evals)
Data Collection
Real learners vs. role-playing experts
Single-turn vs. multi-turn
Unguided vs. scenario-guided
Advanced learner vs. novice learner
Ratings
Human vs. automatic
Learner vs. educator
Single-turn vs. conversation-level
Pointwise vs. pairwise (side by side)
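Pairwise (side-by-side) ratings are usually summarised as a win rate for one model over the other. The minimal Python sketch below assumes each judgement is recorded as "A", "B" or "tie" and that a tie counts as half a win. Both the label scheme and the tie convention are assumptions for illustration, not something the talk specifies.

```python
from collections import Counter
from typing import Iterable


def win_rate(preferences: Iterable[str]) -> float:
    """Win rate of model A over model B from side-by-side judgements.

    Each judgement is "A", "B" or "tie" (assumed labels); ties count as half a win.
    """
    counts = Counter(preferences)
    unknown = set(counts) - {"A", "B", "tie"}
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no judgements to aggregate")
    # Counter returns 0 for labels that never occurred.
    return (counts["A"] + 0.5 * counts["tie"]) / total


print(win_rate(["A", "A", "B", "tie"]))  # 0.625
```

A win rate near 0.5 means raters saw little difference between the two models. With few judgements the estimate is noisy, so in practice it would be reported with a confidence interval.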
Technical & Research Challenges in AI for Education
The Challenges from an AI perspective
Multi-modal interaction (visual, verbal, written)
Real-time state estimation (student understanding)
Dynamic task selection (what to teach next)
Natural language generation (explanations)
Causal reasoning (why did the student make this mistake?)
Long-term planning (curriculum design)
All while handling partial observability and delayed feedback
Knowledge Representation & Tracing
Questions:
Possible directions:
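One classic approach to knowledge tracing is Bayesian Knowledge Tracing (BKT). BKT treats each skill as a hidden known/unknown state and updates the probability that the student knows the skill after every answer. It appears here only as an illustrative baseline, not as a direction proposed in the talk. The class and function names and the parameter values are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class BKTParams:
    p_init: float   # prior probability the skill is already known
    p_learn: float  # probability of learning the skill after a practice opportunity
    p_guess: float  # probability of answering correctly without knowing the skill
    p_slip: float   # probability of answering incorrectly despite knowing the skill


def bkt_update(p_known: float, correct: bool, params: BKTParams) -> float:
    """Bayesian posterior given one observed answer, then a learning transition."""
    if correct:
        num = p_known * (1 - params.p_slip)
        den = num + (1 - p_known) * params.p_guess
    else:
        num = p_known * params.p_slip
        den = num + (1 - p_known) * (1 - params.p_guess)
    posterior = num / den
    # The student may also learn the skill during this practice opportunity.
    return posterior + (1 - posterior) * params.p_learn


def trace(observations: Iterable[bool], params: BKTParams) -> List[float]:
    """Probability the skill is known after each observed answer."""
    p = params.p_init
    history = []
    for correct in observations:
        p = bkt_update(p, correct, params)
        history.append(p)
    return history


params = BKTParams(p_init=0.2, p_learn=0.15, p_guess=0.25, p_slip=0.1)  # hypothetical values
print([round(p, 3) for p in trace([False, True, True, True], params)])
```

BKT's estimate rises with each correct answer, but it models every skill in isolation. That limitation is one reason richer representations of how knowledge builds upon itself remain an open question.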
Theory of Mind
Questions:
Possible directions:
Personalisation
Questions:
Possible directions:
Content Generation
Questions:
Possible directions:
Interaction
Questions:
Possible directions:
Optimising for Learning Outcomes
Questions:
Possible directions:
Ethical and Safety Considerations
Privacy and Data Protection
Data is needed to provide a personalised and effective experience
Education data can be sensitive; may involve work with minors
Case Study:
New York Times. 2014. InBloom Student Data Repository to Close
Equity & Access
Widening the educational gap and digital divide
LLMs tend to be Western-centric and don’t perform as well in the languages and contexts of underrepresented regions
E.g. unequal access to online learning during COVID-19 lockdowns
Dependency
Risk of children developing emotional dependency on AI tutors
Striking a balance between being helpful and making students dependent on the technology
Anthropomorphism and self-disclosure
Safety
Safety considerations look different
Discussing otherwise sensitive topics may be appropriate in educational settings, e.g. to develop critical thinking skills or to learn about history
Conflict between good pedagogical practice and safety
Conclusion
Summary
Global challenge in quality education for all
Huge opportunity to leverage AI for social impact
Current GenAI is not suitable for this task
Problem is more difficult than it sounds!
Many exciting research opportunities
Join us in this journey!
The End
Thank You (Arabic: شكراً)