Visual Dialog: Datasets and Models
Miquel Florensa
Group 6
February 19th 2025
Overview
Limitations of Supervised Learning: A Static Approach
Breaking Free: Reinforcement Learning for Dynamic Dialog
Paper 1: Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning (Das et al., 2017b)
Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning, Das et al. Mar. 2017
The Q-BOT-A-BOT Architecture: Learning Through Self-Play
RL Details: Shaping Behaviour Through Rewards
Q-BOT & A-BOT Architecture
Q-BOT Architecture
LSTM
Are there any animals?
A-BOT Architecture
LSTM
Q-BOT Architecture
LSTM
Q-BOT Architecture
MLP
Joint Training with REINFORCE
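As a rough illustration of the idea behind this slide, the sketch below computes a per-round reward as the reduction in distance between Q-BOT's image guess and the true image representation, then forms a REINFORCE-style surrogate loss. The function names, the Euclidean distance, and the toy 2-D vectors are assumptions made for illustration; this is not the authors' implementation.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def round_rewards(predictions, target):
    """Reward for round t = distance before the round minus distance after it.

    predictions[0] is the guess before any dialog round;
    predictions[t] is the guess after round t (hypothetical layout).
    """
    dists = [euclidean(p, target) for p in predictions]
    return [dists[t - 1] - dists[t] for t in range(1, len(dists))]

def reinforce_loss(log_probs, rewards):
    """Surrogate loss -sum_t r_t * log pi(a_t); its gradient with respect to
    the policy parameters is the REINFORCE (score-function) estimate."""
    return -sum(r * lp for r, lp in zip(rewards, log_probs))

# Toy example: the guess moves closer to the target image feature each round.
target = [3.0, 4.0]
predictions = [[0.0, 0.0], [3.0, 0.0], [3.0, 4.0]]
rewards = round_rewards(predictions, target)
# Hypothetical log-probabilities of the utterances sampled in each round.
loss = reinforce_loss([-1.0, -0.5], rewards)
```

In a real system the log-probabilities would come from the question and answer decoders, and the loss would be minimized with an automatic-differentiation library.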
Evaluation
~9.5k images test set of VisDial v0.5
Evaluation
Qualitative retrieval results on VisDial
The Problem: Are RL Agents Really Talking?
Q-BOT-A-BOT interactions for SL-pretrained and RL-full-QAf.
GuessWhich: A Cooperative Game for Evaluating Visual Dialog
Evaluating Visual Conversational Agents via Cooperative Human-AI Games, Chattopadhyay & Yadav. Aug. 2017
How GuessWhich Works: A Cooperative Image Hunt
GuessWhich Interface
Why GuessWhich Matters: A Human-Centered Evaluation
GuessWhich addresses a critical gap in AI evaluation: it measures how an AI agent affects the performance of a human-AI team, rather than scoring the agent in isolation.
Diversifying the Conversation: Encouraging More Engaging Dialog
Improving Generative Visual Dialog by Answering Diverse Questions (Murahari et al., 2019)
Formalizing Diversity: The Smooth L1 Penalty
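For reference, the standard smooth L1 function is quadratic near zero and linear elsewhere. The sketch below applies it to the change in the norm of a dialog-state vector between consecutive rounds, so that larger changes lower the loss. The choice of state norms, the threshold `beta`, the sign convention, and the function names are assumptions for illustration; the exact formulation in Murahari et al. may differ.

```python
import math

def smooth_l1(x, beta=1.0):
    """Standard smooth L1: 0.5 * x^2 / beta if |x| < beta, else |x| - 0.5 * beta."""
    ax = abs(x)
    if ax < beta:
        return 0.5 * ax * ax / beta
    return ax - 0.5 * beta

def l2_norm(vec):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(v * v for v in vec))

def diversity_loss(states, beta=1.0):
    """Hypothetical auxiliary loss: negative sum of smooth L1 values of the
    change in state norm between consecutive rounds. Minimizing it pushes
    successive dialog states apart (assumed sign convention)."""
    total = 0.0
    for prev, cur in zip(states, states[1:]):
        delta = abs(l2_norm(cur) - l2_norm(prev))
        total += smooth_l1(delta, beta)
    return -total

# Identical states across rounds give no benefit; changing states lower the loss.
repeated = diversity_loss([[3.0, 4.0], [3.0, 4.0]])
varied = diversity_loss([[0.0, 0.0], [3.0, 4.0]])
```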
Improving Agent Conversations
Quantifying Diversity: Measuring the Impact
Improved Results
Conclusions: Advancing the Frontier of Conversational AI
Conclusions: Thoughts on Visual Dialog Models
Questions?
Thank you!