Beyond statistical learning in vision-language
Aditya Sharma and Aman Dalmia
Classical approach to VQA
Cadene, R., Dancette, C., Cord, M., & Parikh, D. (2019). Rubi: Reducing unimodal biases for visual question answering. Advances in neural information processing systems, 32.
Problems with VQA models - Ignoring the image
Cadene, R., Dancette, C., Cord, M., & Parikh, D. (2019). Rubi: Reducing unimodal biases for visual question answering. Advances in neural information processing systems, 32.
Can we use adversarial regularization to overcome language priors?
Ramakrishnan, S., Agrawal, A., & Lee, S. (2018). Overcoming language priors in visual question answering with adversarial regularization. Advances in Neural Information Processing Systems, 31.
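The core idea can be sketched as a minimax objective: minimize the full model's VQA loss while maximizing the loss of a question-only adversary that shares the question encoder (realized with gradient reversal in the paper). A minimal numpy sketch under my own naming; `lam` and the function signatures are illustrative, not the paper's exact formulation:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    # Mean negative log-likelihood of the correct answers.
    probs = softmax(logits)
    return -np.log(probs[np.arange(len(labels)), labels]).mean()

def adversarial_vqa_objective(vqa_logits, q_only_logits, labels, lam=0.5):
    # Minimize the full model's VQA loss while *maximizing* the loss of a
    # question-only adversary; the shared question encoder is thereby
    # discouraged from encoding answer-predictive language priors.
    return cross_entropy(vqa_logits, labels) - lam * cross_entropy(q_only_logits, labels)
```

The objective goes down as the adversary gets worse, which is exactly the pressure we want on the shared question representation.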
RUBi: Reducing Unimodal Biases for VQA
Cadene, R., Dancette, C., Cord, M., & Parikh, D. (2019). Rubi: Reducing unimodal biases for visual question answering. Advances in neural information processing systems, 32.
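RUBi's training-time fusion can be sketched as masking the base model's logits with a sigmoid of the question-only branch's logits, so that answers the question-only branch is already confident about contribute less learning signal for the base model. A minimal numpy sketch (function names are mine, and this omits the question-only branch's own classification loss):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rubi_mask(base_logits, question_only_logits):
    # Element-wise mask: answers the question-only branch rates highly keep
    # their logits, others are suppressed, which shifts the training signal
    # away from examples that are solvable from the question alone.
    return base_logits * sigmoid(question_only_logits)
```

At test time the mask is dropped and only the base model's predictions are used.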
Problems with VQA models - Linguistic Diversity
Kant, Y., Moudgil, A., Batra, D., Parikh, D., & Agrawal, H. (2021). Contrast and classify: Training robust vqa models. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 1604-1613)
Are models robust to linguistic variations?
Are models capable of handling linguistic diversity?
Can models cope with different linguistic forms reliably?
ConClaT: Contrastive Learning + Cross Entropy Loss
Kant, Y., Moudgil, A., Batra, D., Parikh, D., & Agrawal, H. (2021). Contrast and classify: Training robust vqa models. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 1604-1613)
Contrastive loss: encourages representations to be robust to linguistic variations
Cross-entropy loss: preserves the discriminative power of representations
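The two objectives can be sketched as an NT-Xent-style contrastive term over embeddings of a question and its rephrasing, added to the usual answer-classification cross-entropy. A minimal numpy sketch; the weight `alpha`, the temperature, and all names are illustrative, and the exact way ConClaT schedules the two losses follows the paper, not this sum:

```python
import numpy as np

def cross_entropy(logits, labels):
    # Mean negative log-likelihood (numerically stable log-softmax).
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()

def nt_xent(z_orig, z_reph, tau=0.1):
    # Contrastive term: each question embedding should be closest to the
    # embedding of its own rephrasing (diagonal of the similarity matrix).
    z1 = z_orig / np.linalg.norm(z_orig, axis=1, keepdims=True)
    z2 = z_reph / np.linalg.norm(z_reph, axis=1, keepdims=True)
    sim = (z1 @ z2.T) / tau
    return cross_entropy(sim, np.arange(len(sim)))

def conclat_loss(answer_logits, labels, z_orig, z_reph, alpha=0.5):
    # Cross-entropy keeps representations discriminative; the contrastive
    # term pulls a question and its rephrasing together in embedding space.
    return cross_entropy(answer_logits, labels) + alpha * nt_xent(z_orig, z_reph)
```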
Thoughts: How would I solve this problem today?
Using chain-of-thought prompting to make the model reason explicitly about why it is making its prediction
What animals are in this picture?
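One speculative way to realize this: wrap the question in a chain-of-thought style prompt so the model must articulate observations and reasoning before committing to an answer. A hypothetical template (the wording is mine, not from any of the cited papers):

```python
def cot_vqa_prompt(question: str) -> str:
    # Hypothetical chain-of-thought template for a vision-language model:
    # ask for observations and reasoning steps before the final answer.
    return (
        "Look at the image and answer step by step.\n"
        f"Question: {question}\n"
        "First, list the objects you can see that are relevant to the question.\n"
        "Then, explain how they relate to the question.\n"
        "Finally, give the final answer on its own line as 'Answer: <answer>'."
    )
```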
Cycle-Consistency for Robust Visual Question Answering - Shah et al., 2019
VQA Models are amazing
Image Source: https://arxiv.org/pdf/1612.00837.pdf
VQA Models are brittle
Image Source: https://arxiv.org/pdf/1902.05660.pdf
Predictions from Pythia, the VQA 2018 Challenge winner
Evaluating and benchmarking models for robustness
Image Source: https://arxiv.org/pdf/1902.05660.pdf
Cycle consistent training scheme
Image Source: https://arxiv.org/pdf/1902.05660.pdf
[Diagram: (I, Q) → VQA Model → A`; (I, A`) → Visual Question Generation Module → Q`; (I, Q`) → VQA Model → A``. Training signals: VQA Loss (A` vs. ground-truth A), Question Consistency Loss (Q` vs. Q), and Answer Consistency Loss (A`` vs. A).]
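The three training signals in the diagram can be sketched as a single objective: a VQA loss on A` against the ground-truth answer A, a question-generation loss for Q`, and an answer-consistency loss on A`` (also against A). A minimal numpy sketch; the weights and the form of the question-generation term are placeholders, not the paper's exact formulation:

```python
import numpy as np

def cross_entropy(logits, labels):
    # Mean negative log-likelihood (numerically stable log-softmax).
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()

def cycle_loss(a_prime_logits, a_dprime_logits, labels, q_gen_nll,
               w_q=1.0, w_a=1.0):
    # VQA loss on A' + question-generation loss for Q'
    # + answer-consistency loss on A'' (answer to the generated question).
    return (cross_entropy(a_prime_logits, labels)
            + w_q * q_gen_nll
            + w_a * cross_entropy(a_dprime_logits, labels))
```

Gating tricks from the paper (e.g. only backpropagating through sufficiently good generated questions) are omitted here for brevity.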
VQA-Rephrasings
Image Source: https://arxiv.org/pdf/1902.05660.pdf
Evaluation only; built on the VQA v2 validation split; human-collected
40K images, 160K questions, 4 rephrasings per question
Does cycle consistent training make models robust?
Image Source: https://arxiv.org/pdf/1902.05660.pdf
[Results chart: new SOTA vs. previous SOTA]
Does cycle consistent training improve VQA performance?
Image Source: https://arxiv.org/pdf/1902.05660.pdf
[Results chart: new SOTA vs. previous SOTA]
Does cycle consistent training improve question generation?
Image Source: https://arxiv.org/pdf/1902.05660.pdf
Do cycle consistent models better predict their own failures?
Image Source: https://arxiv.org/pdf/1902.05660.pdf
Conclusion
Thoughts / Insights
Questions?
Thank you!