Scaling Laws for Image Captioning
By Balaji Balasubramanian and Eshwanth Baskaran
What are scaling laws?
Kaplan et al. - Scaling Laws for Neural Language Models
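Kaplan et al. found that language-model test loss falls as a power law in the number of non-embedding parameters N and in dataset size D (tokens), as long as the other quantity is not the bottleneck. In their notation, with fitted constants N_c, D_c, α_N and α_D:

```latex
L(N) = \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) = \left(\frac{D_c}{D}\right)^{\alpha_D}
```

Taking logarithms gives log L = α_N log N_c - α_N log N. This is a straight line of slope -α_N on log-log axes, which is why scaling curves are usually plotted that way.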
What is transfer learning?
https://www.pinterest.com/pin/424745808604824736/
Image Captioning
Mokady, R., Hertz, A., & Bermano, A. H. (2021). ClipCap: CLIP Prefix for Image Captioning. arXiv. https://doi.org/10.48550/ARXIV.2111.09734
Image and Caption
Training Objective
Training Objective for Autoregressive Language Model
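An autoregressive language model is trained to maximize the log-likelihood of each token given the tokens before it. Equivalently, under teacher forcing it minimizes the mean negative log-likelihood of the next token. The sketch below is a minimal NumPy version. It assumes the model has already produced one row of vocabulary logits per position, and `autoregressive_nll` is a hypothetical helper, not a library function:

```python
import numpy as np

def log_softmax(logits):
    # Subtract the row max for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def autoregressive_nll(logits, tokens):
    """Mean negative log-likelihood of tokens[1:] given tokens[:-1].

    logits: (T, V) array; row t scores the token at position t + 1.
    tokens: (T + 1,) integer array of caption token ids.
    """
    log_probs = log_softmax(logits)
    targets = tokens[1:]
    # Pick log p(c_{t+1} | c_1, ..., c_t) for every position t.
    picked = log_probs[np.arange(len(targets)), targets]
    return -picked.mean()
```

With all-zero logits over a 5-token vocabulary, every token has probability 1/5, so the loss is log 5 ≈ 1.609.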
Image Captioning
http://cs231n.stanford.edu/2021/slides/2021/lecture_10.pdf
ClipCap for Image Captioning
Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., & Sutskever, I. (2021). Learning Transferable Visual Models From Natural Language Supervision. arXiv. https://doi.org/10.48550/ARXIV.2103.00020
https://jalammar.github.io/illustrated-gpt2/
ClipCap
Mapping network:
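The mapping network turns a single CLIP image embedding into k prefix vectors that live in GPT-2's token-embedding space. The sketch below covers the simpler MLP variant, with random, untrained weights, purely to show the shapes involved. The helper names and the hidden width (half the output size) are illustrative assumptions. The transformer variant is not shown.

```python
import numpy as np

def init_mlp_mapping(clip_dim, gpt_dim, prefix_len, seed=0):
    """Random (untrained) weights for an MLP mapping network; illustration only."""
    rng = np.random.default_rng(seed)
    out_dim = gpt_dim * prefix_len
    hidden = out_dim // 2  # hidden width: an illustrative choice
    return {
        "W1": rng.normal(0.0, 0.02, (clip_dim, hidden)), "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.02, (hidden, out_dim)), "b2": np.zeros(out_dim),
        "prefix_len": prefix_len, "gpt_dim": gpt_dim,
    }

def mlp_mapping(params, clip_emb):
    """Map (batch, clip_dim) CLIP embeddings to (batch, prefix_len, gpt_dim) prefixes."""
    h = np.tanh(clip_emb @ params["W1"] + params["b1"])
    out = h @ params["W2"] + params["b2"]
    # Split the flat output into prefix_len vectors of GPT-2 embedding width.
    return out.reshape(clip_emb.shape[0], params["prefix_len"], params["gpt_dim"])
```

Suppose CLIP ViT-B/32 image embeddings (512-d), GPT-2 small token embeddings (768-d) and a prefix length of 10. Then `init_mlp_mapping(512, 768, 10)` yields prefixes of shape (batch, 10, 768).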
Training:
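During training the caption is conditioned on the prefix. The prefix embeddings are placed in front of the caption's token embeddings, and only caption tokens are scored. The sketch below assembles one training example under a next-token-label convention, where the label at position t is the token at position t + 1. The ignore value -100 and the helper name are hypothetical choices for illustration.

```python
import numpy as np

IGNORE = -100  # hypothetical label value meaning "no loss at this position"

def build_training_inputs(prefix, caption_emb, caption_ids):
    """Concatenate prefix and caption embeddings; mask prefix positions out of the loss.

    prefix:      (k, d) mapped CLIP prefix, k >= 1
    caption_emb: (n, d) token embeddings of the caption
    caption_ids: (n,)   token ids of the caption
    Returns inputs (k + n, d) and next-token labels (k + n,).
    """
    k = prefix.shape[0]
    inputs = np.concatenate([prefix, caption_emb], axis=0)
    # The last prefix position predicts the first caption token; earlier prefix
    # positions and the final caption position have no target.
    labels = np.full(inputs.shape[0], IGNORE, dtype=np.int64)
    labels[k - 1 : k - 1 + len(caption_ids)] = caption_ids
    return inputs, labels
```

For a prefix of length 3 and caption ids [7, 9], the labels come out as [-100, -100, 7, 9, -100].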
Our study
Our study - Scaling law with dataset size
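One way to fit this kind of sweep is to record validation loss at several training-set sizes and fit the power-law form by least squares in log-log space. The sketch below uses `numpy.polyfit`. The numbers in the usage note are synthetic, generated from a known power law to check the fit. They are not our measurements.

```python
import numpy as np

def fit_power_law(sizes, losses):
    """Fit L(x) = (x_c / x) ** alpha by least squares in log-log space.

    log L = alpha * log x_c - alpha * log x, i.e. a line with slope -alpha.
    Returns (alpha, x_c).
    """
    slope, intercept = np.polyfit(np.log(sizes), np.log(losses), deg=1)
    alpha = -slope
    x_c = np.exp(intercept / alpha)
    return alpha, x_c
```

Generate losses as `(1e8 / sizes) ** 0.1` for sizes 1e3 to 1e6, and the fit recovers alpha ≈ 0.1 and x_c ≈ 1e8. A pure power law has no floor. If validation loss plateaus at large sizes, a form with an additive irreducible-loss term fits better.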
Our study - Scaling law with #model parameters
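Parameter count can be estimated from depth and width when scaling the mapping network by its number of transformer layers. The sketch below counts only the attention projections and the feed-forward matrices, ignoring biases, layer norms and embeddings. It assumes a feed-forward width of 4 × d_model by default. With that default, each layer contributes 12 d_model² weights, matching the approximation in Kaplan et al. The width of 768 in the usage note (GPT-2 small) is an assumption about our mapping network.

```python
def approx_transformer_params(n_layers, d_model, d_ff=None):
    """Approximate weight count of a stack of transformer blocks.

    Per block: 4 * d_model**2 for the Q, K, V and output projections, plus
    2 * d_model * d_ff for the two feed-forward matrices. Biases, layer norms
    and embeddings are ignored.
    """
    if d_ff is None:
        d_ff = 4 * d_model  # common convention, assumed here
    return n_layers * (4 * d_model**2 + 2 * d_model * d_ff)
```

For example, `approx_transformer_params(4, 768)` gives 4 × 12 × 768² = 28,311,552 weights.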
Hyperparameters tuned:
The best model had 4 transformer layers and a prefix length of 10.
Model performance
Model                  | BLEU  | METEOR
Ours (mapping network) | 23.44 | 19.2
Pred: A woman is sitting with a fire hydrant.
Ref: A lady sitting beside a fire hydrant with hand on head.
Pred: A woman is standing on a boat with fruit.
Ref: An Asian woman with some vegetables in her boat
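BLEU scores a caption by its clipped n-gram overlap with the reference captions, combined with a brevity penalty. The sketch below is a minimal, unsmoothed sentence-level BLEU with uniform weights over 1- to 4-grams. Reported captioning BLEU is normally computed at corpus level with an evaluation toolkit and its own tokenization, so this sketch is not expected to reproduce the number in the table.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform weights.

    candidate: list of tokens; references: list of token lists.
    """
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        # Clip each n-gram count by its maximum count in any single reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        if clipped == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU = 0
        log_precision_sum += math.log(clipped / sum(cand_counts.values())) / max_n
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum)
```

Applied to the first example above (lowercased, punctuation dropped), the prediction shares the trigram "a fire hydrant" with the reference but no 4-gram, so unsmoothed BLEU is 0. This is why sentence-level BLEU is usually smoothed or aggregated at corpus level.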
Summary