Xu et al. Meta + NYU&UoW
Related work redux
MetaCLIP
A synset(synonym set) is a set of words with the same part-of-speech that can be interchanged in a certain context
What does this even do?
2 Pools of data
EKSPAČI
Zero-shot imagenet believe it or not - pool 1
Pool 2 shenanigans
A.1 They tried using datacomp data - it wasn’t “good”
400M vs 1B
A.2 Curation deets
A.3 Human study on the effects on curation
We collect an evaluation set of 100 random image-text pairs for balanced and unbalanced data, respectively, and ask annotators to score on the image, text, and pair quality metrics, separately, on a scale of 1 to 5
Altogether: Image Captioning via Re-aligning Alt-text @EMNLP24