CVPR 2024
MLV Lab
Korea University
Retrieval-Augmented Open-Vocabulary Object Detection
Jooyeon Kim¹*, Eulrang Cho²*, Sehyung Kim¹, Hyunwoo J. Kim¹
¹Korea University, ²Samsung Research
Motivation
Vision-Language Model: Radford, Alec, et al. "Learning transferable visual models from natural language supervision." ICML, 2021.
Pseudo-labeling: Gao, Mingfei, et al. "Open vocabulary object detection with pseudo bounding-box labels." ECCV, 2022.
Utilize vocabulary sets more diversely
Vocabulary Set, query ‘jaguar’
Similar: cat
Dissimilar: bottle
Vocabulary Set → LLM → Concepts
alligator: sharp teeth, strong tail
iPod: handheld music player
sock: worn on the feet
Retrieval-Augmented Losses and visual Features (RALF)
RALF
Retrieval-Augmented Losses (RAL): retrieve negative vocabularies and augment loss function
Retrieval-Augmented visual Features (RAF): augment visual features using verbalized concepts
Retrieval-Augmented Losses (RAL): retrieve negative vocabularies and augment loss function
Figure: Training pipeline w/RAL
Components: Backbone, RPN, RoI Head, RAL; Ground-Truth Box, Ground-Truth Label, Ground-Truth Box Embedding; Vocabulary Store, Negative Retriever, Text Encoder; Mean (applied twice).
Retrieved negative vocabularies for the Ground-Truth Label ‘broccoli’
Hard Negative: lettuce, avocado, green beans
Easy Negative: greylag, carillonneur, trouser press
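The retrieval step in the figure can be illustrated with a minimal sketch. It assumes negatives are ranked by cosine similarity to the ground-truth label's text embedding, with the most similar words taken as hard negatives and the least similar as easy negatives. The toy 2-D embeddings, the function names, and the hinge-style `margin_loss` are all hypothetical stand-ins; the slides do not give the actual loss formula.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def retrieve_negatives(label, store, k=2):
    """Return (hard, easy) negatives for `label` from a {word: embedding} store.

    Assumption: hard = k most similar words, easy = k least similar words.
    """
    query = store[label]
    ranked = sorted(
        (word for word in store if word != label),
        key=lambda word: cosine(query, store[word]),
        reverse=True,
    )
    return ranked[:k], ranked[-k:]

def mean_embedding(words, store):
    """Average the embeddings of `words` (the 'Mean' boxes in the figure)."""
    dim = len(store[words[0]])
    return [sum(store[w][i] for w in words) / len(words) for i in range(dim)]

def margin_loss(box, positive, negative, margin=0.2):
    """Hypothetical hinge loss: penalize a negative that scores within
    `margin` of the positive for this box embedding."""
    return max(0.0, margin + cosine(box, negative) - cosine(box, positive))

# Toy embeddings, made up for illustration only.
toy_store = {
    "broccoli": [1.0, 0.1],
    "lettuce": [0.9, 0.2],
    "avocado": [0.8, 0.4],
    "greylag": [-0.5, 1.0],
    "trouser press": [-1.0, 0.2],
}

hard, easy = retrieve_negatives("broccoli", toy_store, k=2)
box_embedding = [1.0, 0.0]  # stand-in for a ground-truth box embedding
loss_hard = margin_loss(box_embedding, toy_store["broccoli"], mean_embedding(hard, toy_store))
loss_easy = margin_loss(box_embedding, toy_store["broccoli"], mean_embedding(easy, toy_store))
```

In this toy setup the hard negatives still incur a positive loss while the easy negatives are already separated by more than the margin, which is the intuition for treating the two groups separately.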
Retrieval-Augmented visual Features (RAF): augment visual features using verbalized concepts
Figure: RAF training
Concept Store construction: Vocabulary Store, LLM prompted with "Describe what a(n) {vocabulary} looks like.", Extract Noun Chunks, Concept Store.
Feature path: RPN, Crop, Image Encoder, Concept Retriever, Augmenter.
Retrieved concepts and scores (example): 0.30273, 0.29639, 0.29541
Stage labels: Offline, Generate Pseudo-label, RAF training
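A minimal sketch of the concept path in the figure follows. It builds the LLM prompts from the slide's template, retrieves the top-scoring concepts for a region feature by cosine similarity, and mixes them back into the feature. The LLM call and noun-chunk extraction are omitted. The toy concept embeddings, the function names, and the softmax-weighted residual in `augment` are assumptions for illustration; the Augmenter in the slides is part of what RAF training trains, and its form is not given here.

```python
import math

# Prompt template as shown on the slide.
PROMPT = "Describe what a(n) {vocabulary} looks like."

def build_prompts(vocabulary):
    """One LLM prompt per vocabulary word."""
    return [PROMPT.format(vocabulary=word) for word in vocabulary]

def normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def retrieve_concepts(feature, concept_store, k=2):
    """Return the k (concept, cosine score) pairs most similar to `feature`."""
    f = normalize(feature)
    scored = [
        (name, sum(a * b for a, b in zip(f, normalize(emb))))
        for name, emb in concept_store.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

def augment(feature, retrieved, concept_store):
    """Hypothetical augmenter: add softmax-weighted concept embeddings
    to the feature, then re-normalize."""
    exps = [math.exp(score) for _, score in retrieved]
    total = sum(exps)
    out = normalize(feature)
    for (name, _), e in zip(retrieved, exps):
        concept = normalize(concept_store[name])
        out = [o + (e / total) * c for o, c in zip(out, concept)]
    return normalize(out)

# Toy 3-D concept embeddings, made up for illustration only.
toy_concepts = {
    "sharp teeth": [0.9, 0.1, 0.0],
    "strong tail": [0.8, 0.3, 0.1],
    "worn on the feet": [0.0, 1.0, 0.0],
}

prompts = build_prompts(["alligator", "sock"])
crop_feature = [1.0, 0.0, 0.0]  # stand-in for an image-encoder feature of a crop
retrieved = retrieve_concepts(crop_feature, toy_concepts, k=2)
augmented = augment(crop_feature, retrieved, toy_concepts)
```

Each retrieved entry pairs a concept with its similarity score, mirroring the "Retrieved concepts and scores" panel.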
Figure: Inference pipeline w/RAF
Components: Backbone + RPN, RoI Head, Crop, Image Encoder, Concept Retriever, Augmenter, Ensemble.
Text Embeddings of … (three inputs; the labels are truncated in the source)
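The slides do not say how the Ensemble step combines its inputs. As one possibility, the sketch below uses a weighted geometric mean of two class-probability vectors, a form commonly used in open-vocabulary detection. The names `p_det` (scores from the RoI Head), `p_aug` (scores from the augmented features), and the weight `alpha` are hypothetical.

```python
def ensemble(p_det, p_aug, alpha=0.5):
    """Weighted geometric mean of two probability vectors, re-normalized.

    Assumption: alpha = 0 keeps only the detector scores,
    alpha = 1 keeps only the augmented-feature scores.
    """
    mixed = [a ** (1.0 - alpha) * b ** alpha for a, b in zip(p_det, p_aug)]
    total = sum(mixed)
    return [m / total for m in mixed]

# Toy scores over three classes, made up for illustration only.
p_det = [0.7, 0.2, 0.1]
p_aug = [0.2, 0.7, 0.1]

balanced = ensemble(p_det, p_aug, alpha=0.5)
detector_only = ensemble(p_det, p_aug, alpha=0.0)
```

With `alpha=0.5` the two sources disagree symmetrically on the first two classes, so those classes end up with equal ensembled probability.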
Experiments
Conclusion