CoLLaVO: Crayon Large Language and Vision mOdel
ACL 2024 Findings
Byung-Kwan Lee, Beom Chan Park, Chae Won Kim, Yong Man Ro
An Efficient Large Language and Vision Model Using Object-Level Visual Prompts
-Ph.D. Candidate, KAIST EE-
Introduction
C2B: Class to Binary / B2C: Box to Class
We measured
[Example]
C2B: Is there a person in this image?
B2C: [0.2, 0.3, 0.6, 0.4] What object is at this location?
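The two probing questions above can be generated from a class name or a box. The following is a minimal sketch, not the authors' evaluation code: the function names `c2b_prompt` and `b2c_prompt` are hypothetical, the wording simply mirrors the slide's example, and the four box values are passed through unchanged because the slide does not state the coordinate convention.

```python
from typing import Sequence


def c2b_prompt(class_name: str) -> str:
    """Class to Binary: ask a yes/no question about whether a class is present.

    Hypothetical helper; the phrasing follows the slide's example.
    """
    if not class_name:
        raise ValueError("class_name must be a non-empty string")
    # Pick "a" or "an" from the first letter (a simple heuristic).
    article = "an" if class_name[0].lower() in "aeiou" else "a"
    return f"Is there {article} {class_name} in this image?"


def b2c_prompt(box: Sequence[float]) -> str:
    """Box to Class: ask which object sits at the given box.

    Hypothetical helper; the box is assumed to hold four normalized values,
    and their meaning (corner points or width/height) is left to the caller.
    """
    if len(box) != 4:
        raise ValueError("box must contain exactly four values")
    coords = ", ".join(f"{v:g}" for v in box)
    return f"[{coords}] What object is at this location?"


if __name__ == "__main__":
    print(c2b_prompt("person"))
    print(b2c_prompt([0.2, 0.3, 0.6, 0.4]))
```

C2B answers can be scored as yes/no accuracy and B2C answers as class-name accuracy; how the slides actually score them is not shown here.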
Introduction
Positive correlation between image understanding and zero-shot vision-language performance
[Figure: object labels Handbag and Person]
Proposed Method
Experiment for Object-level Image Understanding
Experiment for Zero-shot Vision Language
Thank you