1 of 10

CoLLaVO: Crayon Large Language and Vision mOdel

ACL 2024 Findings

Byung-Kwan Lee, Beom Chan Park, Chae Won Kim, Yong Man Ro

An Efficient Large Language and Vision Model Using Object-Level Visual Prompts

Ph.D. Candidate, KAIST EE

2 of 10

Introduction

C2B: Class to Binary / B2C: Box to Class

We measured

[Example]

C2B: Is there a person in this image?

B2C: What object is at this location [0.2, 0.3, 0.6, 0.4]?
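A minimal sketch of how C2B and B2C questions like the examples above could be generated from an image's object annotations. It assumes boxes are stored in pixels as (x1, y1, x2, y2) and normalized to [0, 1] by image width and height; the slide does not specify the box convention, and the names `Annotation`, `make_c2b`, `make_b2c`, and `normalize_box` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Annotation:
    label: str  # object class name, e.g. "person"
    box: tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels (assumed convention)


def normalize_box(box, width, height, ndigits=2):
    """Scale pixel coordinates to [0, 1] by image width/height and round them."""
    x1, y1, x2, y2 = box
    return [round(x1 / width, ndigits), round(y1 / height, ndigits),
            round(x2 / width, ndigits), round(y2 / height, ndigits)]


def make_c2b(annotations, query_class):
    """Class to Binary: ask whether a class is present; the answer is Yes or No."""
    present = any(a.label == query_class for a in annotations)
    question = f"Is there a {query_class} in this image?"
    return question, "Yes" if present else "No"


def make_b2c(annotation, width, height):
    """Box to Class: ask which object occupies a normalized box; the answer is its class."""
    coords = normalize_box(annotation.box, width, height)
    question = f"What object is at this location {coords}?"
    return question, annotation.label
```

For a hypothetical 640x480 image with one person at pixel box (128, 144, 384, 192), `make_b2c` yields the normalized box [0.2, 0.3, 0.6, 0.4] from the slide, and `make_c2b(anns, "handbag")` answers "No" because no handbag is annotated.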

3 of 10

Introduction

C2B: Class to Binary / B2C: Box to Class

4 of 10

Introduction

Positive correlation between image understanding and zero-shot vision-language performance

[Figure: example images with objects labeled "Handbag" and "Person"]
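The slide reports a positive correlation but does not say which statistic was used. A minimal sketch, assuming Pearson's r, of how one could quantify it across models: `xs` would hold each model's image-understanding score (e.g. C2B/B2C accuracy) and `ys` its zero-shot vision-language score. The function name `pearson_r` and these inputs are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between paired per-model scores (assumed statistic)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Covariance numerator and the two standard-deviation terms (unnormalized).
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)
```

A value near +1 would indicate that models scoring higher on object-level understanding also tend to score higher zero-shot; on Python 3.10+, `statistics.correlation` computes the same coefficient.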

5 of 10

Proposed Method

6 of 10

Proposed Method

7 of 10

Experiment for Object-level Image Understanding

8 of 10

Experiment for Zero-shot Vision Language

9 of 10

Demo

10 of 10

Thank you