CoLLaVO: Crayon Large Language and Vision mOdel

ACL 2024 Findings

Byung-Kwan Lee, Beom Chan Park, Chae Won Kim, Yong Man Ro

An Efficient Large Language and Vision Model Using Object-Level Visual Prompts

-Ph.D. Candidate, KAIST EE-

Introduction

C2B: Class to Binary / B2C: Box to Class

We measured

[Example]

C2B: Is there a person in this image?

B2C: [0.2, 0.3, 0.6, 0.4] What object is at this location?
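The two probing tasks above can be sketched as simple prompt builders plus an exact-match scorer. This is only an illustrative sketch: the function names are hypothetical, the templates are English renderings of the slide's example questions, and the scoring rule (case-insensitive exact match) is an assumption rather than the authors' evaluation protocol. The box is passed through as four numbers without assuming a particular coordinate order.

```python
from typing import Sequence


def build_c2b_prompt(class_name: str) -> str:
    """Class to Binary (C2B): ask whether a class appears in the image.

    Hypothetical helper; the template mirrors the slide's example question.
    """
    return f"Is there a {class_name} in this image?"


def build_b2c_prompt(box: Sequence[float]) -> str:
    """Box to Class (B2C): ask which object is at the given box.

    Hypothetical helper; the four values are formatted as given, with no
    assumption about their order (the slide does not state it).
    """
    if len(box) != 4:
        raise ValueError("box must have exactly four coordinates")
    coords = ", ".join(f"{v:g}" for v in box)
    return f"[{coords}] What object is at this location?"


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Fraction of predictions equal to the answer (assumed scoring rule:
    case-insensitive exact match after stripping whitespace)."""
    if len(predictions) != len(answers) or not answers:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(
        p.strip().lower() == a.strip().lower()
        for p, a in zip(predictions, answers)
    )
    return hits / len(answers)


if __name__ == "__main__":
    print(build_c2b_prompt("person"))
    print(build_b2c_prompt([0.2, 0.3, 0.6, 0.4]))
```

A C2B answer would be scored against "yes"/"no", while a B2C answer would be scored against the class name of the object in the box, for example "person" or "handbag".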

Positive correlation between object-level image understanding and zero-shot vision-language performance

[Figure: object labels "Handbag" and "Person"]

Proposed Method

Experiment for Object-level Image Understanding

Experiment for Zero-shot Vision Language

Demo

Thank you