1 of 5

Seesaw: a system for bootstrapping image searches

Oscar Moll (orm@csail.mit.edu), Manuel Favela, Sam Madden, Vijay Gadepally

2 of 5

Problem: ad-hoc searches on image datasets, without perfect models.

Searching through your own image databases is a basic building block for many downstream tasks.

A common approach to searching images is semantic embeddings, such as CLIP.

Pre-trained embeddings alone are insufficient because their accuracy varies widely across queries and datasets.

As a result, searches can be time-consuming or virtually impossible.

Example: searching for cars with open doors
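Concretely, embedding-based search reduces to a nearest-neighbor lookup in a shared text/image vector space. A minimal sketch, assuming CLIP (or similar) embeddings have already been computed, with random vectors as stand-ins for real embeddings:

```python
import numpy as np

def search(text_vec, image_vecs, k=5):
    """Rank images by cosine similarity to a text embedding."""
    t = text_vec / np.linalg.norm(text_vec)
    ims = image_vecs / np.linalg.norm(image_vecs, axis=1, keepdims=True)
    scores = ims @ t
    return np.argsort(-scores)[:k]

# stand-ins for precomputed CLIP embeddings of a query and 100 images
rng = np.random.default_rng(0)
image_vecs = rng.normal(size=(100, 512))
text_vec = rng.normal(size=512)
top = search(text_vec, image_vecs, k=5)
```

The gap this poster addresses is that, for hard queries, the top of this ranked list contains few true matches, so the user must refine the query interactively.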

3 of 5

Seesaw merges text search with region-based feedback

Stage 1: Preprocessing. CLIP embeds every image in the dataset (Im0, Im1, Im2, …, ImN).

Stage 2: Querying starts with natural language, e.g. "Open car door", embedded with CLIP.

Stage 3: Region-based feedback from the user.

Stage 4: Query vector optimization.
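Stage 4 can be illustrated with a classic relevance-feedback update. This is a hedged sketch: a Rocchio-style rule that pulls the query vector toward positively labeled region embeddings and away from negatives; Seesaw's actual optimization may differ.

```python
import numpy as np

def refine_query(q, pos, neg, alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio-style update: move the query vector toward the mean of
    positive region embeddings and away from the mean of negatives,
    then renormalize to unit length."""
    q_new = alpha * np.asarray(q, dtype=float)
    if len(pos):
        q_new = q_new + beta * np.mean(pos, axis=0)
    if len(neg):
        q_new = q_new - gamma * np.mean(neg, axis=0)
    return q_new / np.linalg.norm(q_new)

# toy 2-d example: one positive region, no negatives
q = np.array([1.0, 0.0])
q_refined = refine_query(q, pos=[[0.0, 1.0]], neg=[])
```

Each feedback round repeats this loop: re-rank with the refined vector, collect more region labels, refine again.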

4 of 5

Region-based feedback necessitates region-based indexing
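A hypothetical sketch of such an index: split each image into a grid of patches, embed each patch, and keep metadata mapping every region vector back to its source image. The `embed` function below is a stand-in for a CLIP image encoder, and Seesaw's actual region scheme may differ.

```python
import numpy as np

def grid_regions(h, w, grid=3):
    """Yield (top, left, bottom, right) boxes covering a grid x grid split."""
    for i in range(grid):
        for j in range(grid):
            yield (i * h // grid, j * w // grid,
                   (i + 1) * h // grid, (j + 1) * w // grid)

def build_region_index(images, embed, grid=3):
    """Embed every region of every image, keeping (image_id, box) metadata
    so a matching region can be traced back to its source image."""
    vecs, meta = [], []
    for img_id, img in enumerate(images):
        h, w = img.shape[:2]
        for t, l, b, r in grid_regions(h, w, grid):
            vecs.append(embed(img[t:b, l:r]))
            meta.append((img_id, (t, l, b, r)))
    return np.stack(vecs), meta

# stand-in embedder: mean intensity and patch area
# (a real system would use a CLIP image encoder here)
toy_embed = lambda patch: np.array([patch.mean(), float(patch.size)])
images = [np.ones((9, 9, 3)), np.zeros((9, 9, 3))]
vecs, meta = build_region_index(images, toy_embed)
```

Because user labels apply to regions, both search and feedback operate on these region vectors rather than on one vector per image.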

5 of 5

User study

  • Task: find 10 examples of the target concept (across multiple queries)
  • Users consistently completed the task faster using Seesaw than with CLIP alone, sometimes by substantial margins.

Benchmark

  • Comprehensive benchmark of 1.4k queries comparing retrieval accuracy (NDCG)
  • NDCG improves on more than 1k of the queries and drops on only 40.
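For reference, NDCG (normalized discounted cumulative gain) scores a ranking by how early it places relevant results, normalized by the best possible ordering of the same labels. A minimal implementation:

```python
import numpy as np

def ndcg(relevances, k=None):
    """NDCG@k: DCG of the given ranking divided by the DCG of the
    ideal (sorted) ranking of the same relevance labels."""
    rel = np.asarray(relevances, dtype=float)
    k = len(rel) if k is None else min(k, len(rel))
    discounts = 1.0 / np.log2(np.arange(2, k + 2))
    dcg = (rel[:k] * discounts).sum()
    ideal = (np.sort(rel)[::-1][:k] * discounts).sum()
    return dcg / ideal if ideal > 0 else 0.0
```

A perfect ranking scores 1.0; pushing relevant items lower in the list lowers the score.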