Seesaw: a system for bootstrapping image searches
Oscar Moll (orm@csail.mit.edu), Manuel Favela, Sam Madden, Vijay Gadepally
Problem: ad-hoc searches on image datasets, without perfect models.
Searching through your own image databases is a basic building block for many downstream tasks.
A common approach to searching images is semantic embeddings, such as CLIP.
Pre-trained embeddings alone are insufficient because their accuracy varies widely across queries and datasets.
Searches can be time-consuming or virtually impossible.
Example: searching for cars with open doors
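Embedding-based search scores every image by its similarity to the embedded text query. Below is a minimal sketch of that baseline using random vectors as stand-ins for real CLIP embeddings (in practice the vectors would come from a pretrained CLIP model; the names `image_embs` and `search` are illustrative, not Seesaw's API):

```python
import numpy as np

# Stand-ins for CLIP embeddings: in a real system these come from a
# pretrained CLIP image encoder (one 512-d vector per image).
rng = np.random.default_rng(0)
image_embs = rng.normal(size=(1000, 512))
image_embs /= np.linalg.norm(image_embs, axis=1, keepdims=True)

def search(text_emb: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k images most similar to the text embedding."""
    text_emb = text_emb / np.linalg.norm(text_emb)
    scores = image_embs @ text_emb          # cosine similarity (both unit-norm)
    return np.argsort(-scores)[:k]          # top-k by descending score

# Stand-in for the CLIP text embedding of a query like "open car door".
query = rng.normal(size=512)
top_k = search(query)
```

The weakness this illustrates is that result quality depends entirely on how well the pre-trained embedding space captures the query concept, which motivates adding user feedback on top.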
Seesaw merges text search with region-based feedback.
[Figure: Seesaw pipeline overview]
Stage 1: Preprocessing — CLIP embeddings are computed for the images Im0, Im1, Im2, …, ImN.
Stage 2: Querying starts with natural language, e.g. "open car door", embedded with CLIP.
Stage 3: Region-based feedback from the user.
Stage 4: Query vector optimization.
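Stage 4 refines the query vector using the user's labels. The poster does not spell out the exact update rule, so the sketch below uses a classic Rocchio-style relevance-feedback step as an illustrative assumption: move the query toward labeled-positive region vectors and away from labeled-negative ones (the function name and the `alpha`/`beta`/`gamma` weights are hypothetical):

```python
import numpy as np

def rocchio_update(query, positives, negatives,
                   alpha=1.0, beta=0.75, gamma=0.25):
    """One Rocchio-style update of a unit-norm query vector.

    positives / negatives are sequences of feedback vectors; empty
    sequences leave the corresponding term out.
    """
    q = alpha * np.asarray(query, dtype=float)
    if len(positives):
        q = q + beta * np.mean(positives, axis=0)   # pull toward positives
    if len(negatives):
        q = q - gamma * np.mean(negatives, axis=0)  # push away from negatives
    return q / np.linalg.norm(q)                    # renormalize for cosine search
```

After each round of feedback, the updated vector is re-run against the index, so the search progressively specializes to what the user actually means.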
Region-based feedback necessitates region-based indexing.
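One way to realize a region-based index, sketched below under assumptions not spelled out in the poster: each image contributes several region vectors (e.g. from crops or patches), and an image's score is the maximum similarity over its regions, so feedback on a region can be matched at region granularity. The flat-array layout and `score_images` helper are illustrative, not Seesaw's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
regions, owner = [], []              # owner[i] = image id of region vector i
for img_id in range(100):
    for _ in range(rng.integers(1, 6)):   # 1-5 regions per image (assumption)
        v = rng.normal(size=512)
        regions.append(v / np.linalg.norm(v))
        owner.append(img_id)
regions = np.stack(regions)
owner = np.array(owner)

def score_images(query: np.ndarray) -> np.ndarray:
    """Score each image as the max cosine similarity over its regions."""
    query = query / np.linalg.norm(query)
    region_scores = regions @ query
    img_scores = np.full(100, -np.inf)
    np.maximum.at(img_scores, owner, region_scores)  # max-pool per image
    return img_scores
```

Max-pooling over regions lets a small matching region (an open door) surface an image even when the whole-image embedding is dominated by other content.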
User study
Benchmark