Opaque Filter Query Optimization for Video Analytics
Wenjia He
Advisor: Michael Cafarella
What is a traditional filter query?
2
SELECT name
FROM studentDatabase
WHERE age > 20
Simple filter queries have understandable semantics.
Database optimizations (e.g., B+ tree) are possible.
What is an opaque filter query?
3
Opaque filter queries have unknown semantics.
No traditional database optimizations are possible.
SELECT *
FROM dashcamVideo
WHERE HasCrosswalkUDF(frame) = True
Real-life query engines use SCAN, which is O(n).
During SCAN, HasCrosswalkUDF(frame) usually returns False.
LIMIT clause
4
LIMIT clauses are common in opaque filter queries, driven by users’ requirements.
SELECT *
FROM dashcamVideo
WHERE HasCrosswalkUDF(frame) = True
LIMIT 100
When we hit the LIMIT number, the query can terminate.
Our goal
5
Save time during query processing by choosing frames where UDF is likely to be TRUE.
Our approach: VOODOO Indexing
6
Raw
Data
Cluster
Cluster
Selection
UDF
Execution
Reward
Data 1
Data 2
Data N
. . .
Query Processing Stage
Indexing Stage
. . .
. . .
. . .
Index Structure
Key idea: Visually similar items may yield similar UDF results!
Illustrative example
7
SELECT * FROM dashcamVideo
WHERE HasCrosswalkUDF(frame) = True LIMIT 100
Experimental results
8
MNIST
ImageNet
Voodoo performance is better than competing methods, showing query time improvement of up to 88.2%.
See paper for algorithm and experiment details
9
“A Method for Optimizing Opaque Filter Queries”, SIGMOD 2020
Ongoing Work
10
SELECT * FROM videoCorpus
WHERE object = “Tennis Ball”
LIMIT 10
NAIVE
Query System
Process
randomly
. . .
COMMONSENSE
Query System
Sort based
on index
. . .
Commonsense knowledge: the way to play tennis is to hit a tennis ball by a racket
Standard databases are too slow
11
Real-life query engines use SCAN, query runtime of which is O(n).
n is large
The constant is large
Real-life runtime is large