1 of 11

Opaque Filter Query Optimization for Video Analytics

Wenjia He

Advisor: Michael Cafarella

2 of 11

What is a traditional filter query?

2

SELECT name

FROM studentDatabase

WHERE age > 20

Simple filter queries have understandable semantics.

Database optimizations (e.g., B+ tree) are possible.

3 of 11

What is an opaque filter query?

3

Opaque filter queries have unknown semantics.

No traditional database optimizations are possible.

SELECT *

FROM dashcamVideo

WHERE HasCrosswalkUDF(frame) = True

Real-life query engines use SCAN, which is O(n).

During SCAN, HasCrosswalkUDF(frame) usually returns False.

4 of 11

LIMIT clause

4

LIMIT clauses are common in opaque filter queries, driven by users’ requirements.

SELECT *

FROM dashcamVideo

WHERE HasCrosswalkUDF(frame) = True

LIMIT 100

When we hit the LIMIT number, the query can terminate.

5 of 11

Our goal

5

Save time during query processing by choosing frames where UDF is likely to be TRUE.

6 of 11

Our approach: VOODOO Indexing

6

Raw

Data

Cluster

Cluster

Selection

UDF

Execution

Reward

Data 1

Data 2

Data N

. . .

Query Processing Stage

Indexing Stage

. . .

. . .

. . .

Index Structure

Key idea: Visually similar items may yield similar UDF results!

7 of 11

Illustrative example

7

SELECT * FROM dashcamVideo

WHERE HasCrosswalkUDF(frame) = True LIMIT 100

8 of 11

Experimental results

8

MNIST

ImageNet

Voodoo performance is better than competing methods, showing query time improvement of up to 88.2%.

9 of 11

See paper for algorithm and experiment details

9

“A Method for Optimizing Opaque Filter Queries”, SIGMOD 2020

10 of 11

Ongoing Work

10

SELECT * FROM videoCorpus

WHERE object = “Tennis Ball”

LIMIT 10

NAIVE

Query System

Process

randomly

. . .

COMMONSENSE

Query System

Sort based

on index

. . .

Commonsense knowledge: the way to play tennis is to hit a tennis ball by a racket

11 of 11

Standard databases are too slow

11

Real-life query engines use SCAN, query runtime of which is O(n).

  • UDFs usually contain complex neural network models for challenging tasks.
  • UDF runtime can reach 2 seconds per record.
  • Large databases are easy to collect from online collections or video applications.
  • A 30 fps camera can obtain more than 2.5M frames a day.
  • The constant is dominated by UDF runtime.

n is large

  • n = # of UDF invocations

The constant is large

Real-life runtime is large