ASM in Action: A Fast and Practical Learned Cardinality Estimator
Sangoh Lee (POSTECH), Kyoungmin Kim (EPFL), Wook-Shin Han (POSTECH)
1. Autoregressive Model for per-table statistics estimation
2. Importance Sampling for join statistics estimation
3. Multi-dimensional Statistics Merging for all sub-query cardinality estimation
1. Database Setup & Issuing Query
3. Join Key Distributions and Sampling
4. Multi-dimensional Statistics Merging and Estimation Graph
Architecture of ASM
S
T
A
B
X | Y | A |
… | … | … |
R
A | B |
… | … |
S
B |
… |
T
AR model learned over table R (offline)
X
PR(X|X<3)
x ~ PR(X|X<3)
PR(X)
Y
A
Filter-During-Sample
PR(Y|x)
PR(Y|Y>5, x)
y ~ PR(Y|Y>5, x)
Sample Attributes
PR(A|x, y)
Importance distribution q(A)
Aggregate PR(A|x, y), PS(A)
Sample a ~ q(A)
q(A)
1/3
2
1
2/3
1/7
6/7
Element-wise product/avg
2
1
1/4
3/4
PR(A|x, y)
PS(A)
2
1
Importance distribution q(B|a)
Sample b ~ q(B|a)
Aggregate PS(B|a), PT(B)
Demo Scenarios
2. Hypergraph & Single-table Cardinalities
S
T
1.A
2.B
Subquery | Estimated Cardinality |
| |
| |
Memo table
Statistics from 1, 2
DP
The DBA starts the tour by issuing the query on the selected database, with the baseline estimators and ASM.
The DBA observes intra-/inter- table ordering of each table in the query, with the interactive join query hypergraph.
The DBA can watch how AR models have estimated the selectivity of filters.
The DBA observes how ASM inferred the join key distributions and the importance distributions.
The DBA interacts with the estimation graph, which includes a multi-dimensional statistics merging phase.
The DBA can analyze how the cost/cardinality of each subquery (node) is under-/over-estimated, and the resulting plan.
R
S
T
X
Y
A
A
B
B
A
B
PR(A|X<3 , Y>5)
PS(A)
PS(B|a)
PT(B)
Sample a
A
B
1
2
3
Cost Model
Plan Enumeration
Query Optimizer
SPJ Query
Learned CE for Query Optimization
Query Execution Plan
R
S
T
Learned CE
Query-driven (Supervised)
Data-driven (Unsupervised)
Also check our research paper!
Check our paper!