
ASM in Action: A Fast and Practical Learned Cardinality Estimator

Sangoh Lee (POSTECH), Kyoungmin Kim (EPFL), Wook-Shin Han (POSTECH)

1. Autoregressive Model for per-table statistics estimation

2. Importance Sampling for join statistics estimation

3. Multi-dimensional Statistics Merging for all sub-query cardinality estimation

1. Database Setup & Issuing Query

3. Join Key Distributions and Sampling

4. Multi-dimensional Statistics Merging and Estimation Graph

Architecture of ASM

Example tables: R(X, Y, A), S(A, B), T(B)

AR model learned over table R (offline)

P_R(X|X<3)

x ~ P_R(X|X<3)

P_R(X)

Filter-During-Sample

P_R(Y|x)

P_R(Y|Y>5, x)

y ~ P_R(Y|Y>5, x)

Sample Attributes

P_R(A|x, y)
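The Filter-During-Sample step above can be sketched as follows. This is a minimal illustration in the style of progressive sampling, not the poster's implementation: the toy table `R`, the helper names, and the use of exact empirical conditionals in place of a learned AR model are all assumptions. Each sample zeroes out values that fail the filter, records the surviving probability mass, and draws the next value from the renormalized distribution; the average product of masses estimates the selectivity of X<3 AND Y>5.

```python
import random
from collections import Counter

# Hypothetical toy table R(X, Y). Exact empirical conditionals stand in
# for the learned AR model (an assumption made for illustration only).
R = [(1, 6), (1, 4), (2, 7), (2, 8), (3, 9), (4, 2), (5, 6), (2, 3)]

def p_x():
    """P_R(X) as a dict value -> probability."""
    counts = Counter(x for x, _ in R)
    return {x: c / len(R) for x, c in counts.items()}

def p_y_given_x(x):
    """P_R(Y | x) as a dict value -> probability."""
    ys = [y for xx, y in R if xx == x]
    counts = Counter(ys)
    return {y: c / len(ys) for y, c in counts.items()}

def filtered_sample(dist, pred, rng):
    """Drop values failing pred; return (sampled value, surviving mass)."""
    kept = {v: p for v, p in dist.items() if pred(v)}
    mass = sum(kept.values())
    if mass == 0.0:
        return None, 0.0
    values = list(kept)
    value = rng.choices(values, weights=[kept[v] for v in values])[0]
    return value, mass

def estimate_selectivity(n_samples=2000, seed=0):
    """Average over samples of P(X<3) * P(Y>5 | sampled x)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x, mass_x = filtered_sample(p_x(), lambda v: v < 3, rng)
        if x is None:
            continue  # no value passes the filter: contributes 0
        _, mass_y = filtered_sample(p_y_given_x(x), lambda v: v > 5, rng)
        total += mass_x * mass_y
    return total / n_samples

true_sel = sum(1 for x, y in R if x < 3 and y > 5) / len(R)
est = estimate_selectivity()
print(f"true selectivity = {true_sel:.3f}, estimate = {est:.3f}")
```

Because the filter is applied while sampling, every draw satisfies the predicates, so no samples are wasted on filtered-out regions.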

Importance distribution q(A)

Aggregate P_R(A|x, y), P_S(A)

Sample a ~ q(A)

Example (figure): q(A) is the element-wise product/avg of P_R(A|x, y) and P_S(A). With inputs (1/3, 2/3) and (1/4, 3/4), the normalized product is q(A) = (1/7, 6/7).

Importance distribution q(B|a)

Sample b ~ q(B|a)

Aggregate P_S(B|a), P_T(B)
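The aggregation step above, with the figure's numbers, can be reproduced with a small sketch. It uses the element-wise product variant only (the poster also mentions averaging). Which input carries (1/3, 2/3) and which carries (1/4, 3/4) is an assumption, as are the key labels `a1`, `a2` and the function name.

```python
import random
from fractions import Fraction as F

def importance_distribution(p_left, p_right):
    """Normalized element-wise product of two join key distributions."""
    shared = sorted(p_left.keys() & p_right.keys())
    product = {k: p_left[k] * p_right[k] for k in shared}
    total = sum(product.values())
    if total == 0:
        raise ValueError("no join key value has nonzero mass in both inputs")
    return {k: v / total for k, v in product.items()}

# Assumed assignment of the figure's values to the two inputs.
p_r = {"a1": F(1, 3), "a2": F(2, 3)}   # stands for P_R(A | x, y)
p_s = {"a1": F(1, 4), "a2": F(3, 4)}   # stands for P_S(A)

q = importance_distribution(p_r, p_s)
print(q)  # a1 -> 1/7, a2 -> 6/7

# Sample a ~ q(A) and compute its importance weight (target / proposal).
rng = random.Random(0)
keys = list(q)
a = rng.choices(keys, weights=[float(q[k]) for k in keys])[0]
weight = p_r[a] * p_s[a] / q[a]
print(a, weight)  # the weight is 7/12 for either value of a
```

With a product-based q(A), the importance weight equals the normalizing constant, the sum over a of P_R(a|x, y) · P_S(a), regardless of which a is drawn. This is the factor that scales |R|·|S| to the join size under these two distributions.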

Demo Scenarios

2. Hypergraph & Single-table Cardinalities

1.A

2.B

Subquery	Estimated Cardinality

Memo table

Using the statistics from the first phase (filter selectivities) and the second phase (join key distributions, P(join key)), ASM estimates the cardinalities of all subqueries.
Because a query can have thousands of subqueries, ASM uses dynamic programming (DP) and stores each estimate in a memo table.
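The memo-table idea can be sketched as a bottom-up DP over connected subqueries. This is a simplified stand-in, not ASM's multi-dimensional statistics merging: the merge step below uses a plain per-edge join selectivity under an independence assumption, and all cardinalities, selectivities, and names are hypothetical.

```python
from itertools import combinations

# Hypothetical inputs: filtered single-table cardinalities and
# per-join-edge selectivities for the chain R - S - T.
card = {"R": 1000.0, "S": 500.0, "T": 200.0}
join_sel = {frozenset({"R", "S"}): 0.01, frozenset({"S", "T"}): 0.02}

def build_memo(card, join_sel):
    """Map each connected subquery (frozenset of tables) to an estimate."""
    memo = {frozenset({t}): c for t, c in card.items()}
    tables = sorted(card)
    for size in range(2, len(tables) + 1):
        for combo in combinations(tables, size):
            sub = frozenset(combo)
            for t in combo:
                rest = sub - {t}
                if rest not in memo:
                    continue  # rest is not a connected subquery
                # Join edges linking t to a table already in rest.
                edges = [s for e, s in join_sel.items()
                         if t in e and (e - {t}) <= rest]
                if not edges:
                    continue  # adding t would be a cross product
                est = memo[rest] * card[t]
                for s in edges:
                    est *= s  # placeholder merge: independent join edges
                memo[sub] = est  # reuse the smaller subquery's estimate
                break
    return memo

memo = build_memo(card, join_sel)
for sub, est in sorted(memo.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(sub), round(est, 2))
```

Each larger subquery reuses the estimate of a smaller one already in the memo table, so no subquery is estimated from scratch. The disconnected pair {R, T} is skipped because no join edge links the two tables.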

Statistics from 1, 2

The DBA starts the tour by issuing the query on the selected database, with the baseline estimators and ASM.

The DBA observes the intra-/inter-table ordering of each table in the query through the interactive join query hypergraph.

The DBA can watch how AR models have estimated the selectivity of filters.

The DBA observes how ASM inferred the join key distributions and the importance distributions.

The DBA interacts with the estimation graph, which includes a multi-dimensional statistics merging phase.

The DBA can analyze how much the cost/cardinality of each subquery (node) is under- or over-estimated, and inspect the resulting plan.

P_R(A|X<3 , Y>5)

P_S(A)

P_S(B|a)

P_T(B)

Sample a

Cost Model

Plan Enumeration

Query Optimizer

SPJ Query

Learned CE for Query Optimization

Query Execution Plan

Learned CE

Query-driven (Supervised)

Data-driven (Unsupervised)

Large training data collection overhead
Not scalable to databases with a large number of tables

Limited query coverage, i.e., little support for LIKE/IN/disjunctive filters (FactorJoin, NeuroCard)
Independence assumption between join keys (FactorJoin)
Estimate sub-queries independently (NeuroCard) or assume independence while utilizing dynamic programming (FactorJoin)

Also check our research paper!

Check our paper!