1 of 29

CSE 344: Section 6

Cost Estimation

November 4th, 2021

2 of 29

Administrivia

  • HW5 Milestone 1 due Monday, Nov. 8th
  • HW5 Milestone 2 due the next Monday, Nov. 15th
    • Bulk of the assignment
    • Much longer and much harder, so please get started early!

3 of 29

Cost Estimation: Factors

B(R) = # blocks for relation R

T(R) = # tuples for relation R

V(R, a) = # of unique values of attribute a in relation R

M = # of available memory pages

4 of 29

Cost Estimation: Selection (σ)

Table scan = B(R)

Point Selection:

Index Based Selection (clustered) = B(R)/V(R, a)

Index Based Selection (unclustered) = T(R)/V(R, a)

5 of 29

Cost Estimation Disk I/O Formulas

  • Nested-Loop Joins
    • Block-at-a-time → B(R)+B(R)*B(S)
    • Nested-block-loop/Block-nested-loop → B(R)+(B(R)/M)*B(S)
      • More memory but faster runtime
  • Hash Join → B(R)+B(S)
    • Smaller table must fit in memory
  • Sort-Merge Join → B(R)+B(S)
    • Both tables must fit in memory simultaneously
  • Clustered Index Equijoin → B(R)+T(R)(X*B(S)) = B(R)+T(R)(B(S)/V(S, a))
  • Unclustered Index Equijoin → B(R)+T(R)(X*T(S)) = B(R)+T(R)(T(S)/V(S, a))

6 of 29

Cost Estimation: Nested Loop Join (⋈)

Naive: B(R) + T(R)B(S)

for each tuple t1 in R do

for each tuple t2 in S do

if t1 and t2 join then output (t1,t2)

7 of 29

Cost Estimation: Nested Loop Join (⋈)

Block-at-a-time: B(R) + B(S)B(R)

for each block bR in R:

for each block bS in S:

for each tuple tR in bR:

for each tuple tS in bS:

if tR and tS can join:

output (tR,tS)

8 of 29

Cost Estimation: Nested Loop Join (⋈)

Block-nested-loop: B(R) + (B(R)/(M-1))*B(S) ≅ B(R) + (B(R)/(M)) * B(S)

for each group of M blocks bR in R:

for each block bS in S:

for each tuple tR in bR:

for each tuple tS in bS:

if tR and tS can join:

output (tR,tS)

9 of 29

Cost Estimation: Hash Join (⋈)

R joined with S (assume R is smaller in size)

B(R) + B(S)

Assuming B(R) < M for one pass (look at each table once) efficiency, read all of R into a hash table and join with all of S

10 of 29

Cost Estimation: Sort-Merge Join (⋈)

B(R) + B(S)

One pass (look at each table once); Both must be small (B(R) + B(S) < M)

Why would we use this over Hash Join?

  • Tables are sorted on join attributes (No need to hash)
  • Range join instead of equijoin

11 of 29

Selectivity Formulas

  • Selectivity Factor (X) → Proportion of total data needed
    • Assuming uniform distribution of data values on numeric attribute a in table R, if the condition is:
      • a = c → X 1 / V(R, a)
      • a < c→ X (c - min(R, a))/ (max(R, a) - min(R, a))
      • c1 < a < c2→ X (c2 - c1)/ (max(R, a) - min(R, a))
      • cond1 and cond2 → X X1 * X2

12 of 29

Cardinality Estimation Example

13 of 29

Cardinality Estimation

Supply

σpno=2

Supplier

σscity=’Seattle’ Λ sstate=’WA’

πsname

SELECT sname

FROM Supply x, Supplier y

WHERE x.sid=y.sid AND x.pno=2 AND y.scity=’Seattle’ AND y.sstate=’WA’;

Supply(sid, pno, quantity)

Supplier(sid, sname, scity, sstate)

sid=sid

14 of 29

Cardinality Estimation

Supply

σpno=2

Supplier

σscity=’Seattle’ Λ sstate=’WA’

πsname

Supply Statistics:

  • T(Supply) = 10000
  • B(Supply) = 100
  • V(Supply, pno) = 2500

Supplier Statistics:

  • T(Supplier) = 1000
  • B(Supplier) = 100
  • V(Supplier, scity) = 20
  • V(Supplier, sstate) = 10

sid=sid

15 of 29

Cardinality Estimation

Supply Statistics:

  • T(Supply) = 10000
  • B(Supply) = 100
  • V(Supply, pno) = 2500

Supplier Statistics:

  • T(Supplier) = 1000
  • B(Supplier) = 100
  • V(Supplier, scity) = 20
  • V(Supplier, sstate) = 10

T(Supply) * 1 / V(Supply, pno)

= 4

T(Supplier) * 1 / V(Supplier, scity) * 1 / V(Supplier, sstate)

= 5

Supply

σpno=2

Supplier

σscity=’Seattle’ Λ sstate=’WA’

πsname

sid=sid

16 of 29

Cardinality Estimation

Supply Statistics:

  • T(Supply) = 10000
  • B(Supply) = 100
  • V(Supply, pno) = 2500

Supplier Statistics:

  • T(Supplier) = 1000
  • B(Supplier) = 100
  • V(Supplier, scity) = 20
  • V(Supplier, sstate) = 10

T1 = 4

T2 = 5

Supply

σpno=2

Supplier

σscity=’Seattle’ Λ sstate=’WA’

πsname

sid=sid

17 of 29

Cardinality Estimation

Supply Statistics:

  • T(Supply) = 10000
  • B(Supply) = 100
  • V(Supply, pno) = 2500

Supplier Statistics:

  • T(Supplier) = 1000
  • B(Supplier) = 100
  • V(Supplier, scity) = 20
  • V(Supplier, sstate) = 10

T1 = 4

T2 = 50

But wait a second…. Seattle is in Washington!

Supply

σpno=2

Supplier

σscity=’Seattle’ Λ sstate=’WA’

πsname

sid=sid

18 of 29

Cardinality Estimation

Supply Statistics:

  • T(Supply) = 10000
  • B(Supply) = 100
  • V(Supply, pno) = 2500

Supplier Statistics:

  • T(Supplier) = 1000
  • B(Supplier) = 100
  • V(Supplier, scity) = 20
  • V(Supplier, sstate) = 10

T1 = 4

T2 = 50

Min(T1, T2)

= 4

Supply

σpno=2

Supplier

σscity=’Seattle’ Λ sstate=’WA’

πsname

sid=sid

19 of 29

Cardinality Estimation

Supply Statistics:

  • T(Supply) = 10000
  • B(Supply) = 100
  • V(Supply, pno) = 2500

Supplier Statistics:

  • T(Supplier) = 1000
  • B(Supplier) = 100
  • V(Supplier, scity) = 20
  • V(Supplier, sstate) = 10

T1 = 4

T2 = 50

T3 = 4

Supply

σpno=2

Supplier

σscity=’Seattle’ Λ sstate=’WA’

πsname

sid=sid

20 of 29

Cardinality Estimation

Supply Statistics:

  • T(Supply) = 10000
  • B(Supply) = 100
  • V(Supply, pno) = 2500

Supplier Statistics:

  • T(Supplier) = 1000
  • B(Supplier) = 100
  • V(Supplier, scity) = 20
  • V(Supplier, sstate) = 10

T1 = 4

T2 = 50

T3 = 4

No filtering at this step

Supply

σpno=2

Supplier

σscity=’Seattle’ Λ sstate=’WA’

πsname

sid=sid

21 of 29

Cardinality Estimation

Supply Statistics:

  • T(Supply) = 10000
  • B(Supply) = 100
  • V(Supply, pno) = 2500

Supplier Statistics:

  • T(Supplier) = 1000
  • B(Supplier) = 100
  • V(Supplier, scity) = 20
  • V(Supplier, sstate) = 10

T1 = 4

T2 = 50

T3 = 4

Total = 4

Supply

σpno=2

Supplier

σscity=’Seattle’ Λ sstate=’WA’

πsname

sid=sid

22 of 29

Cost Estimation Example

23 of 29

Cost Estimation

Supply Statistics:

  • T(Supply) = 10000
  • B(Supply) = 100
  • V(Supply, pno) = 2500

Supplier Statistics:

  • T(Supplier) = 1000
  • B(Supplier) = 100
  • V(Supplier, scity) = 20
  • V(Supplier, sstate) = 10

Supply

σpno=2

Supplier

σscity=’Seattle’ Λ sstate=’WA’

πsname

sid=sid

(Hash Join)

24 of 29

Cost Estimation

Supply Statistics:

  • T(Supply) = 10000
  • B(Supply) = 100
  • V(Supply, pno) = 2500

Supplier Statistics:

  • T(Supplier) = 1000
  • B(Supplier) = 100
  • V(Supplier, scity) = 20
  • V(Supplier, sstate) = 10

Supply

σpno=2

Supplier

σscity=’Seattle’ Λ sstate=’WA’

πsname

sid=sid

(Hash Join)

(On the fly)

(Sequential

Scan)

(Sequential

Scan)

25 of 29

Cost Estimation

Supply Statistics:

  • T(Supply) = 10000
  • B(Supply) = 100
  • V(Supply, pno) = 2500

Supplier Statistics:

  • T(Supplier) = 1000
  • B(Supplier) = 100
  • V(Supplier, scity) = 20
  • V(Supplier, sstate) = 10

Supply

σpno=2

Supplier

σscity=’Seattle’ Λ sstate=’WA’

πsname

sid=sid

(Hash Join)

(On the fly)

(Sequential

Scan)

(Sequential

Scan)

(On the fly)

26 of 29

Cost Estimation

Supply Statistics:

  • T(Supply) = 10000
  • B(Supply) = 100
  • V(Supply, pno) = 2500

Supplier Statistics:

  • T(Supplier) = 1000
  • B(Supplier) = 100
  • V(Supplier, scity) = 20
  • V(Supplier, sstate) = 10

Supply

σpno=2

Supplier

σscity=’Seattle’ Λ sstate=’WA’

πsname

sid=sid

(Hash Join)

(On the fly)

(On the fly)

Cost = B(Supply) = 100

Cost = B(Supplier) = 100

27 of 29

Cost Estimation

Supply Statistics:

  • T(Supply) = 10000
  • B(Supply) = 100
  • V(Supply, pno) = 2500

Supplier Statistics:

  • T(Supplier) = 1000
  • B(Supplier) = 100
  • V(Supplier, scity) = 20
  • V(Supplier, sstate) = 10

Supply

σpno=2

Supplier

σscity=’Seattle’ Λ sstate=’WA’

πsname

sid=sid

Cost = 0

Cost = 0

C1 = 100

C2 = 100

28 of 29

Cost Estimation

Supply Statistics:

  • T(Supply) = 10000
  • B(Supply) = 100
  • V(Supply, pno) = 2500

Supplier Statistics:

  • T(Supplier) = 1000
  • B(Supplier) = 100
  • V(Supplier, scity) = 20
  • V(Supplier, sstate) = 10

Supply

σpno=2

Supplier

σscity=’Seattle’ Λ sstate=’WA’

πsname

sid=sid

Total Cost = 100 + 100 = 200 I/Os

C1 = 100

C2 = 100

C3 = 0

C4 = 0

29 of 29

Pipelined Execution