CSE 414 Section 7
Locking and Cost Estimation
November 7th, 2024
Announcements
Recap: Definitions
Operation - read or write of an element (later: insert or delete)
Transaction - series of operations meant to have the ACID guarantees
Schedule - ordering of the operations of some txns
Serial schedule - schedule where each txn is executed one after another.� No interleaving
Serializable schedule - a schedule that is behaviorally equivalent � to a serial schedule
Recap: Definitions (continued)
Non-conflicting swap - a pair of adjacent operations in a schedule � that can be reversed without affecting the schedule's behavior
Conflicting swap - (any two ops in diff txn), RW, WR, WW
Conflict serializable schedule - a schedule that can be transformed � into a serial schedule via a series of non-conflicting swaps
Recap: ACID
Transactions have 4 main properties:
Locking Overview
Remember - locking is for database internal implementation
Database Lock
Table Lock
Predicate Lock
Tuple Lock
Lock Granularity
SQLite!
Less Concurrency
More Concurrency
All the Locks!
Lock Type | Benefit |
Binary Locks | Simplest lock implementation |
Shared/Exclusive Locks | Greater throughput on reads |
Binary Locks
2PL with Binary Locks
Lock growing phase
Lock shrinking phase
2PL vs. Strict 2PL
Invalid reads
Trickled unlock and transaction end
2PL vs. Strict 2PL
2PL:
Strict 2PL:
Cardinality Estimation Example
Cardinality Estimation (Point Selection)
Assume a uniform distribution.
σcolor = “orange”
X ≅ 1 / V(color)
≅ 1/4
Cardinality Estimation (Union)
Assume conditions are disjoint.
σcolor = “orange” OR color = “green”
X ≅ 1 / V(color) + 1 / V(color)
≅ 1/4 + 1/4 = 1/2
Cardinality Estimation (Intersection)
Assume independence.
σc1 = “red” AND c2 = “yellow”
X ≅ 1 / V(c1) · 1 / V(c2)
≅ 1/2 · 1/2 = 1/4
c1
c2
Cardinality Estimation (Ranges)
Schema: Weather(date, pressure). Assume a uniform distribution.
σpressure < p
σpressure > p
σp < pressure < q
X ≅ (p − min) / (max - min)
X ≅ (max − p) / (max - min)
X ≅ (q − p) / (max - min)
min(pressure)
max(pressure)
p
q
Cardinality Estimation (Equijoin)
Assume uniform distribution and full overlap.
⋈c1 = c2
X ≅ (1 / (V(c1) · V(c2))) · min(V(c1), V(c2))
≅ 1 / max(V(c1), V(c2))
≅ 1/4
c1
c2
Cardinality Estimation (Equijoin)
Assume uniform distribution and full overlap.
⋈c1 = c2
X ≅ (1 / (V(c1) · V(c2))) · min(V(c1), V(c2))
≅ 1 / max(V(c1), V(c2))
≅ 1/4 (didn’t change)
c1
c2
Cardinality Estimation (Equijoin)
Assume uniform distribution and full overlap.
⋈c1 = c2
X ≅ (1 / (V(c1) · V(c2))) · min(V(c1), V(c2))
≅ 1 / max(V(c1), V(c2))
≅ 1/5 (decreased)
c1
c2
Summary
Cost Estimation
Cost Estimation: Factors
B(R) = # blocks for relation R
T(R) = # tuples for relation R
V(R, a) = # of unique values of attribute a in relation R
M = # of available memory pages
Cost Estimation: Nested Loop Join (⋈)
Naive: B(R) + T(R)B(S)
for each tuple t1 in R do
for each tuple t2 in S do
if t1 and t2 join then output (t1,t2)
Cost Estimation: Nested Loop Join (⋈)
Block-at-a-time: B(R) + B(S)B(R)
for each block bR in R:
for each block bS in S:
for each tuple tR in bR:
for each tuple tS in bS:
if tR and tS can join:
output (tR,tS)
Cost Estimation: Nested Loop Join (⋈)
Block-nested-loop: B(R) + (B(R)/(M-1))*B(S) ≅ B(R) + (B(R)/(M)) * B(S)
for each group of M blocks bR in R:
for each block bS in S:
for each tuple tR in bR:
for each tuple tS in bS:
if tR and tS can join:
output (tR,tS)
Cost Estimation: Hash Join (⋈)
R joined with S (assume R is smaller in size)
B(R) + B(S)
Assuming B(R) < M for one pass (look at each table once) efficiency, read all of R into a hash table and join with all of S
Cost Estimation: Sort-Merge Join (⋈)
B(R) + B(S)
One pass (look at each table once); Both must be small (B(R) + B(S) < M)
Why would we use this over Hash Join?
Selectivity Formulas