1 of 29

CSE 414 Section 7

Locking and Cost Estimation

November 7th, 2024

2 of 29

Announcements

HW 3 grades released
HW 5 posted, due Tuesday 11pm
Gradescope quiz out today, due tomorrow at 11 PM!

3 of 29

Recap: Definitions

Operation - read or write of an element (later: insert or delete)

Transaction - series of operations meant to have the ACID guarantees

Schedule - ordering of the operations of some txns

Serial schedule - schedule where each txn is executed one after another.� No interleaving

Serializable schedule - a schedule that is behaviorally equivalent � to a serial schedule

4 of 29

Recap: Definitions (continued)

Non-conflicting swap - a pair of adjacent operations in a schedule � that can be reversed without affecting the schedule's behavior

Conflicting swap - (any two ops in diff txn), RW, WR, WW

Conflict serializable schedule - a schedule that can be transformed � into a serial schedule via a series of non-conflicting swaps

5 of 29

Recap: ACID

Transactions have 4 main properties:

Atomicity - all or nothing (the entire transaction takes place at once or doesn’t happen at all)
Consistency - preserve database integrity (the database must be consistent before and after the transaction)
Isolation - execute as if they were run alone (multiple transaction occur independently without interference)
Durability - results aren’t lost by a failure (the changes of a successful transaction occurs even if the system failure occurs)

6 of 29

Locking Overview

Remember - locking is for database internal implementation

A way to provide the ACID guarantees
Pessimistic concurrency control

Assumes transaction executions will conflict

Users of a database need only specify �BEGIN TRANSACTION … COMMIT / ROLLBACK

...and the isolation level...

7 of 29

Database Lock

Table Lock

Predicate Lock

Tuple Lock

Lock Granularity

SQLite!

Less Concurrency

More Concurrency

8 of 29

All the Locks!

Lock Type	Benefit
Binary Locks	Simplest lock implementation
Shared/Exclusive Locks	Greater throughput on reads

9 of 29

Binary Locks

10 of 29

2PL with Binary Locks

Lock growing phase

Lock shrinking phase

11 of 29

2PL vs. Strict 2PL

Invalid reads

Trickled unlock and transaction end

12 of 29

2PL vs. Strict 2PL

2PL:

In every transaction, all lock requests must precede all unlock requests
Ensures conflict serializability
Might not be able to recover (Dirty Read: Read on some write that gets rolled back)
Can result in deadlocks

Strict 2PL:

Every lock for each transaction is held until commit or abort
Ensures conflict serializability
Recoverable as each transaction does not affect others until commit/abort
Can result in deadlocks

13 of 29

Cardinality Estimation Example

14 of 29

Cardinality Estimation (Point Selection)

Assume a uniform distribution.

σ_{color = “orange”}

X ≅ 1 / V(color)

≅ 1/4

15 of 29

Cardinality Estimation (Union)

Assume conditions are disjoint.

σ_{color = “orange” OR color = “green”}

X ≅ 1 / V(color) + 1 / V(color)

≅ 1/4 + 1/4 = 1/2

16 of 29

Cardinality Estimation (Intersection)

Assume independence.

σ_{c1 = “red” AND c2 = “yellow”}

X ≅ 1 / V(c1) · 1 / V(c2)

≅ 1/2 · 1/2 = 1/4

c1

c2

17 of 29

Cardinality Estimation (Ranges)

Schema: Weather(date, pressure). Assume a uniform distribution.

σ_{pressure < p}

σ_{pressure > p}

σ_{p < pressure < q}

X ≅ (p − min) / (max - min)

X ≅ (max − p) / (max - min)

X ≅ (q − p) / (max - min)

min(pressure)

max(pressure)

p

q

18 of 29

Cardinality Estimation (Equijoin)

Assume uniform distribution and full overlap.

⋈_{c1 = c2}

X ≅ (1 / (V(c1) · V(c2))) · min(V(c1), V(c2))

≅ 1 / max(V(c1), V(c2))

≅ 1/4

c1

c2

Equijoins are tricky. The first guess we might have for the selectivity factor is X = 1/V(c1) * 1/V(c2). This isn’t quite right though, as 1/V(c1) * 1/V(c2) = 1/3 * 1/4 = 1/12. The correct answer we want is X = 3/12 = 1/4. What went wrong?

Let’s think of what 1/V(c1) * 1/V(c2) really means. From the “Point Selection” slide, we know that 1/V(c1) is the selection factor for choosing one specific color from c1. For example, the selectivity factor for σ (c1 = ‘red’) is 1/V(c1) = 1/3. The same goes for 1/V(c2). For example, σ (c2 = ‘red’) has a selectivity factor of 1/V(c2) = 1/4.

From the “Intersection” slide, we know that multiplying selectivity conditions (when they are independent, and in this case they are) is the same as the AND of them. So σ (c1 = ‘red’ AND c2 = ‘red’) has a selectivity factor of 1/3 * 1/4 = 1/12. If we look at the table, that’s exactly right: there is just one circle that is red for c1 and c2.

We don’t just want one circle here, though. The equijoin is like multiple selections at once: σ (c1 = ‘red’ AND c2 = ‘red’) + σ (c1 = ‘orange’ AND c2 = ‘orange’) + σ (c1 = ‘yellow’ AND c2 = ‘yellow’) = 1/12 + 1/12 + 1/12 = 1/12 * 3 = 3/12.

Is that why we multiply by min(V(c1), V(c2))? Yes! Think about this: the selectivity factor for each σ is always the same: 1/V(c1) * 1/V(c2). The only question is how many we need to add together. When the tables are different sizes, the smaller table is the one that constrains the join. You could add 100 new colors to c2 and it would not increase the cardinality of the join as c1 would not have any of those new colors. The cardinality only increases when V(c1) increases to match V(c2).

In conclusion, we get the formula 1/V(c1) * 1/V(c2) * min(V(c1), V(c2)), which we can simplify to 1/max(V(c1), V(c2)). (When we divide the smaller of V(c1) and V(c2) by both V(c1) and V(c2), we’re left with the larger of the two--max(V(c1), V(c2)).)

19 of 29

Cardinality Estimation (Equijoin)

Assume uniform distribution and full overlap.

⋈_{c1 = c2}

X ≅ (1 / (V(c1) · V(c2))) · min(V(c1), V(c2))

≅ 1 / max(V(c1), V(c2))

≅ 1/4 (didn’t change)

c1

c2

20 of 29

Cardinality Estimation (Equijoin)

Assume uniform distribution and full overlap.

⋈_{c1 = c2}

X ≅ (1 / (V(c1) · V(c2))) · min(V(c1), V(c2))

≅ 1 / max(V(c1), V(c2))

≅ 1/5 (decreased)

c1

c2

21 of 29

Summary

22 of 29

Cost Estimation

23 of 29

Cost Estimation: Factors

B(R) = # blocks for relation R

T(R) = # tuples for relation R

V(R, a) = # of unique values of attribute a in relation R

M = # of available memory pages

24 of 29

Cost Estimation: Nested Loop Join (⋈)

Naive: B(R) + T(R)B(S)

for each tuple t1 in R do

for each tuple t2 in S do

if t1 and t2 join then output (t1,t2)

25 of 29

Cost Estimation: Nested Loop Join (⋈)

Block-at-a-time: B(R) + B(S)B(R)

for each block bR in R:

for each block bS in S:

for each tuple tR in bR:

for each tuple tS in bS:

if tR and tS can join:

output (tR,tS)

26 of 29

Cost Estimation: Nested Loop Join (⋈)

Block-nested-loop: B(R) + (B(R)/(M-1))*B(S) ≅ B(R) + (B(R)/(M)) * B(S)

for each group of M blocks bR in R:

for each block bS in S:

for each tuple tR in bR:

for each tuple tS in bS:

if tR and tS can join:

output (tR,tS)

27 of 29

Cost Estimation: Hash Join (⋈)

R joined with S (assume R is smaller in size)

B(R) + B(S)

Assuming B(R) < M for one pass (look at each table once) efficiency, read all of R into a hash table and join with all of S

28 of 29

Cost Estimation: Sort-Merge Join (⋈)

B(R) + B(S)

One pass (look at each table once); Both must be small (B(R) + B(S) < M)

Why would we use this over Hash Join?

Tables are sorted on join attributes (No need to hash)
Range join instead of equijoin

29 of 29

Selectivity Formulas

Selectivity Factor (X) → Proportion of total data needed

Assuming uniform distribution of data values on numeric attribute a in table R, if the condition is:

a = c → X ≅1 / V(R, a)
a < c→ X ≅ (c - min(R, a))/ (max(R, a) - min(R, a))
c1 < a < c2→ X ≅ (c2 - c1)/ (max(R, a) - min(R, a))
cond1 and cond2 → X ≅ X1 * X2