1 of 29

CSE 414 Section 7

Locking and Cost Estimation

November 7th, 2024

2 of 29

Announcements

  • HW 3 grades released
  • HW 5 posted, due Tuesday 11pm
  • Gradescope quiz out today, due tomorrow at 11 PM!

3 of 29

Recap: Definitions

Operation - read or write of an element (later: insert or delete)

Transaction - series of operations meant to have the ACID guarantees

Schedule - ordering of the operations of some txns

Serial schedule - schedule where each txn is executed one after another.� No interleaving

Serializable schedule - a schedule that is behaviorally equivalent � to a serial schedule

4 of 29

Recap: Definitions (continued)

Non-conflicting swap - a pair of adjacent operations in a schedule � that can be reversed without affecting the schedule's behavior

Conflicting swap - (any two ops in diff txn), RW, WR, WW

Conflict serializable schedule - a schedule that can be transformed � into a serial schedule via a series of non-conflicting swaps

5 of 29

Recap: ACID

Transactions have 4 main properties:

  • Atomicity - all or nothing (the entire transaction takes place at once or doesn’t happen at all)
  • Consistency - preserve database integrity (the database must be consistent before and after the transaction)
  • Isolation - execute as if they were run alone (multiple transaction occur independently without interference)
  • Durability - results aren’t lost by a failure (the changes of a successful transaction occurs even if the system failure occurs)

6 of 29

Locking Overview

Remember - locking is for database internal implementation

  • A way to provide the ACID guarantees
  • Pessimistic concurrency control
    • Assumes transaction executions will conflict
  • Users of a database need only specify �BEGIN TRANSACTION … COMMIT / ROLLBACK
    • ...and the isolation level...

7 of 29

Database Lock

Table Lock

Predicate Lock

Tuple Lock

Lock Granularity

SQLite!

Less Concurrency

More Concurrency

8 of 29

All the Locks!

Lock Type

Benefit

Binary Locks

Simplest lock implementation

Shared/Exclusive Locks

Greater throughput on reads

9 of 29

Binary Locks

10 of 29

2PL with Binary Locks

Lock growing phase

Lock shrinking phase

11 of 29

2PL vs. Strict 2PL

Invalid reads

Trickled unlock and transaction end

12 of 29

2PL vs. Strict 2PL

2PL:

  • In every transaction, all lock requests must precede all unlock requests
  • Ensures conflict serializability
  • Might not be able to recover (Dirty Read: Read on some write that gets rolled back)
  • Can result in deadlocks

Strict 2PL:

  • Every lock for each transaction is held until commit or abort
  • Ensures conflict serializability
  • Recoverable as each transaction does not affect others until commit/abort
  • Can result in deadlocks

13 of 29

Cardinality Estimation Example

14 of 29

Cardinality Estimation (Point Selection)

Assume a uniform distribution.

σcolor = “orange”

X ≅ 1 / V(color)

≅ 1/4

15 of 29

Cardinality Estimation (Union)

Assume conditions are disjoint.

σcolor = “orange” OR color = “green”

X ≅ 1 / V(color) + 1 / V(color)

≅ 1/4 + 1/4 = 1/2

16 of 29

Cardinality Estimation (Intersection)

Assume independence.

σc1 = “red” AND c2 = “yellow”

X ≅ 1 / V(c1) · 1 / V(c2)

≅ 1/2 · 1/2 = 1/4

c1

c2

17 of 29

Cardinality Estimation (Ranges)

Schema: Weather(date, pressure). Assume a uniform distribution.

σpressure < p

σpressure > p

σp < pressure < q

X ≅ (p min) / (max - min)

X ≅ (max − p) / (max - min)

X ≅ (q p) / (max - min)

min(pressure)

max(pressure)

p

q

18 of 29

Cardinality Estimation (Equijoin)

Assume uniform distribution and full overlap.

c1 = c2

X ≅ (1 / (V(c1) · V(c2))) · min(V(c1), V(c2))

≅ 1 / max(V(c1), V(c2))

≅ 1/4

c1

c2

19 of 29

Cardinality Estimation (Equijoin)

Assume uniform distribution and full overlap.

c1 = c2

X ≅ (1 / (V(c1) · V(c2))) · min(V(c1), V(c2))

≅ 1 / max(V(c1), V(c2))

≅ 1/4 (didn’t change)

c1

c2

20 of 29

Cardinality Estimation (Equijoin)

Assume uniform distribution and full overlap.

c1 = c2

X ≅ (1 / (V(c1) · V(c2))) · min(V(c1), V(c2))

≅ 1 / max(V(c1), V(c2))

≅ 1/5 (decreased)

c1

c2

21 of 29

Summary

22 of 29

Cost Estimation

23 of 29

Cost Estimation: Factors

B(R) = # blocks for relation R

T(R) = # tuples for relation R

V(R, a) = # of unique values of attribute a in relation R

M = # of available memory pages

24 of 29

Cost Estimation: Nested Loop Join (⋈)

Naive: B(R) + T(R)B(S)

for each tuple t1 in R do

for each tuple t2 in S do

if t1 and t2 join then output (t1,t2)

25 of 29

Cost Estimation: Nested Loop Join (⋈)

Block-at-a-time: B(R) + B(S)B(R)

for each block bR in R:

for each block bS in S:

for each tuple tR in bR:

for each tuple tS in bS:

if tR and tS can join:

output (tR,tS)

26 of 29

Cost Estimation: Nested Loop Join (⋈)

Block-nested-loop: B(R) + (B(R)/(M-1))*B(S) ≅ B(R) + (B(R)/(M)) * B(S)

for each group of M blocks bR in R:

for each block bS in S:

for each tuple tR in bR:

for each tuple tS in bS:

if tR and tS can join:

output (tR,tS)

27 of 29

Cost Estimation: Hash Join (⋈)

R joined with S (assume R is smaller in size)

B(R) + B(S)

Assuming B(R) < M for one pass (look at each table once) efficiency, read all of R into a hash table and join with all of S

28 of 29

Cost Estimation: Sort-Merge Join (⋈)

B(R) + B(S)

One pass (look at each table once); Both must be small (B(R) + B(S) < M)

Why would we use this over Hash Join?

  • Tables are sorted on join attributes (No need to hash)
  • Range join instead of equijoin

29 of 29

Selectivity Formulas

  • Selectivity Factor (X) → Proportion of total data needed
    • Assuming uniform distribution of data values on numeric attribute a in table R, if the condition is:
      • a = c → X 1 / V(R, a)
      • a < c→ X (c - min(R, a))/ (max(R, a) - min(R, a))
      • c1 < a < c2→ X (c2 - c1)/ (max(R, a) - min(R, a))
      • cond1 and cond2 → X X1 * X2