1 of 38

CSE 344: Section 7

Cardinality Estimation, Join Algorithms, Flight App

Nov 8th, 2025

2 of 38

Announcements

Announcements

  • Homework 5:
    • Deadline tonight at 10pm!

  • Project M1 is out!
    • Due Thursday Nov 13th, at 10pm

  • Questions?

3 of 38

Cardinality Estimation Example

4 of 38

Cardinality Estimation (Point Selection)

Assume a uniform distribution.

σcolor = “orange”

X ≅ 1 / V(color)

≅ 1/4

5 of 38

Cardinality Estimation (Union)

Assume conditions are disjoint.

σcolor = “orange” OR color = “green”

X ≅ 1 / V(color) + 1 / V(color)

≅ 1/4 + 1/4 = 1/2

6 of 38

Cardinality Estimation (Intersection)

Assume independence.

σc1 = “red” AND c2 = “yellow”

X ≅ 1 / V(c1) · 1 / V(c2)

≅ 1/2 · 1/2 = 1/4

c1

c2

7 of 38

Cardinality Estimation (Ranges)

Schema: Weather(date, pressure). Assume a uniform distribution.

σpressure < p

σpressure > p

σp < pressure < q

X ≅ (p min) / (max - min + 1)

X ≅ (max − p) / (max - min + 1)

X ≅ (q p) / (max - min + 1)

min(pressure)

max(pressure)

p

q

8 of 38

Cardinality Estimation (Equijoin)

Assume uniform distribution and full overlap.

c1 = c2

X ≅ (1 / (V(c1) · V(c2))) · min(V(c1), V(c2))

≅ 1 / max(V(c1), V(c2))

≅ 1/4

c1

c2

9 of 38

Cardinality Estimation (Equijoin)

Assume uniform distribution and full overlap.

c1 = c2

X ≅ (1 / (V(c1) · V(c2))) · min(V(c1), V(c2))

≅ 1 / max(V(c1), V(c2))

≅ 1/4 (didn’t change)

c1

c2

10 of 38

Cardinality Estimation (Equijoin)

Assume uniform distribution and full overlap.

c1 = c2

X ≅ (1 / (V(c1) · V(c2))) · min(V(c1), V(c2))

≅ 1 / max(V(c1), V(c2))

≅ 1/5 (decreased)

c1

c2

11 of 38

Summary

+1

12 of 38

Join Algorithms

13 of 38

Definitions

B(R) = # blocks for relation R

T(R) = # tuples for relation R

M = # of blocks that can fit in memory

14 of 38

Nested Loop Join (⋈) - Naive Version

Naive: B(R) + T(R)B(S)

for each tuple t1 in R do

for each tuple t2 in S do

if t1 and t2 join then output (t1,t2)

15 of 38

This version of the algo reserves an extra block for the joined output; lecture doesn't

16 of 38

Nested Loop Join (⋈) - Block-at-a-time Version

Block-at-a-time: B(R) + B(S)B(R)

for each block bR in R:

for each block bS in S:

for each tuple tR in bR:

for each tuple tS in bS:

if tR and tS can join:

output (tR,tS)

17 of 38

Nested Loop Join (⋈) - Block Version

Block-nested-loop: B(R) + B(R)B(S)/(M-2)

for each group of M-2 blocks bR in R:

for each block bS in S:

for each tuple tR in bR:

for each tuple tS in bS:

if tR and tS can join:

output (tR,tS)

This version of the algo reserves an extra block for the joined output; lecture doesn't.��This version does reserve one block for S, however.

18 of 38

Hash Join (⋈)

R joined with S (assume R is smaller in size)

B(R) + B(S)

Assuming B(R) < M-2 for one pass (look at each table once) efficiency, read all of R into a hash table and join with all of S

19 of 38

Hash Join (⋈)

for tuple x in R:

insert(x.r, x)

for tuple y in S:

x = find(y.s)

output(x, y)

20 of 38

Hash Join (⋈)

for tuple x in R:

insert(x.r, x)

for tuple y in S:

x = find(y.s)

output(x, y)

Which table is R?

21 of 38

Hash Join (⋈)

for tuple x in R:

insert(x.r, x)

for tuple y in S:

x = find(y.s)

output(x, y)

Which table is R?

In this case, R is table with PK. Why?

22 of 38

Hash Join (⋈)

for tuple x in R:

insert(x.r, x)

for tuple y in S:

for x in find(y.s)

output(x, y)

What if |FK table| << |PK table| ?

23 of 38

Hash Join (⋈)

for tuple x in R:

insert(x.r, x)

for tuple y in S:

for x in find(y.s)

output(x, y)

What if |FK table| << |PK table| ?

R is better as FK table

24 of 38

Sort-Merge Join (⋈)

B(R) + B(S)

One pass (look at each table once); Both must be small (B(R) + B(S) < M)

Why would we use this over Hash Join?

  • Tables are sorted on join attributes (No need to hash)
  • Range join instead of equijoin

25 of 38

Sort-Merge Join (⋈)

sort(R); sort(S); // Better if sorted already!

x = R.first()

y = S.first()

while y != NULL do:

case:

x.r < y.s: x.next();

x.r = y.s: output(x, y); y.next();

x.r > y.s: y.next();

Note: R is table with PK. Why?

26 of 38

JDBC and Prepared Statements

27 of 38

Java: JDBC

  • API library that allows Java programs to access database management systems

  • With JDBC, we can write Java code to:
  • Connect to a database
  • Send queries and update statements to the database
  • Retrieve and process the results received from the database

28 of 38

ResultSet

  • A ResultSet represents data returned from executing a query
  • It maintains an internal cursor that points to the current row of data, initially that cursor is positioned before the first row so the first call to `next` moves the cursor to the first row
    • Makes it ideal for a while loop:

// ... continued from previous slide

ResultSet rs = ps.executeQuery();

while (rs.next()) { // check if there is another row and move to the next row

String destCity = rs.getString("dest_city"); // Gets the value of the attribute "dest_city" for the current row

...

} // When `next` finally returns false, we know we’ve seen every row

rs.close(); // Remember to close the ResultSet

29 of 38

Statements and PreparedStatements

How they work:

  • Your SQL Connection Object sends a String containing SQL to the DBMS
  • DBMS executes your SQL and returns results via ResultSet
    • Recall that "executing your SQL" has multiple steps!
    • Compilation to RA, generating equivalent trees, cardinality estimation, choosing the optimal tree, and finally executing the generated machine code!
  • If you are executing the "same shaped" query multiple times, only the last step differs!

String q1 = "SELECT * FROM Flights WHERE origin_city = 'SEA' AND day_of_month = 1";

String q2 = "SELECT * FROM Flights WHERE origin_city = 'PDX' AND day_of_month = 1";

String q3 = "SELECT * FROM Flights WHERE origin_city = 'SEA' AND day_of_month = 15";

30 of 38

PreparedStatements

  • PreparedStatements tell RDMS to prepare and cache an optimized tree

  • Typically prepare the statement once, at program start
    • If the query needs to be parameterized, use "?" to insert values at execution time
  • Execute the statement multiple times, whenever your program needs it
    • Insert parameters (if any) and execute the previously-instantiated PreparedStatement Object

String rawQuery = "SELECT * FROM Flights WHERE origin_city = ? AND day_of_month = ?";

PreparedStatement ps = conn.prepareStatement(rawQuery); // Pre-compiles the query into a PreparedStatement Object

ps.clearParameters(); // Clears parameters from previous use

ps.setString(1, originCity); // Sets the first parameter (the first "?") to the value of the variable "originCity"

ps.setInt(2, dayOfMonth); // Sets the second parameter (the second "?") to the value of the variable "dayOfMonth"

ResultSet rs = ps.executeQuery(); // Executes the query and stores the ResultSet in the variable "rs"

31 of 38

SQL Injections

32 of 38

SQL Injection

  • Vulnerable web applications can expose capabilities to the user that are undesired
    • Dropping tables, inserting values, deleting values, gaining access when it should be rejected, etc.
  • Using PreparedStatement objects will help prevent SQL Injection
    • A PreparedStatement pre-compiles the SQL query such that inputs that would result in SQL Injection when parsed can no longer do so
  • SQL Injection Demo

33 of 38

Hashing

  • Pass the plaintext password through a cryptographic hash function
  • Store the hashed password in the database instead
  • Problem: You are guaranteed to have users with bad passwords :(
    • Hashing their passwords won’t help

34 of 38

Salting

  • Adds a small amount of randomness to users’ passwords
    • Makes identical passwords not have identical hashes

35 of 38

HW6 Hashing/Salting

  • Don’t need to store salt in a separate column
    • Note: If your salts are a constant length, you don’t need the “:” indicator here

36 of 38

Flight App Notes

  • Add test cases of your own, the ones that we provided are not exhaustive and do not test for the full set of capabilities we expect
    • There are private test cases - you can run them on Gradescope and see whether they pass, but won’t see tested commands or expected output
    • Private test case file names are a hint!

37 of 38

Demo

38 of 38

Worksheet