1 of 39

CSE 344: Section 7

HW6 & Indexing

2 of 39

HW6

3 of 39

What is Flights App?

  • A client-server application

4 of 39

SQL Injection

  • Vulnerable web applications can expose capabilities to the user that are undesired
    • Dropping tables, inserting values, deleting values, gaining access when it should be rejected, etc.
  • Using PreparedStatement objects will help prevent SQL Injection
    • A PreparedStatement pre-compiles the SQL query such that inputs that would result in SQL Injection when parsed can no longer do so
  • SQL Injection Demo

5 of 39

PreparedStatements

How they work:

  • Your SQL Connection Object parses the String passed to prepareStatement() and pre-compiles it to a parameterized SQL query
  • The “?” in the query are interpreted as parameters that can be inserted via calls on the PreparedStatement Object

String rawQuery = “SELECT * FROM Flights WHERE origin_city = ? AND day_of_month = ?”;

PreparedStatement ps = conn.prepareStatement(rawQuery); // Pre-compiles the query into a PreparedStatement Object

ps.clearParameters(); // Clears parameters from previous use

ps.setString(1, originCity); // Sets the first parameter (the first “?”) to the value of the variable “originCity”

ps.setInt(2, dayOfMonth); // Sets the second parameter (the second “?”) to the value of the variable “dayOfMonth”

ResultSet rs = ps.executeQuery(); // Executes the query and stores the ResultSet in the variable “rs”

6 of 39

ResultSet

  • A ResultSet represents a table of data returned from executing a query
  • It maintains an internal cursor that points to the current row of data, initially that cursor is positioned before the first row so the first call to `next` moves the cursor to the first row
    • Makes it ideal for a while loop:

// ... continued from previous slide

ResultSet rs = ps.executeQuery();

while (rs.next()) { // check if there is another row and move to the next row

String destCity = rs.getString(“dest_city”); // Gets the value of the attribute “dest_city” for the current row

...

} // When `next` finally returns false, we know we’ve seen every row

rs.close(); // Remember to close the ResultSet

7 of 39

HW6 Notes

  • We will be using Azure again: you’ll need your server name, database name, username, and password to put in the dbconn.properties file
    • This file should not be submitted
  • Add test cases of your own, the ones that we provided are not exhaustive and do not test for the full set of capabilities we expect
    • There are private test cases - you can run them on Gradescope and see whether they pass, but won’t see tested commands or expected output
    • Private test case file names are a hint!

8 of 39

HW6 Notes part 2

  • Remember the tables you create is for global data stored for all users
    • Do not store local data for one user which might get deleted after the user quits the session
  • The Flights, Weekdays, Months, and Carriers tables should be read-only - you shouldn’t have to modify these.
  • Reminder to be careful about the order in which you create/delete tables!
  • Make sure to append your uw netid to your custom tables

9 of 39

HW6 Notes part 3

  • The `Flight` class is there for you to use if you like, feel free to modify it, add methods, etc.
    • You can also add internal classes into Query.java as you see fit
      • All your code must be in Query.java or PasswordUtils.java
    • It might be helpful to create a class that represents an Itinerary
      • If it implements the `Comparable` interface then you can use `Collections.sort` to sort it (this would be helpful for sorting one-hop and direct flights with one another)

10 of 39

HW6 Demo

11 of 39

Indexing

12 of 39

13 of 39

Definitions

  • Index - data structure that organizes data records on disk to optimize selections on the search key fields for the index
  • An index contains a collection of data entries, and supports efficient retrieval of all data entries with the given search key value k
  • A data entry is of the format (key = k, value = pointer to records)

14

15

16

14

tuple1

15

tuple2

16

tuple3

Data file sorted on age

Index File

Index on age

14 of 39

Definitions

  • An index is either clustered or unclustered, depending on how the actual data is sorted on disk
  • Clustered index - one that has the same key ordering as what is on disk (one per table)
  • Unclustered index - may exist without any ordering on disk (any number per table)

15 of 39

Definitions

  • An index is either clustered or unclustered, depending on how the actual data is sorted on disk
  • Clustered index - one that has the same key ordering as what is on disk (one per table)
  • Unclustered index - may exist without any ordering on disk (any number per table)

16 of 39

Why Indexing?

Consider you have a deck of 52 cards. I asked you to pick out the 8 of hearts. Calculate the average number of flips required if:

  1. The deck is randomly shuffled.
  2. The deck is separated into four piles, each representing a suit (spades, diamonds, clubs, hearts). Each pile is kept randomly face down.

17 of 39

Problem 1

To find the 8 of hearts from a randomly shuffled cards, on average I would require 26 flips through the deck, about half the deck.

18 of 39

Problem 2

  • There are 4 piles, one for each suit.
  • Disregard the other 3 piles, i.e. clubs, spades, diamonds.
  • In doing so we would require 2 flips, on average, to find the heart pile.
  • We're left with 13 cards in the heart pile to scan, on average we would require 7 flips to find the 8 of hearts.

19 of 39

Answer

Problem 1:

  • Scan 52 Cards (+26)

Average Flips = 26

Problem 2:

  • Pick suit (+2)
  • Scan 13 cards (+7)

Average Flips = 9

20 of 39

Further Thinking…

  • Can you think of an better index?
  • Would your index be clustered or unclustered?

21 of 39

22 of 39

Cost Estimation

23 of 39

Cost Estimation: Factors

B(R) = # blocks for relation R

T(R) = # tuples for relation R

V(R, a) = # of unique values of attribute a in relation R

M = # of available memory pages

24 of 39

Cost Estimation: Nested Loop Join (⋈)

Naive: B(R) + T(R)B(S)

for each tuple t1 in R do

for each tuple t2 in S do

if t1 and t2 join then output (t1,t2)

25 of 39

Cost Estimation: Nested Loop Join (⋈)

Block-at-a-time: B(R) + B(S)B(R)

for each block bR in R:

for each block bS in S:

for each tuple tR in bR:

for each tuple tS in bS:

if tR and tS can join:

output (tR,tS)

26 of 39

Cost Estimation: Nested Loop Join (⋈)

Block-nested-loop: B(R) + (B(R)/(M-1))*B(S) ≅ B(R) + (B(R)/(M)) * B(S)

for each group of M blocks bR in R:

for each block bS in S:

for each tuple tR in bR:

for each tuple tS in bS:

if tR and tS can join:

output (tR,tS)

27 of 39

Cost Estimation: Hash Join (⋈)

R joined with S (assume R is smaller in size)

B(R) + B(S)

Assuming B(R) < M for one pass (look at each table once) efficiency, read all of R into a hash table and join with all of S

28 of 39

Cost Estimation: Sort-Merge Join (⋈)

B(R) + B(S)

One pass (look at each table once); Both must be small (B(R) + B(S) < M)

Why would we use this over Hash Join?

  • Tables are sorted on join attributes (No need to hash)
  • Range join instead of equijoin

29 of 39

Selectivity Formulas

  • Selectivity Factor (X) → Proportion of total data needed
    • Assuming uniform distribution of data values on numeric attribute a in table R, if the condition is:
      • a = c → X 1 / V(R, a)
      • a < c→ X (c - min(R, a))/ (max(R, a) - min(R, a))
      • c1 < a < c2→ X (c2 - c1)/ (max(R, a) - min(R, a))
      • cond1 and cond2 → X X1 * X2

30 of 39

Cardinality Estimation Example

31 of 39

Cardinality Estimation (Point Selection)

Assume a uniform distribution.

σcolor = “orange”

X ≅ 1 / V(color)

≅ 1/4

32 of 39

Cardinality Estimation (Union)

Assume conditions are disjoint.

σcolor = “orange” OR color = “green”

X ≅ 1 / V(color) + 1 / V(color)

≅ 1/4 + 1/4 = 1/2

33 of 39

Cardinality Estimation (Intersection)

Assume independence.

σc1 = “red” AND c2 = “yellow”

X ≅ 1 / V(c1) · 1 / V(c2)

≅ 1/2 · 1/2 = 1/4

c1

c2

34 of 39

Cardinality Estimation (Ranges)

Schema: Weather(date, pressure). Assume a uniform distribution.

σpressure < p

σpressure > p

σp < pressure < q

X ≅ (p min) / (max - min)

X ≅ (max − p) / (max - min)

X ≅ (q p) / (max - min)

min(pressure)

max(pressure)

p

q

35 of 39

Cardinality Estimation (Equijoin)

Assume uniform distribution and full overlap.

c1 = c2

X ≅ (1 / (V(c1) · V(c2))) · min(V(c1), V(c2))

≅ 1 / max(V(c1), V(c2))

≅ 1/4

c1

c2

36 of 39

Cardinality Estimation (Equijoin)

Assume uniform distribution and full overlap.

c1 = c2

X ≅ 1 / (V(c1) · V(c2)) · min(V(c1), V(c2))

≅ 1 / max(V(c1), V(c2))

≅ 1/4 (didn’t change)

c1

c2

37 of 39

Cardinality Estimation (Equijoin)

Assume uniform distribution and full overlap.

c1 = c2

X ≅ 1 / (V(c1) · V(c2)) · min(V(c1), V(c2))

≅ 1 / max(V(c1), V(c2))

≅ 1/5 (decreased)

c1

c2

38 of 39

Summary

39 of 39

Worksheet