1 of 21

CSE 414: Section 4

RA + Database Design Theory

October 17th, 2024

2 of 21

Announcements

  • Homework 3 due 10/23 (next Wednesday)

3 of 21

Relational Algebra

4 of 21

RA Operators

Standard:

⋂ - Intersect

⋃ - Union

- Difference

σ - Select

π - Project

⍴ - Rename

Extended:

δ - Duplicate Elim.

ɣ - Group/Agg.

τ - Sorting

Joins:

⨝ - Nat. Join

⟕ - L.O. Join

⟖ - R.O. Join

⟗ - F.O. Join

✕- Cross Product

4

5 of 21

Ɣ Notation

Grouping and aggregation on group:

ɣattr_1, …, attr_k, count/sum/max/min(attr) -> alias

Aggregation on the entire table:

ɣcount/sum/max/min(attr) -> alias

5

6 of 21

Format

  • Follows FWGHOS structure

6

7 of 21

Query Plans (Example SQL -> RA)

Select-Join-Project structure

Make this SQL query into RA (remember FWGHOS):

SELECT R.b, T.c, max(T.a) AS T_max

FROM Table_R AS R, Table_T AS T

WHERE R.b = T.b

GROUP BY R.b, T.c

HAVING max(T.a) > 99

7

8 of 21

Query Plans (Example SQL -> RA)

Select-Join-Project structure

Make this SQL query into RA (remember FWGHOS):

SELECT R.b, T.c, max(T.a) AS T_max

FROM Table_R AS R, Table_T AS T

WHERE R.b = T.b

GROUP BY R.b, T.c

HAVING max(T.a) > 99

πR.b, T.c, T_maxT_max>99R.b, T.c, max(T.a)->T_max(R R.b=T.b T)))

8

9 of 21

Difference Operator

SELECT DISTINCT R.a

FROM Table_R AS R

WHERE NOT EXISTS (

SELECT *

FROM Table_S AS S

WHERE S.b = R.a

AND S.c < 15

);

9

  • We need to correctly exclude rows if they exist in the subquery
  • We cannot use σ (select) to compare rows
  • Solution is to use the (difference) operator!

10 of 21

Difference Operator

SELECT DISTINCT R.a

FROM Table_R AS R

WHERE NOT EXISTS (

SELECT *

FROM Table_S AS S

WHERE S.b = R.a

AND S.c < 15);

10

πR2.a

11 of 21

Difference Operator

SELECT DISTINCT R.a

FROM Table_R AS R

WHERE NOT EXISTS (

SELECT *

FROM Table_S AS S

WHERE S.b = R.a

AND S.c < 15);

Equivalent SQL:

SELECT DISTINCT * FROM Table_R R2

EXCEPT

SELECT R1.a FROM Table_R R1, Table_S S

WHERE S.c < 15 AND R1.a = S.b;

11

πR2.a

12 of 21

Database Design

12

Database Design

Database Design or Logical Design or Relational Schema Design is the process of organizing data into a database model.

Consider what data needs to be stored and the interrelationship of the data.

Arbitrary Data

Database

There are no good or bad designs, just tradeoffs.

Design Section

13 of 21

Data Relationship Discovery

How do we find relationships between attributes?

Flights dataset:

Did you know that every canceled flight�has an actual_time of 0?

  1. Domain approach
  2. Data mining approach

13

Design Section

14 of 21

Data Relationship Discovery

Domain approach:

Before looking at the data, �think about the domain

Research the domain and realize that �canceled flights have a 0 flight time!

14

Design Section

15 of 21

Data Relationship Discovery

Data mining approach:

Don’t worry about the domain, �just find patterns in the data

Look at the data and notice that every time canceled = 1, then actual_time = 0.

SELECT DISTINCT actual_time

FROM Flights

WHERE canceled = 1;

The only actual_time value is 0

15

Design Section

16 of 21

Data Relationship Discovery

How do we find relationships between attributes?

16

Domain Approach

E/R Diagrams

Data Approach

Functional Dependencies (FDs)

Boyce-Codd Normal Form (BCNF)

Design Section

17 of 21

ER Diagrams

  • Visual graph of entities and relationships

17

Product

Company

Person

employs

buys

makes

price

name

ceo

name

address

address

name

id

Design Section

18 of 21

Relationships

  • one-one: ssn - UW student id
  • one-many: ssn - phone #
  • many-many: store - product
  • is-a: computer - PC and computer - Mac
  • has-a: country - city
    • What country does the city of Cambridge belong to?

18

 

“at most one”

“exactly one”

is a

“subclass”

no constraint

other constraint

Design Section

19 of 21

E/R Demo

Create an ER diagram containing the following:

  • The id, name, gender, country of birth, �and country region (continent) of birth for students and faculty
  • The major of students
  • The salary of faculty
  • The courses that each student takes
  • The department each course is offered in

19

Design Section

20 of 21

E/R Demo – Sample Solution

20

The id, name, gender, country of birth, and country region of birth for students and faculty

The major of students

The salary of faculty

The courses that each student takes

The department each course is offered in

Design Section

21 of 21

E/R Demo – Sample Solution

21

The id, name, gender, country of birth, and country region of birth for students and faculty

The major of students

The salary of faculty

The courses that each student takes

The department each course is offered in

Country (name, region)

People (id, country_name, gender, p_name)

Faculty (People.id, salary)

Student (People.id, major)

Course (courseID, dept)

Took (Student.People.id, Course.courseID)

Design Section