CSE 414: Section 4
RA + Database Design Theory
October 17th, 2024
Announcements
Relational Algebra
RA Operators
Standard:
⋂ - Intersect
⋃ - Union
⎼ - Difference
σ - Select
π - Project
⍴ - Rename
Extended:
δ - Duplicate Elim.
ɣ - Group/Agg.
τ - Sorting
Joins:
⨝ - Nat. Join
⟕ - L.O. Join
⟖ - R.O. Join
⟗ - F.O. Join
✕- Cross Product
4
Ɣ Notation
Grouping and aggregation on group:
ɣattr_1, …, attr_k, count/sum/max/min(attr) -> alias
Aggregation on the entire table:
ɣcount/sum/max/min(attr) -> alias
5
Format
6
Query Plans (Example SQL -> RA)
Select-Join-Project structure
Make this SQL query into RA (remember FWGHOS):
SELECT R.b, T.c, max(T.a) AS T_max
FROM Table_R AS R, Table_T AS T
WHERE R.b = T.b
GROUP BY R.b, T.c
HAVING max(T.a) > 99
7
Query Plans (Example SQL -> RA)
Select-Join-Project structure
Make this SQL query into RA (remember FWGHOS):
SELECT R.b, T.c, max(T.a) AS T_max
FROM Table_R AS R, Table_T AS T
WHERE R.b = T.b
GROUP BY R.b, T.c
HAVING max(T.a) > 99
πR.b, T.c, T_max(σT_max>99(ɣR.b, T.c, max(T.a)->T_max(R ⨝R.b=T.b T)))
8
Difference Operator
SELECT DISTINCT R.a
FROM Table_R AS R
WHERE NOT EXISTS (
SELECT *
FROM Table_S AS S
WHERE S.b = R.a
AND S.c < 15
);
9
Difference Operator
SELECT DISTINCT R.a
FROM Table_R AS R
WHERE NOT EXISTS (
SELECT *
FROM Table_S AS S
WHERE S.b = R.a
AND S.c < 15);
10
πR2.a
Difference Operator
SELECT DISTINCT R.a
FROM Table_R AS R
WHERE NOT EXISTS (
SELECT *
FROM Table_S AS S
WHERE S.b = R.a
AND S.c < 15);
Equivalent SQL:
SELECT DISTINCT * FROM Table_R R2
EXCEPT
SELECT R1.a FROM Table_R R1, Table_S S
WHERE S.c < 15 AND R1.a = S.b;
11
πR2.a
Database Design
12
Database Design
Database Design or Logical Design or Relational Schema Design is the process of organizing data into a database model.
Consider what data needs to be stored and the interrelationship of the data.
Arbitrary Data
Database
There are no good or bad designs, just tradeoffs.
Design Section
Data Relationship Discovery
How do we find relationships between attributes?
Flights dataset:
Did you know that every canceled flight�has an actual_time of 0?
13
Design Section
Data Relationship Discovery
Domain approach:
Before looking at the data, �think about the domain
Research the domain and realize that �canceled flights have a 0 flight time!
14
Design Section
Data Relationship Discovery
Data mining approach:
Don’t worry about the domain, �just find patterns in the data
Look at the data and notice that every time canceled = 1, then actual_time = 0.
SELECT DISTINCT actual_time
FROM Flights
WHERE canceled = 1;
The only actual_time value is 0
15
Design Section
Data Relationship Discovery
How do we find relationships between attributes?
16
Domain Approach
E/R Diagrams
Data Approach
Functional Dependencies (FDs)
Boyce-Codd Normal Form (BCNF)
Design Section
ER Diagrams
17
Product
Company
Person
employs
buys
makes
price
name
ceo
name
address
address
name
id
Design Section
Relationships
�
18
“at most one”
“exactly one”
is a
“subclass”
no constraint
other constraint
Design Section
E/R Demo
Create an ER diagram containing the following:
19
Design Section
E/R Demo – Sample Solution
20
The id, name, gender, country of birth, and country region of birth for students and faculty
The major of students
The salary of faculty
The courses that each student takes
The department each course is offered in
Design Section
E/R Demo – Sample Solution
21
The id, name, gender, country of birth, and country region of birth for students and faculty
The major of students
The salary of faculty
The courses that each student takes
The department each course is offered in
Country (name, region)
People (id, country_name, gender, p_name)
Faculty (People.id, salary)
Student (People.id, major)
Course (courseID, dept)
Took (Student.People.id, Course.courseID)
�
Design Section