CSE 414: Section 4
RA + Database Design Theory
4/20/23
Announcements
Relational Algebra
RA Operators
Standard:
⋂ - Intersect
⋃ - Union
⎼ - Difference
σ - Select
π - Project
⍴ - Rename
Extended:
δ - Duplicate Elim.
ɣ - Group/Agg.
τ - Sorting
Joins:
⨝ - Nat. Join
⟕ - L.O. Join
⟖ - R.O. Join
⟗ - F.O. Join
✕- Cross Product
4
RA Operators
Standard:
⋂ - Intersect
⋃ - Union
⎼ - Difference
σ - Select
π - Project
⍴ - Rename
Extended:
δ - Duplicate Elim.
ɣ - Group/Agg.
τ - Sorting
Joins:
⨝ - Nat. Join
⟕ - L.O. Join
⟖ - R.O. Join
⟗ - F.O. Join
✕- Cross Product
5
Ɣ Notation
Grouping and aggregation on group:
ɣattr_1, …, attr_k, count/sum/max/min(attr) -> alias
Aggregation on the entire table:
ɣcount/sum/max/min(attr) -> alias
6
Format
7
Query Plans (Example SQL -> RA)
Select-Join-Project structure
Make this SQL query into RA (remember FWGHOS):
SELECT R.b, T.c, max(T.a) AS T_max
FROM Table_R AS R, Table_T AS T
WHERE R.b = T.b
GROUP BY R.b, T.c
HAVING max(T.a) > 99
8
Query Plans (Example SQL -> RA)
Select-Join-Project structure
Make this SQL query into RA (remember FWGHOS):
SELECT R.b, T.c, max(T.a) AS T_max
FROM Table_R AS R, Table_T AS T
WHERE R.b = T.b
GROUP BY R.b, T.c
HAVING max(T.a) > 99
πR.b, T.c, T_max(σT_max>99(ɣR.b, T.c, max(T.a)->T_max(R ⨝R.b=T.b T)))
9
Difference Operator
SELECT DISTINCT R.a
FROM Table_R AS R
WHERE NOT EXISTS (
SELECT *
FROM Table_S AS S
WHERE S.b = R.a
AND S.c < 15
);
10
Difference Operator
SELECT DISTINCT R.a
FROM Table_R AS R
WHERE NOT EXISTS (
SELECT *
FROM Table_S AS S
WHERE S.b = R.a
AND S.c < 15);
11
πR2.a
Difference Operator
SELECT DISTINCT R.a
FROM Table_R AS R
WHERE NOT EXISTS (
SELECT *
FROM Table_S AS S
WHERE S.b = R.a
AND S.c < 15);
Equivalent SQL:
SELECT DISTINCT * FROM Table_R R2
EXCEPT
SELECT R1.a FROM Table_R R1, Table_S S
WHERE S.c < 15 AND R1.a = S.b;
12
πR2.a
Database Design
Database Design
14
Database Design
Database Design or Logical Design or Relational Schema Design is the process of organizing data into a database model.
Consider what data needs to be stored and the interrelationship of the data.
Arbitrary Data
Database
There are no good or bad designs, just tradeoffs.
Design Section
Data Relationship Discovery
How do we find relationships between attributes?
Flights dataset:
Did you know that every canceled flight�has an actual_time of 0?
15
Design Section
Data Relationship Discovery
Domain approach:
Before looking at the data, �think about the domain
Research the domain and realize that �canceled flights have a 0 flight time!
16
Design Section
Data Relationship Discovery
Data mining approach:
Don’t worry about the domain, �just find patterns in the data
Look at the data and notice that every time canceled = 1, then actual_time = 0.
SELECT DISTINCT actual_time
FROM Flights
WHERE canceled = 1;
The only actual_time value is 0
17
Design Section
Data Relationship Discovery
How do we find relationships between attributes?
18
Domain Approach
E/R Diagrams
Data Approach
Functional Dependencies (FDs)
Boyce-Codd Normal Form (BCNF)
Design Section
ER Diagrams
19
Product
Company
Person
employs
buys
makes
price
name
ceo
name
address
address
name
id
Design Section
Relationships
�
20
“at most one”
“exactly one”
is a
“subclass”
no constraint
other constraint
Design Section
E/R Demo
Create an ER diagram containing the following:
21
Design Section
E/R Demo – Sample Solution
22
The id, name, gender, country of birth, and country region of birth for students and faculty
The major of students
The salary of faculty
The courses that each student takes
The department each course is offered in
Design Section
E/R Demo – Sample Solution
23
The id, name, gender, country of birth, and country region of birth for students and faculty
The major of students
The salary of faculty
The courses that each student takes
The department each course is offered in
Country (name, region)
People (id, country_name, gender, p_name)
Faculty (People.id, salary)
Student (People.id, major)
Course (courseID, dept)
Took (Student.People.id, Course.courseID)
�
Design Section
Functional Dependencies
A functional dependency (FD) for relation R is a formula of the form A → B
where A and B are sets or individual attributes of R.
We say the FD holds for R if, for any instance of R, whenever two tuples agree in value for all attributes of A, they also agree in value for all attributes of B.
DEPENDENCY != CAUSATION
*in other words: A determines B, or A is kinda sorta a key for B
Example
Student(studentID, name, dateOfBirth, phoneNumber)
studentID | name | dateOfBirth | phoneNumber |
1234 | Allison | 01-09-1999 | 253-876-1028 |
1235 | Ben | 03-10-2000 | 206-999-1923 |
1236 | Chris | 05-23-1999 | 206-127-1010 |
1234 | Allison | 01-09-1999 | 253-307-1678 |
Example
Do the following FDs hold?
{studentID} → {name}
{studentID} → {name, dateOfBirth}
{studentID} → {phoneNumber}
studentID | name | dateOfBirth | phoneNumber |
1234 | Allison | 01-09-1999 | 253-876-1028 |
1235 | Ben | 03-10-2000 | 206-999-1923 |
1236 | Chris | 05-23-1999 | 206-127-1010 |
1234 | Allison | 01-09-1999 | 253-307-1678 |
Example
Do the following FDs hold?
{studentID} → {name}
{studentID} → {name, dateOfBirth}
{studentID} → {phoneNumber}
studentID | name | dateOfBirth | phoneNumber |
1234 | Allison | 01-09-1999 | 253-876-1028 |
1235 | Ben | 03-10-2000 | 206-999-1923 |
1236 | Chris | 05-23-1999 | 206-127-1010 |
1234 | Allison | 01-09-1999 | 253-307-1678 |
Example
Do the following FDs hold?
{studentID} → {name}
{studentID} → {name, dateOfBirth}
{studentID} → {phoneNumber}
studentID | name | dateOfBirth | phoneNumber |
1234 | Allison | 01-09-1999 | 253-876-1028 |
1235 | Ben | 03-10-2000 | 206-999-1923 |
1236 | Chris | 05-23-1999 | 206-127-1010 |
1234 | Allison | 01-09-1999 | 253-307-1678 |
Example
Do the following FDs hold?
{studentID} → {name}
{studentID} → {name, dateOfBirth}
{studentID} → {phoneNumber}
There are two tuples that agree on studentID but don’t agree on phoneNumber!
studentID | name | dateOfBirth | phoneNumber |
1234 | Allison | 01-09-1999 | 253-876-1028 |
1235 | Ben | 03-10-2000 | 206-999-1923 |
1236 | Chris | 05-23-1999 | 206-127-1010 |
1234 | Allison | 01-09-1999 | 253-307-1678 |
Trivial FDs
Axiom of Reflexivity
A trivial FD is one where the RHS is a subset of the LHS
That is, every attribute on the RHS is also on the LHS
The two FDs below are trivial:
{name} → {name}
{studentID,name} → {name}
Inference Rules
Closure
Goal:
Observation:
Closures (quicker)
Goal:
Observation:
Example
Given studentID → name, dateOfBirth:
{studentID}+ = {studentID, name, dateOfBirth}
studentID | name | dateOfBirth | phoneNumber |
1234 | Allison | 01-09-1999 | 253-876-1028 |
1235 | Ben | 03-10-2000 | 206-999-1923 |
1236 | Chris | 05-23-1999 | 206-127-1010 |
1234 | Allison | 01-09-1999 | 253-307-1678 |
Keys
We call an attribute (or a set of attributes) that determines all other attributes in a schema to be a superkey.
If it is the smallest set of attributes (in terms of cardinality) that does this we call that set a minimal key or just key
By definition, all minimal keys are superkeys, but a superkey is not necessarily a minimal key.
Superkeys and Minimal Keys
R(A, B, C, D)
A -> B
B -> C, D
Question: is {A, B} a superkey, minimal key, neither, or both?
Question: is {B} a superkey, minimal key, neither, or both?
Question: is {A} a superkey, minimal key, neither or both?
superkey
neither
both
A superkey determines all other attributes in a schema
A minimal key is the smallest set of attributes that determines all other attributes in a schema
By definition, all minimal keys are superkeys, but a superkey is not necessarily a minimal key.
Superkeys and Minimal Keys
R(A, B, C, D)
A -> B
C -> D
Question: is {A, C} a superkey, minimal key, or both?
Question: is {A, B, C, D} a superkey, minimal key, or both?
Question: is {A, C, D} a superkey, minimal key, or both?
both
superkey
superkey
A superkey determines all other attributes in a schema
A minimal key is the smallest set of attributes that determines all other attributes in a schema
By definition, all minimal keys are superkeys, but a superkey is not necessarily a minimal key.
Worksheet Time!