Updates: DDL, DML
February 7, 2024
1
Data 101/Info 258, Spring 2024 @ UC Berkeley
Aditya Parameswaran https://data101.org/sp24
LECTURE 06
Onto bigger and better data engineering!
2
SQL queries
(Select from where, joins,�window functions)
Relational Algebra Introduction
Performance tuning queries
So far
Next time
This time:
SQL command categories, broadly
3
Image: source
Data Definition Language
Transaction Control Language
Data Manipulation Language
Data�Query Language
Data�Control�Language
DML: INSERT, DELETE, UPDATE
DML: INSERT, DELETE, UPDATE
DDL: CREATE, DROP, ALTER
Demo
UPDATE/DELETE: Under the hood
Relational Design and Constraints
PRIMARY KEY and UNIQUE
FOREIGN KEY Constraints
Attribute-based Constraints
4
Lecture 06, Data 101/Info 258 Spring 2024
SQL command categories, broadly
5
Image: source
Data Definition Language
Transaction Control Language
Data Manipulation Language
Data�Query Language
Data�Control�Language
so far
today
DML: Data Modification Language
We also need to be able to modify the data in relations!
A modification command doesn’t return a result the same way queries we’ve seen so far do, but it changes the relational instance (i.e., the stored tuples).
Three kinds of modifications:
6
INSERT
INSERT INTO Titles VALUES� ('tt11737520', 'tvSeries', 'One Piece', 'One Piece',� 0, 2023, NULL, 60, 'Action,Adventure,Comedy'),� ('tt1001526','movie', 'Megamind', 'Megamind',� 0, 2010, NULL, 95, 'Action,Adventure,Comedy');
7
Titles
Column | Type -----------------+--------
title_id | text
type | text
primary_title | text
original_title | text
is_adult | bigint
premiered | bigint
ended | bigint
runtime_minutes | bigint
genres | text
Documentation INSERT
INSERT specifying attributes
INSERT INTO Titles VALUES� ('tt11737520', 'tvSeries', 'One Piece', 'One Piece',� 0, 2023, NULL, 60, 'Action,Adventure,Comedy'),� ('tt1001526','movie', 'Megamind', 'Megamind',� 0, 2010, NULL, 95, 'Action,Adventure,Comedy');
INSERT INTO Titles (title_id, type, premiered, primary_title) VALUES� ('tt11737520', 'tvSeries', 2023, 'One Piece');
8
Titles
Column | Type -----------------+--------
title_id | text
type | text
primary_title | text
original_title | text
is_adult | bigint
premiered | bigint
ended | bigint
runtime_minutes | bigint
genres | text
Documentation INSERT
INSERT a subquery’s result
INSERT INTO Titles VALUES� ('tt11737520', 'tvSeries', 'One Piece', 'One Piece',� 0, 2023, NULL, 60, 'Action,Adventure,Comedy'),� ('tt1001526','movie', 'Megamind', 'Megamind',� 0, 2010, NULL, 95, 'Action,Adventure,Comedy');
INSERT INTO Titles (title_id, type, premiered, primary_title) VALUES� ('tt11737520', 'tvSeries', 2023, 'One Piece');
INSERT INTO Titles (title_id) (� SELECT DISTINCT C.title_id� FROM Crew C� WHERE NOT EXISTS (SELECT * FROM Titles T WHERE C.title_id = T.title_id));
9
Titles
Column | Type -----------------+--------
title_id | text
type | text
primary_title | text
original_title | text
is_adult | bigint
premiered | bigint
ended | bigint
runtime_minutes | bigint
genres | text
Documentation INSERT
DELETE
DELETE FROM <Relation> WHERE <condition>;
DELETE FROM Titles WHERE premiered >= 2023;
DELETE FROM Titles;
10
Titles
Column | Type -----------------+--------
title_id | text
type | text
primary_title | text
original_title | text
is_adult | bigint
premiered | bigint
ended | bigint
runtime_minutes | bigint
genres | text
Documentation DELETE
UPDATE
UPDATE Titles� SET original_title = primary_title� WHERE original_title IS NULL;
UPDATE <Relation>� SET <list of attribute assignments>� WHERE <condition>;
11
Titles
Column | Type -----------------+--------
title_id | text
type | text
primary_title | text
original_title | text
is_adult | bigint
premiered | bigint
ended | bigint
runtime_minutes | bigint
genres | text
Documentation INSERT
DDL: CREATE, DROP, ALTER
DML: INSERT, DELETE, UPDATE
DDL: CREATE, DROP, ALTER
Demo
UPDATE/DELETE: Under the hood
Relational Design and Constraints
PRIMARY KEY and UNIQUE
FOREIGN KEY Constraints
Attribute-based Constraints
12
Lecture 06, Data 101/Info 258 Spring 2024
DDL: Data Definition Language
How do we modify the schema of a table?
A definition command also doesn’t return a result the same way queries we’ve seen so far do, but it sets/changes the relational schema (i.e., attributes) + potentially the relational instance.
Three kinds of definitions:
13
CREATE
From before:
CREATE TABLE <relation name> AS (� <subquery>�);
You can also create a new table with an entirely new schema (and no initial tuples):
CREATE TABLE Zips (� location VARCHAR(20),� zipcode INTEGER
);
14
syntax: attributeName ATTRIBUTETYPE
Documentation CREATE TABLE
CREATE
From before:
CREATE TABLE <relation name> AS (� <subquery>�);
You can also create a new table with an entirely new schema (and no initial tuples):
CREATE TABLE Zips (� location VARCHAR(20),� zipcode INTEGER
);
Common types:
INTEGER �REAL
CHAR(n) fixed length string of n characters�VARCHAR(n) variable length string of up � to n characters�TEXT variable length string
DATE ‘yyyy-mm-dd’�TIME ‘hr:min:sec’�TIMESTAMP ‘yyyy-mm-dd hr:min:sec’
BOOLEAN
15
syntax: attributeName ATTRIBUTETYPE
Documentation CREATE TABLE
DROP
DROP TABLE <relation name>;
DROP TABLE IF EXISTS <relation name>;
16
Documentation DROP TABLE
ALTER TABLE
Redefine the schema.
ALTER TABLE Zips� ADD area REAL DEFAULT ‘1.0’,� ADD timezone VARCHAR(5);
ALTER TABLE Zips � DROP area, � DROP timezone;
17
Documentation ALTER TABLE
Demo
DML: INSERT, DELETE, UPDATE
DDL: CREATE, DROP, ALTER
Demo
UPDATE/DELETE: Under the hood
Relational Design and Constraints
PRIMARY KEY and UNIQUE
FOREIGN KEY Constraints
Attribute-based Constraints
18
Lecture 06, Data 101/Info 258 Spring 2024
UPDATE/DELETE: Under the hood
DML: INSERT, DELETE, UPDATE
DDL: CREATE, DROP, ALTER
Demo
UPDATE/DELETE: Under the hood
Relational Design and Constraints
PRIMARY KEY and UNIQUE
FOREIGN KEY Constraints
Attribute-based Constraints
19
Lecture 06, Data 101/Info 258 Spring 2024
A Tricky Issue of Semantics
Issue with UPDATE (and DELETE):
Suppose we have a relation: Employee (name, salary, address, title)
UPDATE Employee� SET salary = salary * 1.2 � WHERE title= 'Database Designer' AND salary <100000;
20
A Tricky Issue of Semantics
Solution: Do UPDATE (and DELETE) in two phases.
Suppose we have a relation: Employee (name, salary, address, title)
UPDATE Employee� SET salary = salary * 1.2 � WHERE title= 'Database Designer' AND salary <100000;
Convince yourselves that this always returns the same answer independent of the order in which we process the tuples!
21
Announcements
Extenuating Circumstances Form
You now know everything for Project 1
Reminder: due date next Weds. 5pm
22
Relational Design and Constraints
DML: INSERT, DELETE, UPDATE
DDL: CREATE, DROP, ALTER
Demo
UPDATE/DELETE: Under the hood
Relational Design and Constraints
PRIMARY KEY and UNIQUE
FOREIGN KEY Constraints
Attribute-based Constraints
23
Lecture 06, Data 101/Info 258 Spring 2024
SQL command categories, broadly
So far:
24
Image: source
Data Definition Language
Transaction Control Language
Data Manipulation Language
Data�Query Language
Data�Control�Language
in more detail
✅
✅
Introducing: Constraints
From before:
CREATE TABLE <relation name> AS (� <subquery>�);
You can also create a new table with an entirely new schema (and no initial tuples):
CREATE TABLE Zips (� location VARCHAR(20),� zipcode INTEGER
);
25
How can we set up the relations so that you can then query them effectively?
Documentation CREATE TABLE
Introducing: Constraints
Constraints are relationships across data elements that�the data system is required to preserve or enforce.
26
Why constraints?
Constraints help encode domain knowledge that the system can enforce; examples:
Constraints act as guardrails and provide safety:
Downside: they are expensive to enforce: need to check after every update
Constraints are relationships across data elements that�the data system is required to preserve or enforce.
27
PRIMARY KEY and UNIQUE
DML: INSERT, DELETE, UPDATE
DDL: CREATE, DROP, ALTER
Demo
UPDATE/DELETE: Under the hood
Relational Design and Constraints
PRIMARY KEY and UNIQUE
FOREIGN KEY Constraints
Attribute-based Constraints
28
Lecture 06, Data 101/Info 258 Spring 2024
The Notion of Keys
It is very useful to be able to identify tuples in a relation (and in the real world).
A set of attributes is said to form a key if no two tuples can have the same values for that combination of attributes, i.e., the values must be both unique and not null.
One way to declare a key for a relation is using PRIMARY KEY (but there can be only one!)
29
Documentation 5.4
PRIMARY KEY
It is very useful to be able to identify tuples in a relation (and in the real world).
A set of attributes is said to form a key if no two tuples can have the same values for that combination of attributes, i.e., the values must be both unique and not null.
One way to declare a key for a relation is using PRIMARY KEY (but there can be only one!)
CREATE TABLE Actor (� id INTEGER,� name TEXT,� PRIMARY KEY (id)�);
CREATE TABLE Zips (� location VARCHAR(20),� zipcode INTEGER,� area REAL,� PRIMARY KEY (location, zipcode)
);
30
combo of location and zipcode is unique across tuples in Zips.
UNIQUE vs. PRIMARY KEY
While there can only be one PRIMARY KEY (hence the name), there can be many UNIQUEs.
31
CREATE TABLE Stops(
stopID INTEGER,
personID INTEGER,
stopTime TIMESTAMP,
race VARCHAR(10),
location VARCHAR(20),
age INTEGER,
PRIMARY KEY (stopID),
UNIQUE (personID, stopTime)�);
From the PostgreSQL documentation
These table definitions accept the same data:
CREATE TABLE products (� product_no integer UNIQUE NOT NULL,� name text,� price numeric�);
CREATE TABLE products (� product_no integer PRIMARY KEY,� name text,� price numeric�);
32
Documentation 5.4
more next lecture
next up!
Other declarations in CREATE TABLE
There are many other declarations we can make for attributes!
Lots in the documentation, but for now here’s two useful ones:
CREATE TABLE Stops(
stopID INTEGER,
personID INTEGER,
stopTime TIMESTAMP,
race VARCHAR(10),
location VARCHAR(20) NOT NULL,
age INTEGER,
arrest BOOLEAN DEFAULT False,
PRIMARY KEY (stopID),
UNIQUE (personID, stopTime)
);
33
Documentation CREATE TABLE
NOT NULL: The value for the attribute can never be NULL. Any changes that violate this declaration will be rejected.
DEFAULT <value>: If there is no specific value known for this attribute for a given tuple, use the stated <value> (as opposed to NULL, which is the default DEFAULT value).
Based on the below table definition, which attributes will never be NULL? Select all that apply.
CREATE TABLE Stops(
stopID INTEGER,
personID INTEGER,
stopTime TIMESTAMP,
race VARCHAR(10),
location VARCHAR(20) NOT NULL,
age INTEGER,
arrest BOOLEAN DEFAULT False,
PRIMARY KEY (stopID),
UNIQUE (personID, stopTime)
);
34
🤔
Exercise
FOREIGN KEY Constraints
DML: INSERT, DELETE, UPDATE
DDL: CREATE, DROP, ALTER
Demo
UPDATE/DELETE: Under the hood
Relational Design and Constraints
PRIMARY KEY and UNIQUE
FOREIGN KEY Constraints
Attribute-based Constraints
35
Got until here during Lecture 6
Lecture 06, Data 101/Info 258 Spring 2024
Foreign Keys
Referential integrity is ensuring that records do not violate the constraints defined between tables.
36
Foreign Keys
Referential integrity is ensuring that records do not violate the constraints defined between tables.
37
Example:
Cast_info (person_id, movie_id),
Actor (id, name),
Movie (id, title)
A foreign key constraint is like a pointer from a tuple in one table to a tuple in another table.
person_id | movie_id |
1 | 23 |
3 | 45 |
id | name |
1 | Tom Hanks |
3 | Michelle Yeoh |
id | title |
23 | Forrest Gump |
45 | Tomorrow Never Dies |
Cast_info
Actor
Movie
Declaring a FK constraint
Use REFERENCES (and optionally, FOREIGN KEY).
38
CREATE TABLE Cast_info (
person_id INTEGER,
movie_id INTEGER,
FOREIGN KEY (person_id)� REFERENCES Actor (id),
FOREIGN KEY (movie_id)� REFERENCES Movie (id)
);
CREATE TABLE Actor (
id INTEGER,
name TEXT,
PRIMARY KEY id
);
CREATE TABLE Movie (
id INTEGER,
title TEXT,
PRIMARY KEY id
);
CREATE TABLE Cast_info (
person_id INTEGER,
movie_id INTEGER,
person_id INTEGER REFERENCES Actor (id),
movie_id INTEGER REFERENCES Movie (id)
);
Mostly equivalent. Above, only FOREIGN KEY supports a multi-attribute FK constraint.
Below declaration is less verbose.
REFERENCEd attributes must be declared UNIQUE or PRIMARY KEY! Why might this be?
Enforcing Foreign Key Constraints
Suppose we have the following relations. Which of the following insertions, updates, deletions of R.person_id (Cast_info.person_id) and S.id (Actor.id) could potentially violate the foreign key constraints shown?
A. An insertion/update to R that adds � new person_ids not yet found in S.id
B. An insertion/update to S that adds � new ids not yet found in S.id
C. An insertion/update to R that � duplicates person_ids that exist in S.id
D. A deletion from/update to R that � removes person_ids that exist in S.id
E. A deletion from/update to S that � removes ids that exist in R.id
39
🤔
person_id | movie_id |
1 | 23 |
3 | 45 |
id | name |
1 | Tom Hanks |
3 | Michelle Yeoh |
R (Cast_info)
S�(Actor)
Enforcing Foreign Key Constraints
Suppose we have the following relations. Which of the following insertions, updates, deletions of R.person_id (Cast_info.person_id) and S.id (Actor.id) could potentially violate the foreign key constraints shown?
A. An insertion/update to R that adds � new person_ids not yet found in S.id
B. An insertion/update to S that adds � new ids not yet found in S.id
C. An insertion/update to R that � duplicates person_ids that exist in S.id
D. A deletion from/update to R that � removes person_ids that exist in S.id
E. A deletion from/update to S that � removes ids that exist in R.id
40
person_id | movie_id |
1 | 23 |
3 | 45 |
id | name |
1 | Tom Hanks |
3 | Michelle Yeoh |
R (Cast_info)
S�(Actor)
⚠️ violation! FK must always reference an existing primary key.
OK for FK to not have certain primary keys
OK (unless FK has UNIQUE constraint)
OK for FK to not have certain primary keys
⚠️ violation!�dangling tuples!
Enforcing Foreign Key Constraints
⚠️ An insertion/update to R that adds � new person_ids not yet found in S.id
(OK) An insertion/update to S that adds � new ids not yet found in S.id
(OK) An insertion/update to R that � duplicates person_ids that exist in S.id
(OK) A deletion from/update to R that � removes person_ids that exist in S.id
⚠️ A deletion from/update to S that � removes ids that exist in R.id
41
person_id | movie_id |
1 | 23 |
3 | 45 |
3 | 71 |
id | name |
1 | Tom Hanks |
3 | Michelle Yeoh |
R (Cast_info)
S�(Actor)
An insert or update to Cast_info that introduces a non-existent actor in Actor will be rejected (throw error).
A deletion or update to Actor that causes some tuples in Cast_info to “dangle” can be handled in three ways:
Which exercise do you want to try?
CREATE TABLE Cast_info (� person_id INTEGER,� movie_id INTEGER, � FOREIGN KEY (person_id)� REFERENCES Actor (id)� ON DELETE SET NULL� ON UPDATE CASCADE,� FOREIGN KEY (movie_id)� REFERENCES Movie (id)� ON DELETE SET NULL);
42
Assume we try the following (we always reset the relation to the diagram). What if we…
person_id | movie_id |
1 | 23 |
3 | 45 |
id | name |
1 | Tom Hanks |
3 | Michelle Yeoh |
id | title |
23 | Forrest Gump |
45 | Tomorrow Never Dies |
Cast_info
Actor
Movie
🤔
Exercise
Attribute-based Constraints
DML: INSERT, DELETE, UPDATE
DDL: CREATE, DROP, ALTER
Demo
UPDATE/DELETE: Under the hood
Relational Design and Constraints
PRIMARY KEY and UNIQUE
FOREIGN KEY Constraints
Attribute-based Constraints
43
Lecture 06, Data 101/Info 258 Spring 2024
Next: Attribute-based constraints
Constraints help encode domain knowledge that the system can enforce.
Constraints act as guardrails and provide safety.
Constraints are relationships across data elements that�the data system is required to preserve or enforce.
44
✅
✅
Next: Attribute-based constraints
Constraints help encode domain knowledge that the system can enforce.
Constraints act as guardrails and provide safety.
Constraints are relationships across data elements that�the data system is required to preserve or enforce.
45
✅
✅
Attribute-based constraints
An attribute-based constraint is where every value in a column satisfies a Boolean expression.
46
CREATE TABLE Actor (� id INTEGER CHECK (id >= 1000 AND id <= 9999),
name TEXT
);
Ensures that actor_ids are four-digit integers.
Attribute-based constraints
An attribute-based constraint is where every value in a column satisfies a Boolean expression.
Attribute-based constraints are checked on changes to the relation itself.
For example, for Actor above:
47
CREATE TABLE Actor (� id INTEGER CHECK (id >= 1000 AND id <= 9999),
name TEXT
);
Ensures that actor_ids are four-digit integers.
Why can’t attribute-based constraints enforce referential integrity?
Suppose we did the following:
CREATE TABLE Cast_info (
…,� person_id INTEGER CHECK
(EXISTS (SELECT *
FROM Actor A� WHERE person_id = A.id)� ),� …�);
Does this enforce the potential referential integrity violations?
(as opposed to FK constraint)
CREATE TABLE Cast_info (
...,� FOREIGN KEY (person_id)
REFERENCES Actor (id),� ...
);
48
person_id | movie_id |
1 | 23 |
3 | 45 |
id | name |
1 | Tom Hanks |
3 | Michelle Yeoh |
R (Cast_info)
S�(Actor)
✅
❌
CHECKs (i.e., attribute-based constraints) can only enforce constraints on a single table.
Summary of Constraints
Constraints are relationships across data elements that the data system is required to preserve or enforce.
49
✅
✅
✅
Just for your info:
Just for your info:
[Extra] Triggers
Constraints introduce overhead to a data system
An alternative is triggers:
We’ll hopefully learn about triggers later!
50