Modifying Data:
DDL, DML, Constraints
1
Data 101, Fall 2025 @ UC Berkeley
Aditya Parameswaran
Announcements
2
Onto bigger and better data engineering!
3
SQL queries
(Select from where, joins,�subqueries, CTEs)
Relational Algebra Introduction
Performance tuning queries
So far
Next time
This time:
SQL command categories, broadly
4
Image: source
Data Definition Language
Transaction Control Language
Data Manipulation Language
Data�Query Language
Data�Control�Language
SQL command categories, broadly
SQL command categories, broadly
5
Image: source
Data Definition Language
Transaction Control Language
Data Manipulation Language
Data�Query Language
Data�Control�Language
so far
today
6
DML: INSERT, DELETE, UPDATE
DDL: CREATE, DROP, ALTER
UPDATE/DELETE: Under the hood
Relational Design and Constraints
PRIMARY KEY and UNIQUE
FOREIGN KEY Constraints
Attribute-based Constraints
Demo (if time, else at home)
Outline
DML: Data Manipulation Language
We also need to be able to modify the data in relations!
Three kinds of modifications:
7
INSERT
INSERT INTO Titles VALUES� ('tt11737520', 'tvSeries', 'One Piece', 'One Piece',� 0, 2023, NULL, 60, 'Action,Adventure,Comedy'),� ('tt1001526','movie', 'Megamind', 'Megamind',� 0, 2010, NULL, 95, 'Action,Adventure,Comedy');
8
titles
Column | Type -----------------+--------
title_id | text
type | text
primary_title | text
original_title | text
is_adult | bigint
premiered | bigint
ended | bigint
runtime_minutes | bigint
genres | text
[Docs INSERT]
INSERT specifying attributes
INSERT INTO Titles VALUES� ('tt11737520', 'tvSeries', 'One Piece', 'One Piece',� 0, 2023, NULL, 60, 'Action,Adventure,Comedy'),� ('tt1001526','movie', 'Megamind', 'Megamind',� 0, 2010, NULL, 95, 'Action,Adventure,Comedy');
INSERT INTO Titles (title_id, type, premiered, primary_title) VALUES� ('tt11737520', 'tvSeries', 2023, 'One Piece');
9
titles
Column | Type -----------------+--------
title_id | text
type | text
primary_title | text
original_title | text
is_adult | bigint
premiered | bigint
ended | bigint
runtime_minutes | bigint
genres | text
INSERT a subquery’s result
INSERT INTO Titles VALUES� ('tt11737520', 'tvSeries', 'One Piece', 'One Piece',� 0, 2023, NULL, 60, 'Action,Adventure,Comedy'),� ('tt1001526','movie', 'Megamind', 'Megamind',� 0, 2010, NULL, 95, 'Action,Adventure,Comedy');
INSERT INTO Titles (title_id, type, premiered, primary_title) VALUES� ('tt11737520', 'tvSeries', 2023, 'One Piece');
INSERT INTO Titles (title_id) (� SELECT DISTINCT C.title_id� FROM Crew C� WHERE NOT EXISTS (SELECT * FROM Titles T WHERE C.title_id = T.title_id));
10
titles
Column | Type -----------------+--------
title_id | text
type | text
primary_title | text
original_title | text
is_adult | bigint
premiered | bigint
ended | bigint
runtime_minutes | bigint
genres | text
Documentation INSERT
DELETE
DELETE FROM <table> WHERE <condition>;
DELETE FROM titles WHERE premiered = 2023;
DELETE FROM titles;
11
titles
Column | Type -----------------+--------
title_id | text
type | text
primary_title | text
original_title | text
is_adult | bigint
premiered | bigint
ended | bigint
runtime_minutes | bigint
genres | text
Documentation DELETE
UPDATE
UPDATE titles� SET original_title = primary_title� WHERE original_title IS NULL;
UPDATE <table>� SET <list of attribute assignments>� WHERE <condition>;
12
titles
Column | Type -----------------+--------
title_id | text
type | text
primary_title | text
original_title | text
is_adult | bigint
premiered | bigint
ended | bigint
runtime_minutes | bigint
genres | text
Documentation UPDATE
13
DML: INSERT, DELETE, UPDATE
DDL: CREATE, DROP, ALTER
UPDATE/DELETE: Under the hood
Relational Design and Constraints
PRIMARY KEY and UNIQUE
FOREIGN KEY Constraints
Attribute-based Constraints
Demo (if time, else at home)
Outline
SQL command categories, broadly
14
Data Definition Language
Transaction Control Language
Data Manipulation Language
Data�Query Language
Data�Control�Language
in more detail
✅
✅
DDL: Data Definition Language
How do we modify the schema of a table?
Three kinds of definitions:
15
Before You CREATE - Designing A Schema
Various decisions:
16
Before You CREATE - Designing A Schema
Various decisions:
Common types: [Docs §8]
INTEGER / BIGINT�REAL / DOUBLE
SERIAL An auto-incrementing integer (also BIG SERIAL)
BOOLEAN
CHAR(n) fixed length string of n characters�VARCHAR(n) variable length string of up to n characters�TEXT variable length string
Fun stuff: Arrays, JSON, XML, “hashes”... Lots more later on!
17
⚠️ Dates, Times & Timezones
DATE 2024-07-01
TIME 13:00:50 13:00:50.555
TIMESTAMP 2024-07-01 13:00:50.000
TIMESTAMPTZ 2024-07-01 13:00:50.000 PDT
18
CREATE
From before:
CREATE TABLE <relation name> AS (� <subquery>�);
Most often: create a new table with a new schema (and no initial tuples):
CREATE TABLE zips (� location VARCHAR(20),� zipcode INTEGER
);
19
syntax: attributeName ATTRIBUTETYPE
[Docs CREATE TABLE]
DROP
DROP TABLE <table name>;
DROP TABLE IF EXISTS <table name>;
20
[Docs DROP TABLE]
ALTER TABLE
Modify the schema.
ALTER TABLE Zips� ADD area REAL,� ADD timezone VARCHAR(5);
Add new columns to the relation
ALTER TABLE Zips � DROP area, � DROP timezone;
Delete old columns to the relation
21
Documentation ALTER TABLE
22
DML: INSERT, DELETE, UPDATE
DDL: CREATE, DROP, ALTER
UPDATE/DELETE: Under the hood
Relational Design and Constraints
PRIMARY KEY and UNIQUE
FOREIGN KEY Constraints
Attribute-based Constraints
Demo (if time, else at home)
Outline
A Tricky Issue of Semantics
Suppose we have a table: employees (name, salary, address, title)
UPDATE employees� SET salary = salary * 1.2 � WHERE title = 'Database Designer' AND salary < 100000;
23
A Tricky Issue of Semantics
Issue with UPDATE (and DELETE):
Suppose we have a table: employees (name, salary, address, title)
UPDATE employees� SET salary = salary * 1.2 � WHERE title = 'Database Designer' AND salary < 100000;
24
A Tricky Issue of Semantics
Solution: Do UPDATE (and DELETE) in two phases.
Suppose we have a table: employees (name, salary, address, title)
UPDATE employees� SET salary = salary * 1.2 � WHERE title = 'Database Designer' AND salary < 100000;
25
26
DML: INSERT, DELETE, UPDATE
DDL: CREATE, DROP, ALTER
UPDATE/DELETE: Under the hood
Relational Design and Constraints
PRIMARY KEY and UNIQUE
FOREIGN KEY Constraints
Attribute-based Constraints
Demo (if time, else at home)
Outline
SQL command categories, broadly
So far:
27
Data Definition Language
Transaction Control Language
Data Manipulation Language
Data�Query Language
Data�Control�Language
in more detail
✅
✅
Revisiting CREATE
From before:
CREATE TABLE <relation name> AS (� <subquery>�);
Can create a new table with a new schema (and no initial tuples):
CREATE TABLE Zips (� location VARCHAR(20),� zipcode INTEGER
);
28
How can we set up the relations so that you can then use them effectively?
[Docs CREATE TABLE]
Things we might care about:
Introducing: Constraints
Constraints are relationships across data elements that�the data system is required to preserve or enforce.
29
Why constraints?
Constraints help encode domain knowledge that the system can enforce:
Constraints act as guardrails and provide safety:
Constraints are relationships across data elements that�the data system is required to preserve or enforce.
30
31
DML: INSERT, DELETE, UPDATE
DDL: CREATE, DROP, ALTER
UPDATE/DELETE: Under the hood
Relational Design and Constraints
PRIMARY KEY and UNIQUE
FOREIGN KEY Constraints
Attribute-based Constraints
Demo (if time, else at home)
Outline
The Notion of Keys
Useful to be able to identify tuples in a relation (and in the real world).
A set of attributes is a key if no two tuples can have the same values for that combination of attributes, i.e., the values must be unique (and also not null).
Declare a key for a relation using PRIMARY KEY. Only one primary key!
Be careful to choose (or create) keys that identify types in the future.
32
[Docs §5.4]
Question: Good Primary Keys?
33
PRIMARY KEY
Useful to be able to identify tuples in a relation (and in the real world).
A set of attributes is a key if no two tuples can have the same values for that combination of attributes, i.e., the values must be unique (and also not null).
Declare a key for a relation using PRIMARY KEY. Only one primary key!
CREATE TABLE actors (� id INTEGER,� name TEXT,� PRIMARY KEY (id)�);
CREATE TABLE Zips (� location VARCHAR(20),� zipcode INTEGER,� area REAL,� PRIMARY KEY (location, zipcode)
);
34
combo of location and zipcode is unique
UNIQUE vs. PRIMARY KEY
Only one PRIMARY KEY, but there can be many UNIQUEs.
35
CREATE TABLE Stops (
stop_id INTEGER,
person_id INTEGER,
stop_time TIMESTAMP,
race VARCHAR(10),
location VARCHAR(20),
age INTEGER,
PRIMARY KEY (stop_id),
UNIQUE (person_id, stop_time)�);
From the PostgreSQL documentation
These table definitions accept the same data:
CREATE TABLE products (� product_no integer UNIQUE NOT NULL,� name text,� price numeric�);
CREATE TABLE products (� product_no integer PRIMARY KEY,� name text,� price numeric�);
36
more next lecture
next up!
However, primary keys are special!
Other declarations in CREATE TABLE
There are many other declarations we can make for attributes!
Lots in the documentation, but for now here’s two useful ones:
CREATE TABLE Stops (
stop_id INTEGER,
person_id INTEGER,
stop_time TIMESTAMP,
race VARCHAR(10),
location VARCHAR(20) NOT NULL,
age INTEGER,
arrest BOOLEAN DEFAULT False,
PRIMARY KEY (stopID),
UNIQUE (personID, stopTime)
);
37
[Docs CREATE TABLE]
NOT NULL: Value can never be NULL. Any changes that violate this will be rejected.
DEFAULT <value>: If there is no specific value known, use <value> (as opposed to NULL, which is the default DEFAULT).
(Try at home)
Based on the below table definition, which attributes will be NOT NULL? Select all that apply.
CREATE TABLE Stops (
stop_id INTEGER,
person_id INTEGER,
stop_time TIMESTAMP,
race VARCHAR(10),
location VARCHAR(20) NOT NULL,
age INTEGER,
arrest BOOLEAN DEFAULT False,
PRIMARY KEY (stop_id),
UNIQUE (person_id, stop_time)
);
38
🤔
Demo
Answer – 3 attributes will never be NULL
(Try at home)
Answer
stop_id can’t be NULL as a Primary Key, and arrest will always have a default value, location is clearly never null.
CREATE TABLE Stops (
stop_id INTEGER,
person_id INTEGER,
stop_time TIMESTAMP,
race VARCHAR(10),
location VARCHAR(20) NOT NULL,
age INTEGER,
arrest BOOLEAN DEFAULT False,
PRIMARY KEY (stop_id),
UNIQUE (person_id, stop_time)
);
39
🤔
Demo
40
DML: INSERT, DELETE, UPDATE
DDL: CREATE, DROP, ALTER
UPDATE/DELETE: Under the hood
Relational Design and Constraints
PRIMARY KEY and UNIQUE
FOREIGN KEY Constraints
Attribute-based Constraints
Demo (if time, else at home)
Outline
Foreign Keys
Referential integrity ensures that records do not violate constraints between tables.
Also, when we JOIN tables, we most often use foreign key.
41
Foreign Keys
Back to lecture 2, IMDB data. (With slight revised names)
42
Simplified Example:
cast_info (person_id, title_id),
actors (id, name),
movies (id, title)
A foreign key is like a pointer from a tuple in one table to a tuple in another table.
person_id | title_id |
1 | 23 |
3 | 45 |
id | name |
1 | Tom Hanks |
3 | Michelle Yeoh |
id | title |
23 | Forrest Gump |
45 | Tomorrow Never Dies |
We expect that:
cast_info
actors
movies
Declaring a FK constraint
Use REFERENCES (and optionally, FOREIGN KEY).
43
CREATE TABLE cast_info (
person_id INTEGER,
movie_id INTEGER,
FOREIGN KEY (person_id)� REFERENCES actors (id),
FOREIGN KEY (movie_id)� REFERENCES movies (id)
);
CREATE TABLE actors (
id INTEGER,
name TEXT,
PRIMARY KEY id
);
CREATE TABLE movies (
id INTEGER,
title TEXT,
PRIMARY KEY id
);
CREATE TABLE cast_info (
person_id INTEGER,
movie_id INTEGER,
person_id INTEGER REFERENCES actors (id),
movie_id INTEGER REFERENCES movies (id)
);
Above, only FOREIGN KEY supports a multi-attribute FK constraint. Below declaration is less verbose.
REFERENCEd attributes must be declared UNIQUE or PRIMARY KEY! Why might this be?
Enforcing Foreign Key Constraints
Which of the following changes to R.person_id (cast_info.person_id) and S.id (actors.id) could potentially violate the foreign key constraints?
A. An insertion/update to R that adds � new person_ids not yet found in S.id
B. An insertion/update to S that adds � new ids not yet found in S.id
C. An insertion/update to R that � duplicates person_ids that exist in S.id
D. A deletion from/update to R that � removes person_ids that exist in S.id
E. A deletion from/update to S that � removes ids that exist in R.id
44
🤔
person_id | movie_id |
1 | 23 |
3 | 45 |
id | name |
1 | Tom Hanks |
3 | Michelle Yeoh |
R (cast_info)
S�(actors)
Enforcing Foreign Key Constraints
Which of the following changes to R.person_id (cast_info.person_id) and S.id (actors.id) could potentially violate the foreign key constraints?
A. An insertion/update to R that adds � new person_ids not yet found in S.id | ⚠️ violation! FK must always reference an existing primary key. |
B. An insertion/update to S that adds � new ids not yet found in S.id | OK for FK to not have certain primary keys |
C. An insertion/update to R that � duplicates person_ids that exist in S.id | OK (unless FK has UNIQUE constraint) |
D. A deletion from/update to R that � removes person_ids that exist in S.id | OK for FK to not have certain primary keys |
E. A deletion from/update to S that � removes ids that exist in R.id | ⚠️ violation! dangling tuples! |
Enforcing Foreign Key Constraints
⚠️ An insertion/update to R that adds � new person_ids not yet found in S.id
(OK) An insertion/update to S that adds � new ids not yet found in S.id
(OK) An insertion/update to R that � duplicates person_ids that exist in S.id
(OK) A deletion from/update to R that � removes person_ids that exist in S.id
⚠️ A deletion from/update to S that � removes ids that exist in R.id
46
person_id | movie_id |
1 | 23 |
3 | 45 |
3 | 71 |
10 | 23 |
id | name |
1 | Tom Hanks |
3 | Michelle Yeoh |
R (cast_info)
S�(actor)
An insert or update to cast_info that introduces a non-existent actor in Actor will be rejected (throw error).
Enforcing Foreign Key Constraints
⚠️ An insertion/update to R that adds � new person_ids not yet found in S.id
…
⚠️ A deletion from/update to S that � removes ids that exist in R.id
A deletion or update to Actor that causes some tuples in cast_info to “dangle” can be handled in three ways:
person_id | movie_id |
1 | 23 |
3 | 45 |
id | name |
1 | Tom Hanks |
3 | Michelle Yeoh |
R (cast_info)
S�(actor)
Try at home!
(Try at home)
CREATE TABLE Cast_info (� person_id INTEGER,� movie_id INTEGER, � FOREIGN KEY (person_id)� REFERENCES Actor (id)� ON DELETE SET NULL� ON UPDATE CASCADE,� FOREIGN KEY (movie_id)� REFERENCES Movie (id)� ON DELETE SET NULL);
48
person_id | movie_id |
1 | 23 |
3 | 45 |
id | name |
1 | Tom Hanks |
3 | Michelle Yeoh |
id | title |
23 | Forrest Gump |
45 | Tomorrow Never Dies |
cast_info
actors
movies
🤔
Demo
Try at home!
(Try at home)
CREATE TABLE Cast_info (� person_id INTEGER,� movie_id INTEGER, � FOREIGN KEY (person_id)� REFERENCES Actor (id)� ON DELETE SET NULL� ON UPDATE CASCADE,� FOREIGN KEY (movie_id)� REFERENCES Movie (id)� ON DELETE SET NULL);
49
Assume we try the following (independently). �What if we…
person_id | movie_id |
1 | 23 |
3 | 45 |
id | name |
1 | Tom Hanks |
3 | Michelle Yeoh |
id | title |
23 | Forrest Gump |
45 | Tomorrow Never Dies |
cast_info
actors
movies
🤔
Demo
50
DML: INSERT, DELETE, UPDATE
DDL: CREATE, DROP, ALTER
UPDATE/DELETE: Under the hood
Relational Design and Constraints
PRIMARY KEY and UNIQUE
FOREIGN KEY Constraints
Attribute-based Constraints
Demo (if time, else at home)
Outline
Next: Attribute-based constraints
Constraints help encode domain knowledge that the system can enforce:
Constraints act as guardrails and provide safety:
Constraints are relationships across data elements that�the data system is required to preserve or enforce.
51
Attribute-based constraints
An attribute-based constraint is where every value in a column satisfies a Boolean expression.
52
CREATE TABLE actors (� id INTEGER CHECK (id >= 1000 AND id <= 9999),
name TEXT
);
Ensures that ids are four-digit integers.
Attribute-based constraints
An attribute-based constraint is where every value in a column satisfies a Boolean expression.
Attribute-based constraints are checked on changes to the relation itself.
For example, for actors above:
53
CREATE TABLE Actor (� id INTEGER CHECK (id >= 1000 AND id <= 9999),
name TEXT
);
Ensures that ids are four-digit integers.
Why can’t attribute-based constraints enforce referential integrity?
Suppose we did the following:
CREATE TABLE cast_info (
…,� person_id INTEGER CHECK
(EXISTS (SELECT *
FROM actors S� WHERE person_id = S.id)� ),� …�);
Does this enforce potential ref. integrity violations?
(as opposed to FK constraint)
CREATE TABLE Cast_info (
...,� FOREIGN KEY (person_id)
REFERENCES Actor (id),� ...
);
Attribute-based constraints are checked on changes to the relation itself.
person_id | movie_id |
1 | 23 |
3 | 45 |
id | name |
1 | Tom Hanks |
3 | Michelle Yeoh |
R (cast_info)
S (actors)
Summary of Constraints
Constraints are relationships across data elements that the data system is required to preserve or enforce.
55
✅
✅
✅
For your info:
For your info:
[Extra] Data Integrity Never Ends
56
[Extra] Triggers
Constraints introduce overhead to a data system
An alternative is triggers:
57
58
DML: INSERT, DELETE, UPDATE
DDL: CREATE, DROP, ALTER
UPDATE/DELETE: Under the hood
Relational Design and Constraints
PRIMARY KEY and UNIQUE
FOREIGN KEY Constraints
Attribute-based Constraints
Demo (if time, else at home)
Outline