1 of 58

Modifying Data:

DDL, DML, Constraints

1

Data 101, Fall 2025 @ UC Berkeley

Aditya Parameswaran

2 of 58

Announcements

  • Upcoming assignments (all due at 5pm!)
    • Project 0 is due 9/10
    • Homework 1 is due 9/17
    • Project 1 will be released this week
  • Tutoring sections are open! See Ed for details + signup info
  • Discussion sections and OH are now running!

2

3 of 58

Onto bigger and better data engineering!

3

SQL queries

(Select from where, joins,�subqueries, CTEs)

Relational Algebra Introduction

Performance tuning queries

So far

Next time

This time:

  • Data modification
  • Relation definition
  • Primary and foreign keys
  • Constraints

4 of 58

SQL command categories, broadly

4

Image: source

Data Definition Language

Transaction Control Language

Data Manipulation Language

Data�Query Language

Data�Control�Language

5 of 58

SQL command categories, broadly

SQL command categories, broadly

5

Image: source

Data Definition Language

Transaction Control Language

Data Manipulation Language

Data�Query Language

Data�Control�Language

so far

today

6 of 58

6

DML: INSERT, DELETE, UPDATE

DDL: CREATE, DROP, ALTER

UPDATE/DELETE: Under the hood

Relational Design and Constraints

PRIMARY KEY and UNIQUE

FOREIGN KEY Constraints

Attribute-based Constraints

Demo (if time, else at home)

Outline

7 of 58

DML: Data Manipulation Language

We also need to be able to modify the data in relations!

  • A modification command doesn’t return the result in the same way a query does, but it changes the relational instance (i.e., the stored tuples)

Three kinds of modifications:

  • INSERT a tuple (or tuples);
  • DELETE an existing tuple (or tuples); and
  • UPDATE the value(s) of an existing tuple (or tuples).

7

8 of 58

INSERT

INSERT INTO Titles VALUES� ('tt11737520', 'tvSeries', 'One Piece', 'One Piece',� 0, 2023, NULL, 60, 'Action,Adventure,Comedy'),� ('tt1001526','movie', 'Megamind', 'Megamind',� 0, 2010, NULL, 95, 'Action,Adventure,Comedy');

  • Each tuple is specified by parentheses (), separated by commas.
  • Each tuple must have all attributes in schema, in order.

8

titles

Column | Type -----------------+--------

title_id | text

type | text

primary_title | text

original_title | text

is_adult | bigint

premiered | bigint

ended | bigint

runtime_minutes | bigint

genres | text

[Docs INSERT]

9 of 58

INSERT specifying attributes

INSERT INTO Titles VALUES� ('tt11737520', 'tvSeries', 'One Piece', 'One Piece',� 0, 2023, NULL, 60, 'Action,Adventure,Comedy'),� ('tt1001526','movie', 'Megamind', 'Megamind',� 0, 2010, NULL, 95, 'Action,Adventure,Comedy');

  • Each tuple is specified by parentheses (), separated by commas.
  • Each tuple must have all attributes in schema, in order.

INSERT INTO Titles (title_id, type, premiered, primary_title) VALUES� ('tt11737520', 'tvSeries', 2023, 'One Piece');

  • Explicitly specify attributes to insert. Can be out of order!
  • Also allows system to fill in NULL values for values of attributes we don’t have.

9

titles

Column | Type -----------------+--------

title_id | text

type | text

primary_title | text

original_title | text

is_adult | bigint

premiered | bigint

ended | bigint

runtime_minutes | bigint

genres | text

10 of 58

INSERT a subquery’s result

INSERT INTO Titles VALUES� ('tt11737520', 'tvSeries', 'One Piece', 'One Piece',� 0, 2023, NULL, 60, 'Action,Adventure,Comedy'),� ('tt1001526','movie', 'Megamind', 'Megamind',� 0, 2010, NULL, 95, 'Action,Adventure,Comedy');

  • Each tuple is specified by parentheses (), separated by commas.
  • Each tuple must have all attributes in schema, in order.

INSERT INTO Titles (title_id, type, premiered, primary_title) VALUES� ('tt11737520', 'tvSeries', 2023, 'One Piece');

  • Explicitly specify attributes to insert. Can be out of order!
  • Also allows system to fill in NULL values for values of attributes we don’t have.

INSERT INTO Titles (title_id) (SELECT DISTINCT C.title_id� FROM Crew C� WHERE NOT EXISTS (SELECT * FROM Titles T WHERE C.title_id = T.title_id));

  • Insert the entire result of a (sub)query into a relation. Don’t use the VALUES keyword.
  • What do you think the above insertion does?

10

titles

Column | Type -----------------+--------

title_id | text

type | text

primary_title | text

original_title | text

is_adult | bigint

premiered | bigint

ended | bigint

runtime_minutes | bigint

genres | text

Documentation INSERT

11 of 58

DELETE

DELETE FROM <table> WHERE <condition>;

  • Deletes all tuples from table that satisfy a condition.

DELETE FROM titles WHERE premiered = 2023;

  • Remove all tuples from this year

DELETE FROM titles;

  • Removes all tuples!
  • The table still exists, but is empty (tuples can be added back later)

11

titles

Column | Type -----------------+--------

title_id | text

type | text

primary_title | text

original_title | text

is_adult | bigint

premiered | bigint

ended | bigint

runtime_minutes | bigint

genres | text

Documentation DELETE

12 of 58

UPDATE

UPDATE titles� SET original_title = primary_title� WHERE original_title IS NULL;

  • Set all NULL original titles to be the primary title.

UPDATE <table>� SET <list of attribute assignments>� WHERE <condition>;

  • Changes certain attributes of tuples in Relation�that satisfy a condition.

12

titles

Column | Type -----------------+--------

title_id | text

type | text

primary_title | text

original_title | text

is_adult | bigint

premiered | bigint

ended | bigint

runtime_minutes | bigint

genres | text

Documentation UPDATE

13 of 58

13

DML: INSERT, DELETE, UPDATE

DDL: CREATE, DROP, ALTER

UPDATE/DELETE: Under the hood

Relational Design and Constraints

PRIMARY KEY and UNIQUE

FOREIGN KEY Constraints

Attribute-based Constraints

Demo (if time, else at home)

Outline

14 of 58

SQL command categories, broadly

14

Data Definition Language

Transaction Control Language

Data Manipulation Language

Data�Query Language

Data�Control�Language

in more detail

15 of 58

DDL: Data Definition Language

How do we modify the schema of a table?

  • A definition command also does not return the result in the same way a query does, but ..
  • it sets the relational schema (i.e., the attributes per tuple) and potentially the relational instance.

Three kinds of definitions:

  • CREATE a table (which also defines the schema);
  • DROP a table (which deletes both the schema and instance); and
  • ALTER a table’s attributes (and therefore redefines the schema).

15

16 of 58

Before You CREATE - Designing A Schema

Various decisions:

  • What columns should there be? What types?
  • Generally all lowercase_snake_case identifier names
  • Tables are plural, columns are based on the type.
  • Above all be consistent!

16

17 of 58

Before You CREATE - Designing A Schema

Various decisions:

  • What columns should there be? What types?
  • Generally all lowercase_snake_case identifier names
  • Tables are plural, columns are based on the type.
  • Above all be consistent!

Common types: [Docs §8]

INTEGER / BIGINT�REAL / DOUBLE

SERIAL An auto-incrementing integer (also BIG SERIAL)

BOOLEAN

CHAR(n) fixed length string of n characters�VARCHAR(n) variable length string of up to n characters�TEXT variable length string

Fun stuff: Arrays, JSON, XML, “hashes”... Lots more later on!

17

18 of 58

⚠️ Dates, Times & Timezones

  • This gets tricky in practice. Consistency matters most!
  • DATE: A day without a defined time
  • TIME: A time of a day, but no date
  • Either is not enough to know when an event happened.
  • TIMESTAMP: A date + time combo..
  • But… a timestamp without a timezone may be ambiguous!

DATE 2024-07-01

TIME 13:00:50 13:00:50.555

TIMESTAMP 2024-07-01 13:00:50.000

TIMESTAMPTZ 2024-07-01 13:00:50.000 PDT

18

19 of 58

CREATE

From before:

CREATE TABLE <relation name> AS (� <subquery>�);

Most often: create a new table with a new schema (and no initial tuples):

CREATE TABLE zips (� location VARCHAR(20),� zipcode INTEGER

);

19

syntax: attributeName ATTRIBUTETYPE

20 of 58

DROP

DROP TABLE <table name>;

  • Drop the entire relation from the database.
  • (similarly: DROP VIEW <view name>)

DROP TABLE IF EXISTS <table name>;

  • Avoid error if relation does not exist in the first place
  • Very common in our notebooks: drop table if it exists, then create again.

20

[Docs DROP TABLE]

21 of 58

ALTER TABLE

Modify the schema.

ALTER TABLE Zips� ADD area REAL,� ADD timezone VARCHAR(5);

Add new columns to the relation

ALTER TABLE Zips � DROP area, � DROP timezone;

Delete old columns to the relation

21

Documentation ALTER TABLE

22 of 58

22

DML: INSERT, DELETE, UPDATE

DDL: CREATE, DROP, ALTER

UPDATE/DELETE: Under the hood

Relational Design and Constraints

PRIMARY KEY and UNIQUE

FOREIGN KEY Constraints

Attribute-based Constraints

Demo (if time, else at home)

Outline

23 of 58

A Tricky Issue of Semantics

Suppose we have a table: employees (name, salary, address, title)

UPDATE employees� SET salary = salary * 1.2 � WHERE title = 'Database Designer' AND salary < 100000;

  • Increase salaries of all database designers (w/ salary < $100k) by 20%.
  • Employee has a tuple: ('Bob', 50000, …, 'Database Designer')

23

24 of 58

A Tricky Issue of Semantics

Issue with UPDATE (and DELETE):

  • When the UPDATE hits Bob’s tuple, salary becomes 120% of $60,000.
  • Bob’s new salary is still below $100,000; still satisfies WHERE condition!
    • If we’re not careful, we could keep increasing Bob’s salary…
    • …until Bob’s salary hits >=100000!! (good for Bob…??)

Suppose we have a table: employees (name, salary, address, title)

UPDATE employees� SET salary = salary * 1.2 � WHERE title = 'Database Designer' AND salary < 100000;

  • Increase salaries of all database designers (w/ salary < $100k) by 20%.
  • Employee has a tuple: ('Bob', 50000, …, 'Database Designer')

24

25 of 58

A Tricky Issue of Semantics

Solution: Do UPDATE (and DELETE) in two phases.

  1. Phase 1
    • Look up each tuple; evaluate the UPDATE (or DELETE)’s WHERE condition
    • If the tuple matches WHERE condition, mark it for an update (or delete).
  2. Phase 2
    • Now actually update/delete all the marked tuples.

Suppose we have a table: employees (name, salary, address, title)

UPDATE employees� SET salary = salary * 1.2 � WHERE title = 'Database Designer' AND salary < 100000;

  • Increase salaries of all database designers (w/ salary < $100k) by 20%.
  • Employee has a tuple: ('Bob', 50000, …, 'Database Designer')

25

26 of 58

26

DML: INSERT, DELETE, UPDATE

DDL: CREATE, DROP, ALTER

UPDATE/DELETE: Under the hood

Relational Design and Constraints

PRIMARY KEY and UNIQUE

FOREIGN KEY Constraints

Attribute-based Constraints

Demo (if time, else at home)

Outline

27 of 58

SQL command categories, broadly

So far:

  • Relational Algebra (RA): core primitives, supported across multiple data systems (databases, Spark/DF, dataframes).
  • SQL queries on one more more relations that builds on/extends RA.
  • SQL updates to populate/manipulate the data in relations.

27

Data Definition Language

Transaction Control Language

Data Manipulation Language

Data�Query Language

Data�Control�Language

in more detail

28 of 58

Revisiting CREATE

From before:

CREATE TABLE <relation name> AS (� <subquery>�);

Can create a new table with a new schema (and no initial tuples):

CREATE TABLE Zips (� location VARCHAR(20),� zipcode INTEGER

);

28

How can we set up the relations so that you can then use them effectively?

Things we might care about:

  • Cleanliness and validity of data (eg Zip Codes) ← our focus today
  • Are we putting the right data together? Splitting them apart?

29 of 58

Introducing: Constraints

Constraints are relationships across data elements that�the data system is required to preserve or enforce.

  • Identifiable tuples: PRIMARY KEY and UNIQUE
  • Cross-table referential integrity with FOREIGN KEY
  • Attribute-based constraints
  • (not covered) Tuple-based constraints
  • (not covered) Other SQL constraints

29

30 of 58

Why constraints?

Constraints help encode domain knowledge that the system can enforce:

  • eg: age is between 0->120; zipcode is five digits; etc.
  • eg: movie_id in cast_info should be the same ids as in movies

Constraints act as guardrails and provide safety:

  • Prevents unintended errors, e.g., when new inserts violate constraints
  • But… maybe at the cost of performance?

Constraints are relationships across data elements that�the data system is required to preserve or enforce.

  • Identifiable tuples: PRIMARY KEY and UNIQUE
  • Cross-table referential integrity with FOREIGN KEY
  • Attribute-based constraints
  • (not covered) Tuple-based constraints
  • (not covered) Other SQL constraints

30

31 of 58

31

DML: INSERT, DELETE, UPDATE

DDL: CREATE, DROP, ALTER

UPDATE/DELETE: Under the hood

Relational Design and Constraints

PRIMARY KEY and UNIQUE

FOREIGN KEY Constraints

Attribute-based Constraints

Demo (if time, else at home)

Outline

32 of 58

The Notion of Keys

Useful to be able to identify tuples in a relation (and in the real world).

  • CalnetID or student ID
  • SSN or passport number to identify people

A set of attributes is a key if no two tuples can have the same values for that combination of attributes, i.e., the values must be unique (and also not null).

Declare a key for a relation using PRIMARY KEY. Only one primary key!

Be careful to choose (or create) keys that identify types in the future.

32

[Docs §5.4]

33 of 58

Question: Good Primary Keys?

  • Assume you're building a tool that needs to identify individual students, what options could make good primary keys?

  • email: A likely good primary key, but technically someone could change their email address
  • student id: a great candidate for a PK
  • phone number: not a great PK; phone numbers aren’t directly tied to people and can change or be shared.
  • student name: not a good PK because names are not unique and can change
  • a new number based on when you joined UCB: a great PK because it can be unique, and does not have to change. In fact, this is what Berkeley uses.

33

34 of 58

PRIMARY KEY

Useful to be able to identify tuples in a relation (and in the real world).

A set of attributes is a key if no two tuples can have the same values for that combination of attributes, i.e., the values must be unique (and also not null).

Declare a key for a relation using PRIMARY KEY. Only one primary key!

CREATE TABLE actors (� id INTEGER,� name TEXT,� PRIMARY KEY (id)�);

CREATE TABLE Zips (� location VARCHAR(20),� zipcode INTEGER,� area REAL,� PRIMARY KEY (location, zipcode)

);

34

combo of location and zipcode is unique

35 of 58

UNIQUE vs. PRIMARY KEY

Only one PRIMARY KEY, but there can be many UNIQUEs.

  • UNIQUE also enforces attrib. values to be unique throughout the relation.
  • UNIQUE does not enforce non-null like PRIMARY KEY.

35

CREATE TABLE Stops (

stop_id INTEGER,

person_id INTEGER,

stop_time TIMESTAMP,

race VARCHAR(10),

location VARCHAR(20),

age INTEGER,

PRIMARY KEY (stop_id),

UNIQUE (person_id, stop_time)�);

  • Attributes within UNIQUEs can be NULL.
  • Can be several tuples with NULL values for UNIQUE attributes.

36 of 58

From the PostgreSQL documentation

These table definitions accept the same data:

CREATE TABLE products (� product_no integer UNIQUE NOT NULL,� name text,� price numeric�);

CREATE TABLE products (� product_no integer PRIMARY KEY,� name text,� price numeric�);

36

more next lecture

next up!

However, primary keys are special!

37 of 58

Other declarations in CREATE TABLE

There are many other declarations we can make for attributes!

Lots in the documentation, but for now here’s two useful ones:

CREATE TABLE Stops (

stop_id INTEGER,

person_id INTEGER,

stop_time TIMESTAMP,

race VARCHAR(10),

location VARCHAR(20) NOT NULL,

age INTEGER,

arrest BOOLEAN DEFAULT False,

PRIMARY KEY (stopID),

UNIQUE (personID, stopTime)

);

37

NOT NULL: Value can never be NULL. Any changes that violate this will be rejected.

DEFAULT <value>: If there is no specific value known, use <value> (as opposed to NULL, which is the default DEFAULT).

38 of 58

(Try at home)

Based on the below table definition, which attributes will be NOT NULL? Select all that apply.

CREATE TABLE Stops (

stop_id INTEGER,

person_id INTEGER,

stop_time TIMESTAMP,

race VARCHAR(10),

location VARCHAR(20) NOT NULL,

age INTEGER,

arrest BOOLEAN DEFAULT False,

PRIMARY KEY (stop_id),

UNIQUE (person_id, stop_time)

);

38

🤔

Demo

39 of 58

Answer – 3 attributes will never be NULL

(Try at home)

Answer

stop_id can’t be NULL as a Primary Key, and arrest will always have a default value, location is clearly never null.

CREATE TABLE Stops (

stop_id INTEGER,

person_id INTEGER,

stop_time TIMESTAMP,

race VARCHAR(10),

location VARCHAR(20) NOT NULL,

age INTEGER,

arrest BOOLEAN DEFAULT False,

PRIMARY KEY (stop_id),

UNIQUE (person_id, stop_time)

);

39

🤔

Demo

40 of 58

40

DML: INSERT, DELETE, UPDATE

DDL: CREATE, DROP, ALTER

UPDATE/DELETE: Under the hood

Relational Design and Constraints

PRIMARY KEY and UNIQUE

FOREIGN KEY Constraints

Attribute-based Constraints

Demo (if time, else at home)

Outline

41 of 58

Foreign Keys

Referential integrity ensures that records do not violate constraints between tables.

  • A foreign key (FK) maintains referential integrity by restricting updates based on whether an attribute value in one table exists in another table.
  • Databases help manage this constraint for you!

Also, when we JOIN tables, we most often use foreign key.

41

42 of 58

Foreign Keys

Back to lecture 2, IMDB data. (With slight revised names)

42

Simplified Example:

cast_info (person_id, title_id),

actors (id, name),

movies (id, title)

A foreign key is like a pointer from a tuple in one table to a tuple in another table.

person_id

title_id

1

23

3

45

id

name

1

Tom Hanks

3

Michelle Yeoh

id

title

23

Forrest Gump

45

Tomorrow Never Dies

We expect that:

  • person_id in crew must be in the ids in actors.
  • title_id in crew must be in the ids in movies.

cast_info

actors

movies

43 of 58

Declaring a FK constraint

Use REFERENCES (and optionally, FOREIGN KEY).

43

CREATE TABLE cast_info (

person_id INTEGER,

movie_id INTEGER,

FOREIGN KEY (person_id)REFERENCES actors (id),

FOREIGN KEY (movie_id)� REFERENCES movies (id)

);

CREATE TABLE actors (

id INTEGER,

name TEXT,

PRIMARY KEY id

);

CREATE TABLE movies (

id INTEGER,

title TEXT,

PRIMARY KEY id

);

CREATE TABLE cast_info (

person_id INTEGER,

movie_id INTEGER,

person_id INTEGER REFERENCES actors (id),

movie_id INTEGER REFERENCES movies (id)

);

Above, only FOREIGN KEY supports a multi-attribute FK constraint. Below declaration is less verbose.

REFERENCEd attributes must be declared UNIQUE or PRIMARY KEY! Why might this be?

44 of 58

Enforcing Foreign Key Constraints

Which of the following changes to R.person_id (cast_info.person_id) and S.id (actors.id) could potentially violate the foreign key constraints?

A. An insertion/update to R that addsnew person_ids not yet found in S.id

B. An insertion/update to S that addsnew ids not yet found in S.id

C. An insertion/update to R that � duplicates person_ids that exist in S.id

D. A deletion from/update to R that � removes person_ids that exist in S.id

E. A deletion from/update to S that � removes ids that exist in R.id

44

🤔

person_id

movie_id

1

23

3

45

id

name

1

Tom Hanks

3

Michelle Yeoh

R (cast_info)

S�(actors)

45 of 58

Enforcing Foreign Key Constraints

Which of the following changes to R.person_id (cast_info.person_id) and S.id (actors.id) could potentially violate the foreign key constraints?

A. An insertion/update to R that addsnew person_ids not yet found in S.id

⚠️ violation! FK must always reference an existing primary key.

B. An insertion/update to S that addsnew ids not yet found in S.id

OK for FK to not have certain primary keys

C. An insertion/update to R that � duplicates person_ids that exist in S.id

OK (unless FK has UNIQUE constraint)

D. A deletion from/update to R that � removes person_ids that exist in S.id

OK for FK to not have certain primary keys

E. A deletion from/update to S that � removes ids that exist in R.id

⚠️ violation! dangling tuples!

46 of 58

Enforcing Foreign Key Constraints

⚠️ An insertion/update to R that addsnew person_ids not yet found in S.id

(OK) An insertion/update to S that addsnew ids not yet found in S.id

(OK) An insertion/update to R that � duplicates person_ids that exist in S.id

(OK) A deletion from/update to R that � removes person_ids that exist in S.id

⚠️ A deletion from/update to S that � removes ids that exist in R.id

46

person_id

movie_id

1

23

3

45

3

71

10

23

id

name

1

Tom Hanks

3

Michelle Yeoh

R (cast_info)

S�(actor)

An insert or update to cast_info that introduces a non-existent actor in Actor will be rejected (throw error).

47 of 58

Enforcing Foreign Key Constraints

⚠️ An insertion/update to R that addsnew person_ids not yet found in S.id

⚠️ A deletion from/update to S that � removes ids that exist in R.id

A deletion or update to Actor that causes some tuples in cast_info to “dangle” can be handled in three ways:

  • [default] Reject/throw error.
  • CASCADE: make same change to Cast_info
    • Deleted Actor: delete cast_info tuples that refer to it
    • Updated Actor: update cast_info tuples that refer to it
  • SET NULL: change person_id in Cast_info to NULL

person_id

movie_id

1

23

3

45

id

name

1

Tom Hanks

3

Michelle Yeoh

R (cast_info)

S�(actor)

48 of 58

Try at home!

(Try at home)

CREATE TABLE Cast_info (� person_id INTEGER,� movie_id INTEGER, � FOREIGN KEY (person_id)� REFERENCES Actor (id)� ON DELETE SET NULL� ON UPDATE CASCADE,� FOREIGN KEY (movie_id)� REFERENCES Movie (id)� ON DELETE SET NULL);

48

person_id

movie_id

1

23

3

45

id

name

1

Tom Hanks

3

Michelle Yeoh

id

title

23

Forrest Gump

45

Tomorrow Never Dies

cast_info

actors

movies

🤔

Demo

49 of 58

Try at home!

(Try at home)

CREATE TABLE Cast_info (� person_id INTEGER,� movie_id INTEGER, � FOREIGN KEY (person_id)� REFERENCES Actor (id)� ON DELETE SET NULL� ON UPDATE CASCADE,� FOREIGN KEY (movie_id)� REFERENCES Movie (id)� ON DELETE SET NULL);

49

Assume we try the following (independently). �What if we…

  1. Delete a tuple from actors�corresponding to id = 1?
  2. Change a tuple in actors from id = 1 to id = 2?
  3. Delete a tuple from movies�corresponding to id = 23?
  4. Change a tuple in movies from�id = 23 to id = 24?
  5. Insert a tuple into cast_info that adds a new person_id not found in actors?

person_id

movie_id

1

23

3

45

id

name

1

Tom Hanks

3

Michelle Yeoh

id

title

23

Forrest Gump

45

Tomorrow Never Dies

cast_info

actors

movies

🤔

Demo

50 of 58

50

DML: INSERT, DELETE, UPDATE

DDL: CREATE, DROP, ALTER

UPDATE/DELETE: Under the hood

Relational Design and Constraints

PRIMARY KEY and UNIQUE

FOREIGN KEY Constraints

Attribute-based Constraints

Demo (if time, else at home)

Outline

51 of 58

Next: Attribute-based constraints

Constraints help encode domain knowledge that the system can enforce:

  • eg: age is between 0->120; zipcode is five digits; etc.
  • eg movie_id in cast_info should be the same ids as in movies

Constraints act as guardrails and provide safety:

  • Prevents unintended errors, e.g., when new inserts violate constraints
  • But… maybe at the cost of performance?

Constraints are relationships across data elements that�the data system is required to preserve or enforce.

  • Identifiable tuples: PRIMARY KEY and UNIQUE
  • Cross-table referential integrity with FOREIGN KEY
  • Attribute-based constraints
  • (not covered) Tuple-based constraints
  • (not covered) Other SQL constraints

51

52 of 58

Attribute-based constraints

An attribute-based constraint is where every value in a column satisfies a Boolean expression.

  • NOT NULL is an attribute-based constraint!
  • Boolean expressions can also be expressed via CHECK.
    • AKA “Check constraint.” (Documentation 5.4.1).

52

CREATE TABLE actors (� id INTEGER CHECK (id >= 1000 AND id <= 9999),

name TEXT

);

Ensures that ids are four-digit integers.

53 of 58

Attribute-based constraints

An attribute-based constraint is where every value in a column satisfies a Boolean expression.

  • NOT NULL is an attribute-based constraint!
  • Boolean expressions can also be expressed via CHECK.
  • AKA “Check constraint.” (Documentation 5.4.1).

Attribute-based constraints are checked on changes to the relation itself.

For example, for actors above:

  • Checked when a new id is added or an existing one is updated
  • Not checked when an id is deleted (why?)

53

CREATE TABLE Actor (� id INTEGER CHECK (id >= 1000 AND id <= 9999),

name TEXT

);

Ensures that ids are four-digit integers.

54 of 58

Why can’t attribute-based constraints enforce referential integrity?

Suppose we did the following:

CREATE TABLE cast_info (

…,� person_id INTEGER CHECK

(EXISTS (SELECT *

FROM actors S� WHERE person_id = S.id)� ),� …�);

Does this enforce potential ref. integrity violations?

  • An insertion/update to R (Cast_info) that addsnew person_ids not yet found in S.id
  • A deletion from/update to S (Actor) that �removes ids that exist in R.id

(as opposed to FK constraint)

CREATE TABLE Cast_info (

...,� FOREIGN KEY (person_id)

REFERENCES Actor (id),� ...

);

Attribute-based constraints are checked on changes to the relation itself.

person_id

movie_id

1

23

3

45

id

name

1

Tom Hanks

3

Michelle Yeoh

R (cast_info)

S (actors)

55 of 58

Summary of Constraints

Constraints are relationships across data elements that the data system is required to preserve or enforce.

  • Identifiable tuples: PRIMARY KEY and UNIQUE
  • Cross-table referential integrity with FOREIGN KEY
  • Attribute-based constraints
  • (not covered) Tuple-based constraints
  • (not covered) Other SQL constraints

55

For your info:

  • Constraints across multiple attributes in a tuple
  • CONSTRAINT <constraintname> CHECK
  • Listed at the end of the list of attributes
  • PostgreSQL documentation

For your info:

  • CREATE ASSERTION constraints across multiple relations
  • Not implemented in PostgreSQL, but see SQL92 (Documentation)

56 of 58

[Extra] Data Integrity Never Ends

  • Constraints are only one set of tools.
  • CREATE TYPE (enums) data types to specify one of set of values [Docs §8.7]
  • CREATE DOMAIN lets you declare new types [Docs §8.17]
  • Triggers and Rules can act like “callback” functions
  • Can extend SQL with SQL, Python, etc. [Docs Chp 38]

56

57 of 58

[Extra] Triggers

Constraints introduce overhead to a data system

  • Specifically, it’s also not clear when to check constraints, especially those that are across tables (as CHECK ASSERTIONs)
  • Worst case behavior is to check all constraints after every edit
    • Systems are better than this, but not much better

An alternative is triggers:

  • Triggers are “set off” on specific events specified by the user
  • Often less overhead than constraints

57

58 of 58

58

DML: INSERT, DELETE, UPDATE

DDL: CREATE, DROP, ALTER

UPDATE/DELETE: Under the hood

Relational Design and Constraints

PRIMARY KEY and UNIQUE

FOREIGN KEY Constraints

Attribute-based Constraints

Demo (if time, else at home)

Outline