1 of 50

Updates: DDL, DML

February 7, 2024

1

Data 101/Info 258, Spring 2024 @ UC Berkeley

Aditya Parameswaran https://data101.org/sp24

LECTURE 06

2 of 50

Onto bigger and better data engineering!

2

SQL queries

(Select from where, joins,�window functions)

Relational Algebra Introduction

Performance tuning queries

So far

Next time

This time:

  • Data modification
  • Relation definition
  • Primary and foreign keys
  • Constraints

3 of 50

SQL command categories, broadly

3

Image: source

Data Definition Language

Transaction Control Language

Data Manipulation Language

Data�Query Language

Data�Control�Language

4 of 50

DML: INSERT, DELETE, UPDATE

DML: INSERT, DELETE, UPDATE

DDL: CREATE, DROP, ALTER

Demo

UPDATE/DELETE: Under the hood

Relational Design and Constraints

PRIMARY KEY and UNIQUE

FOREIGN KEY Constraints

Attribute-based Constraints

4

Lecture 06, Data 101/Info 258 Spring 2024

5 of 50

SQL command categories, broadly

5

Image: source

Data Definition Language

Transaction Control Language

Data Manipulation Language

Data�Query Language

Data�Control�Language

so far

today

6 of 50

DML: Data Modification Language

We also need to be able to modify the data in relations!

A modification command doesn’t return a result the same way queries we’ve seen so far do, but it changes the relational instance (i.e., the stored tuples).

Three kinds of modifications:

  • INSERT a tuple (or tuples);
  • DELETE an existing tuple (or tuples); and
  • UPDATE the value(s) of an existing tuple (or tuples).

6

7 of 50

INSERT

INSERT INTO Titles VALUES� ('tt11737520', 'tvSeries', 'One Piece', 'One Piece',� 0, 2023, NULL, 60, 'Action,Adventure,Comedy'),� ('tt1001526','movie', 'Megamind', 'Megamind',� 0, 2010, NULL, 95, 'Action,Adventure,Comedy');

  • Each tuple is specified by parentheses (), separated by commas.
  • Each tuple must have all attributes in schema, in order.

7

Titles

Column | Type -----------------+--------

title_id | text

type | text

primary_title | text

original_title | text

is_adult | bigint

premiered | bigint

ended | bigint

runtime_minutes | bigint

genres | text

Documentation INSERT

8 of 50

INSERT specifying attributes

INSERT INTO Titles VALUES� ('tt11737520', 'tvSeries', 'One Piece', 'One Piece',� 0, 2023, NULL, 60, 'Action,Adventure,Comedy'),� ('tt1001526','movie', 'Megamind', 'Megamind',� 0, 2010, NULL, 95, 'Action,Adventure,Comedy');

  • Each tuple is specified by parentheses (), separated by commas.
  • Each tuple must have all attributes in schema, in order.

INSERT INTO Titles (title_id, type, premiered, primary_title) VALUES� ('tt11737520', 'tvSeries', 2023, 'One Piece');

  • Explicitly specify attributes to insert. Can be out of order!
  • Also allows system to fill in NULL values for values of attributes we don’t have.

8

Titles

Column | Type -----------------+--------

title_id | text

type | text

primary_title | text

original_title | text

is_adult | bigint

premiered | bigint

ended | bigint

runtime_minutes | bigint

genres | text

Documentation INSERT

9 of 50

INSERT a subquery’s result

INSERT INTO Titles VALUES� ('tt11737520', 'tvSeries', 'One Piece', 'One Piece',� 0, 2023, NULL, 60, 'Action,Adventure,Comedy'),� ('tt1001526','movie', 'Megamind', 'Megamind',� 0, 2010, NULL, 95, 'Action,Adventure,Comedy');

  • Each tuple is specified by parentheses (), separated by commas.
  • Each tuple must have all attributes in schema, in order.

INSERT INTO Titles (title_id, type, premiered, primary_title) VALUES� ('tt11737520', 'tvSeries', 2023, 'One Piece');

  • Explicitly specify attributes to insert. Can be out of order!
  • Also allows system to fill in NULL values for values of attributes we don’t have.

INSERT INTO Titles (title_id) (SELECT DISTINCT C.title_id� FROM Crew C� WHERE NOT EXISTS (SELECT * FROM Titles T WHERE C.title_id = T.title_id));

  • Insert the entire result of a (sub)query into a relation. Don’t use the VALUES keyword.
  • What do you think the above insertion does?

9

Titles

Column | Type -----------------+--------

title_id | text

type | text

primary_title | text

original_title | text

is_adult | bigint

premiered | bigint

ended | bigint

runtime_minutes | bigint

genres | text

Documentation INSERT

10 of 50

DELETE

DELETE FROM <Relation> WHERE <condition>;

  • Deletes all tuples from Relation that satisfy a condition.

DELETE FROM Titles WHERE premiered >= 2023;

  • Remove all tuples from this year

DELETE FROM Titles;

  • Removes all tuples!

10

Titles

Column | Type -----------------+--------

title_id | text

type | text

primary_title | text

original_title | text

is_adult | bigint

premiered | bigint

ended | bigint

runtime_minutes | bigint

genres | text

Documentation DELETE

11 of 50

UPDATE

UPDATE Titles� SET original_title = primary_title� WHERE original_title IS NULL;

  • Set all NULL original titles to be the primary title.

UPDATE <Relation>� SET <list of attribute assignments>� WHERE <condition>;

  • Changes certain attributes of tuples in Relation�that satisfy a condition.

11

Titles

Column | Type -----------------+--------

title_id | text

type | text

primary_title | text

original_title | text

is_adult | bigint

premiered | bigint

ended | bigint

runtime_minutes | bigint

genres | text

Documentation INSERT

12 of 50

DDL: CREATE, DROP, ALTER

DML: INSERT, DELETE, UPDATE

DDL: CREATE, DROP, ALTER

Demo

UPDATE/DELETE: Under the hood

Relational Design and Constraints

PRIMARY KEY and UNIQUE

FOREIGN KEY Constraints

Attribute-based Constraints

12

Lecture 06, Data 101/Info 258 Spring 2024

13 of 50

DDL: Data Definition Language

How do we modify the schema of a table?

A definition command also doesn’t return a result the same way queries we’ve seen so far do, but it sets/changes the relational schema (i.e., attributes) + potentially the relational instance.

Three kinds of definitions:

  • CREATE a table (which also defines the schema);
  • DROP a table (which deletes both the schema and instance); and
  • ALTER a table’s attributes (and therefore redefines the schema and changes instance).

13

14 of 50

CREATE

From before:

CREATE TABLE <relation name> AS (� <subquery>�);

You can also create a new table with an entirely new schema (and no initial tuples):

CREATE TABLE Zips (� location VARCHAR(20),� zipcode INTEGER

);

14

syntax: attributeName ATTRIBUTETYPE

Documentation CREATE TABLE

15 of 50

CREATE

From before:

CREATE TABLE <relation name> AS (� <subquery>�);

You can also create a new table with an entirely new schema (and no initial tuples):

CREATE TABLE Zips (� location VARCHAR(20),� zipcode INTEGER

);

Common types:

INTEGER �REAL

CHAR(n) fixed length string of n characters�VARCHAR(n) variable length string of up � to n characters�TEXT variable length string

DATE ‘yyyy-mm-dd’�TIME ‘hr:min:sec’�TIMESTAMP ‘yyyy-mm-dd hr:min:sec’

BOOLEAN

15

syntax: attributeName ATTRIBUTETYPE

Documentation CREATE TABLE

16 of 50

DROP

DROP TABLE <relation name>;

  • Drop the entire relation from the database.
  • (similarly: DROP VIEW <view name>)

DROP TABLE IF EXISTS <relation name>;

  • Avoid throwing an error if relation does not exist in the first place.
  • Very common in our notebook code: Drop table if it exists, then create table again.

16

Documentation DROP TABLE

17 of 50

ALTER TABLE

Redefine the schema.

ALTER TABLE Zips� ADD area REAL DEFAULT ‘1.0’,� ADD timezone VARCHAR(5);

  • Add new columns to the relation
  • Set defaults or default to NULL

ALTER TABLE Zips � DROP area, � DROP timezone;

  • Delete old columns to the relation

17

Documentation ALTER TABLE

18 of 50

Demo

DML: INSERT, DELETE, UPDATE

DDL: CREATE, DROP, ALTER

Demo

UPDATE/DELETE: Under the hood

Relational Design and Constraints

PRIMARY KEY and UNIQUE

FOREIGN KEY Constraints

Attribute-based Constraints

18

Lecture 06, Data 101/Info 258 Spring 2024

19 of 50

UPDATE/DELETE: Under the hood

DML: INSERT, DELETE, UPDATE

DDL: CREATE, DROP, ALTER

Demo

UPDATE/DELETE: Under the hood

Relational Design and Constraints

PRIMARY KEY and UNIQUE

FOREIGN KEY Constraints

Attribute-based Constraints

19

Lecture 06, Data 101/Info 258 Spring 2024

20 of 50

A Tricky Issue of Semantics

Issue with UPDATE (and DELETE):

  • When the UPDATE statement hits Bob’s tuple, it changes that salary to 120% of the original one, $60,000.
  • Bob’s new salary is still below $100,000. It still satisfies the WHERE condition!

Suppose we have a relation: Employee (name, salary, address, title)

UPDATE Employee� SET salary = salary * 1.2 � WHERE title= 'Database Designer' AND salary <100000;

  • This increases salaries by of all database designers (who have a salary < $100k) by 20%.
  • Employee has a tuple: ('Bob', 50000, <redacted>, 'Database Designer').

20

  • If we’re not careful with how we process rows to UPDATE, we could keep increasing Bob’s salary… until when?
  • …until Bob’s salary hits >=100000!! (Bob: “sounds great!”)

21 of 50

A Tricky Issue of Semantics

Solution: Do UPDATE (and DELETE) in two phases.

  • Phase 1
    • Look up each tuple and evaluate the UPDATE (or DELETE)’s WHERE condition
    • If the tuple matches the WHERE condition, we mark it for an update (or delete).

Suppose we have a relation: Employee (name, salary, address, title)

UPDATE Employee� SET salary = salary * 1.2 � WHERE title= 'Database Designer' AND salary <100000;

  • This increases salaries by of all database designers (who have a salary < $100k) by 20%.
  • Employee has a tuple: ('Bob', 50000, <redacted>, 'Database Designer').
  • Phase 2
    • Now actually update/delete all the marked tuples.

Convince yourselves that this always returns the same answer independent of the order in which we process the tuples!

21

22 of 50

Announcements

Extenuating Circumstances Form

You now know everything for Project 1

Reminder: due date next Weds. 5pm

22

23 of 50

Relational Design and Constraints

DML: INSERT, DELETE, UPDATE

DDL: CREATE, DROP, ALTER

Demo

UPDATE/DELETE: Under the hood

Relational Design and Constraints

PRIMARY KEY and UNIQUE

FOREIGN KEY Constraints

Attribute-based Constraints

23

Lecture 06, Data 101/Info 258 Spring 2024

24 of 50

SQL command categories, broadly

So far:

  • Relational Algebra (RA): a core set of primitives, supported across multiple data systems (databases, Spark/DF API, dataframes).
  • SQL queries on one more more relations that builds on/extends RA.
  • SQL updates that allow you to populate and manipulate the data in your relations.

24

Image: source

Data Definition Language

Transaction Control Language

Data Manipulation Language

Data�Query Language

Data�Control�Language

in more detail

25 of 50

Introducing: Constraints

From before:

CREATE TABLE <relation name> AS (� <subquery>�);

You can also create a new table with an entirely new schema (and no initial tuples):

CREATE TABLE Zips (� location VARCHAR(20),� zipcode INTEGER

);

25

How can we set up the relations so that you can then query them effectively?

Documentation CREATE TABLE

26 of 50

Introducing: Constraints

Constraints are relationships across data elements that�the data system is required to preserve or enforce.

  • Identifiable tuples: PRIMARY KEY and UNIQUE
  • Cross-table referential integrity with FOREIGN KEY
  • Attribute-based constraints
  • (not covered) Tuple-based constraints
  • (not covered) Other SQL constraints

26

27 of 50

Why constraints?

Constraints help encode domain knowledge that the system can enforce; examples:

  • age is between 0->120; zipcode is five digits; etc.
  • movie_id in cast_info table should be the same ids as in the Movie table

Constraints act as guardrails and provide safety:

  • Prevents unintended downstream errors, e.g., when new inserts violate constraints

Downside: they are expensive to enforce: need to check after every update

Constraints are relationships across data elements that�the data system is required to preserve or enforce.

  • Identifiable tuples: PRIMARY KEY and UNIQUE
  • Cross-table referential integrity with FOREIGN KEY
  • Attribute-based constraints
  • (not covered) Tuple-based constraints
  • (not covered) Other SQL constraints

27

28 of 50

PRIMARY KEY and UNIQUE

DML: INSERT, DELETE, UPDATE

DDL: CREATE, DROP, ALTER

Demo

UPDATE/DELETE: Under the hood

Relational Design and Constraints

PRIMARY KEY and UNIQUE

FOREIGN KEY Constraints

Attribute-based Constraints

28

Lecture 06, Data 101/Info 258 Spring 2024

29 of 50

The Notion of Keys

It is very useful to be able to identify tuples in a relation (and in the real world).

  • CalnetID or email address to identify students
  • SSN or passport number to identify people

A set of attributes is said to form a key if no two tuples can have the same values for that combination of attributes, i.e., the values must be both unique and not null.

One way to declare a key for a relation is using PRIMARY KEY (but there can be only one!)

29

Documentation 5.4

30 of 50

PRIMARY KEY

It is very useful to be able to identify tuples in a relation (and in the real world).

  • CalnetID or email address to identify students
  • SSN or passport number to identify people

A set of attributes is said to form a key if no two tuples can have the same values for that combination of attributes, i.e., the values must be both unique and not null.

One way to declare a key for a relation is using PRIMARY KEY (but there can be only one!)

CREATE TABLE Actor (� id INTEGER,� name TEXT,� PRIMARY KEY (id)�);

CREATE TABLE Zips (� location VARCHAR(20),� zipcode INTEGER,� area REAL,� PRIMARY KEY (location, zipcode)

);

30

combo of location and zipcode is unique across tuples in Zips.

31 of 50

UNIQUE vs. PRIMARY KEY

While there can only be one PRIMARY KEY (hence the name), there can be many UNIQUEs.

  • UNIQUE also enforces attribute values to be unique throughout the relation.
  • However, UNIQUE does not enforce non-null in the same way PRIMARY KEY does..

31

CREATE TABLE Stops(

stopID INTEGER,

personID INTEGER,

stopTime TIMESTAMP,

race VARCHAR(10),

location VARCHAR(20),

age INTEGER,

PRIMARY KEY (stopID),

UNIQUE (personID, stopTime)�);

  • Attributes within UNIQUEs can be NULL.
  • There can be several tuples with NULL values for UNIQUE attributes.

32 of 50

From the PostgreSQL documentation

These table definitions accept the same data:

CREATE TABLE products (� product_no integer UNIQUE NOT NULL,� name text,� price numeric�);

CREATE TABLE products (� product_no integer PRIMARY KEY,� name text,� price numeric�);

32

Documentation 5.4

more next lecture

next up!

33 of 50

Other declarations in CREATE TABLE

There are many other declarations we can make for attributes!

Lots in the documentation, but for now here’s two useful ones:

CREATE TABLE Stops(

stopID INTEGER,

personID INTEGER,

stopTime TIMESTAMP,

race VARCHAR(10),

location VARCHAR(20) NOT NULL,

age INTEGER,

arrest BOOLEAN DEFAULT False,

PRIMARY KEY (stopID),

UNIQUE (personID, stopTime)

);

33

Documentation CREATE TABLE

NOT NULL: The value for the attribute can never be NULL. Any changes that violate this declaration will be rejected.

DEFAULT <value>: If there is no specific value known for this attribute for a given tuple, use the stated <value> (as opposed to NULL, which is the default DEFAULT value).

34 of 50

Based on the below table definition, which attributes will never be NULL? Select all that apply.

CREATE TABLE Stops(

stopID INTEGER,

personID INTEGER,

stopTime TIMESTAMP,

race VARCHAR(10),

location VARCHAR(20) NOT NULL,

age INTEGER,

arrest BOOLEAN DEFAULT False,

PRIMARY KEY (stopID),

UNIQUE (personID, stopTime)

);

34

🤔

Exercise

35 of 50

FOREIGN KEY Constraints

DML: INSERT, DELETE, UPDATE

DDL: CREATE, DROP, ALTER

Demo

UPDATE/DELETE: Under the hood

Relational Design and Constraints

PRIMARY KEY and UNIQUE

FOREIGN KEY Constraints

Attribute-based Constraints

35

Got until here during Lecture 6

Lecture 06, Data 101/Info 258 Spring 2024

36 of 50

Foreign Keys

Referential integrity is ensuring that records do not violate the constraints defined between tables.

  • A foreign key (FK) maintains the referential integrity of your data by restricting record insertion based on whether an attribute value in one table exists in another table.
  • PostgreSQL helps manage this constraint for you!

36

37 of 50

Foreign Keys

Referential integrity is ensuring that records do not violate the constraints defined between tables.

  • A foreign key (FK) maintains the referential integrity of your data by restricting record insertion based on whether an attribute value in one table exists in another table.
  • PostgreSQL helps manage this constraint for you!

37

Example:

Cast_info (person_id, movie_id),

Actor (id, name),

Movie (id, title)

A foreign key constraint is like a pointer from a tuple in one table to a tuple in another table.

person_id

movie_id

1

23

3

45

id

name

1

Tom Hanks

3

Michelle Yeoh

id

title

23

Forrest Gump

45

Tomorrow Never Dies

  • We expect that: person_id in Cast_info must be present in the ids in Actor.
  • Likewise, movie_id in Cast_info must be present in the ids in Movie.

Cast_info

Actor

Movie

38 of 50

Declaring a FK constraint

Use REFERENCES (and optionally, FOREIGN KEY).

38

CREATE TABLE Cast_info (

person_id INTEGER,

movie_id INTEGER,

FOREIGN KEY (person_id)REFERENCES Actor (id),

FOREIGN KEY (movie_id)� REFERENCES Movie (id)

);

CREATE TABLE Actor (

id INTEGER,

name TEXT,

PRIMARY KEY id

);

CREATE TABLE Movie (

id INTEGER,

title TEXT,

PRIMARY KEY id

);

CREATE TABLE Cast_info (

person_id INTEGER,

movie_id INTEGER,

person_id INTEGER REFERENCES Actor (id),

movie_id INTEGER REFERENCES Movie (id)

);

Mostly equivalent. Above, only FOREIGN KEY supports a multi-attribute FK constraint.

Below declaration is less verbose.

REFERENCEd attributes must be declared UNIQUE or PRIMARY KEY! Why might this be?

39 of 50

Enforcing Foreign Key Constraints

Suppose we have the following relations. Which of the following insertions, updates, deletions of R.person_id (Cast_info.person_id) and S.id (Actor.id) could potentially violate the foreign key constraints shown?

A. An insertion/update to R that addsnew person_ids not yet found in S.id

B. An insertion/update to S that addsnew ids not yet found in S.id

C. An insertion/update to R that � duplicates person_ids that exist in S.id

D. A deletion from/update to R that � removes person_ids that exist in S.id

E. A deletion from/update to S that � removes ids that exist in R.id

39

🤔

person_id

movie_id

1

23

3

45

id

name

1

Tom Hanks

3

Michelle Yeoh

R (Cast_info)

S�(Actor)

40 of 50

Enforcing Foreign Key Constraints

Suppose we have the following relations. Which of the following insertions, updates, deletions of R.person_id (Cast_info.person_id) and S.id (Actor.id) could potentially violate the foreign key constraints shown?

A. An insertion/update to R that addsnew person_ids not yet found in S.id

B. An insertion/update to S that addsnew ids not yet found in S.id

C. An insertion/update to R that � duplicates person_ids that exist in S.id

D. A deletion from/update to R that � removes person_ids that exist in S.id

E. A deletion from/update to S that � removes ids that exist in R.id

40

person_id

movie_id

1

23

3

45

id

name

1

Tom Hanks

3

Michelle Yeoh

R (Cast_info)

S�(Actor)

⚠️ violation! FK must always reference an existing primary key.

OK for FK to not have certain primary keys

OK (unless FK has UNIQUE constraint)

OK for FK to not have certain primary keys

⚠️ violation!�dangling tuples!

41 of 50

Enforcing Foreign Key Constraints

⚠️ An insertion/update to R that addsnew person_ids not yet found in S.id

(OK) An insertion/update to S that addsnew ids not yet found in S.id

(OK) An insertion/update to R that � duplicates person_ids that exist in S.id

(OK) A deletion from/update to R that � removes person_ids that exist in S.id

⚠️ A deletion from/update to S that � removes ids that exist in R.id

41

person_id

movie_id

1

23

3

45

3

71

id

name

1

Tom Hanks

3

Michelle Yeoh

R (Cast_info)

S�(Actor)

An insert or update to Cast_info that introduces a non-existent actor in Actor will be rejected (throw error).

A deletion or update to Actor that causes some tuples in Cast_info to “dangle” can be handled in three ways:

  • [default] Reject/throw error.
  • CASCADE: make the same change to Cast_info as in Actor
    • Deleted Actor: delete the Cast_info tuples that refer to it
    • Updated Actor: update the Cast_info tuples that refer to it
  • SET NULL: change the person_id in Cast_info be NULL

42 of 50

Which exercise do you want to try?

CREATE TABLE Cast_info (� person_id INTEGER,� movie_id INTEGER, � FOREIGN KEY (person_id)� REFERENCES Actor (id)� ON DELETE SET NULL� ON UPDATE CASCADE,� FOREIGN KEY (movie_id)� REFERENCES Movie (id)� ON DELETE SET NULL);

42

Assume we try the following (we always reset the relation to the diagram). What if we…

  1. Delete a tuple from Actor�corresponding to id = 1?
  2. Change a tuple in Actor from id = 1 to id = 2?
  3. Delete a tuple from Movie�corresponding to id = 23?
  4. Change a tuple in Movie from�id = 23 to id = 24?
  5. Insert a tuple into Cast_info that adds a new person_id not found in Actor?

person_id

movie_id

1

23

3

45

id

name

1

Tom Hanks

3

Michelle Yeoh

id

title

23

Forrest Gump

45

Tomorrow Never Dies

Cast_info

Actor

Movie

🤔

Exercise

43 of 50

Attribute-based Constraints

DML: INSERT, DELETE, UPDATE

DDL: CREATE, DROP, ALTER

Demo

UPDATE/DELETE: Under the hood

Relational Design and Constraints

PRIMARY KEY and UNIQUE

FOREIGN KEY Constraints

Attribute-based Constraints

43

Lecture 06, Data 101/Info 258 Spring 2024

44 of 50

Next: Attribute-based constraints

Constraints help encode domain knowledge that the system can enforce.

  • For example: age is between 0->120; zipcode is five digits; etc.
  • For example: movie_id in the Cast info_table should be the same ids as in the Movie table.

Constraints act as guardrails and provide safety.

  • Prevents unintended downstream errors, e.g., when new inserts violate constraints

Constraints are relationships across data elements that�the data system is required to preserve or enforce.

  • Identifiable tuples: PRIMARY KEY and UNIQUE
  • Cross-table referential integrity with FOREIGN KEY
  • Attribute-based constraints
  • (not covered) Tuple-based constraints
  • (not covered) Other SQL constraints

44

45 of 50

Next: Attribute-based constraints

Constraints help encode domain knowledge that the system can enforce.

  • For example: age is between 0->120; zipcode is five digits
  • For example: movie_id in the Cast info_table should be the same ids as in the Movie table.

Constraints act as guardrails and provide safety.

  • Prevents unintended downstream errors, e.g., when new inserts violate constraints

Constraints are relationships across data elements that�the data system is required to preserve or enforce.

  • Identifiable tuples: PRIMARY KEY and UNIQUE
  • Cross-table referential integrity with FOREIGN KEY
  • Attribute-based constraints
  • (not covered) Tuple-based constraints
  • (not covered) Other SQL constraints

45

46 of 50

Attribute-based constraints

An attribute-based constraint is where every value in a column satisfies a Boolean expression.

  • NOT NULL is an attribute-based constraint!
  • Boolean expressions can also be expressed via CHECK.
  • AKA “Check constraint.” (Documentation 5.4.1).

46

CREATE TABLE Actor (� id INTEGER CHECK (id >= 1000 AND id <= 9999),

name TEXT

);

Ensures that actor_ids are four-digit integers.

47 of 50

Attribute-based constraints

An attribute-based constraint is where every value in a column satisfies a Boolean expression.

  • NOT NULL is an attribute-based constraint!
  • Boolean expressions can also be expressed via CHECK.
  • AKA “Check constraint.” (Documentation 5.4.1).

Attribute-based constraints are checked on changes to the relation itself.

For example, for Actor above:

  • Checked when a new id is added or an existing one is updated
  • Not checked when an id is deleted (why?)

47

CREATE TABLE Actor (� id INTEGER CHECK (id >= 1000 AND id <= 9999),

name TEXT

);

Ensures that actor_ids are four-digit integers.

48 of 50

Why can’t attribute-based constraints enforce referential integrity?

Suppose we did the following:

CREATE TABLE Cast_info (

…,� person_id INTEGER CHECK

(EXISTS (SELECT *

FROM Actor A� WHERE person_id = A.id)� ),� …�);

Does this enforce the potential referential integrity violations?

  • An insertion/update to R (Cast_info) that addsnew person_ids not yet found in S.id
  • A deletion from/update to S (Actor) that �removes ids that exist in R.id

(as opposed to FK constraint)

CREATE TABLE Cast_info (

...,� FOREIGN KEY (person_id)

REFERENCES Actor (id),� ...

);

48

person_id

movie_id

1

23

3

45

id

name

1

Tom Hanks

3

Michelle Yeoh

R (Cast_info)

S�(Actor)

CHECKs (i.e., attribute-based constraints) can only enforce constraints on a single table.

49 of 50

Summary of Constraints

Constraints are relationships across data elements that the data system is required to preserve or enforce.

  • Identifiable tuples: PRIMARY KEY and UNIQUE
  • Cross-table referential integrity with FOREIGN KEY
  • Attribute-based constraints
  • (not covered) Tuple-based constraints
  • (not covered) Other SQL constraints

49

Just for your info:

  • Constraints across multiple attributes in a tuple
  • CONSTRAINT <consname> CHECK <cond>
  • Listed at the end of the list of attributes
  • PostgreSQL documentation

Just for your info:

  • CREATE ASSERTION constraints across multiple relations
  • Not implemented in PostgreSQL, but see SQL92 (Documentation)

50 of 50

[Extra] Triggers

Constraints introduce overhead to a data system

  • Specifically, it’s also not clear when to check constraints, especially those that are across tables (as CHECK ASSERTIONs)
  • Worst case behavior is to check all constraints after every edit
    • Systems are better than this, but not much better

An alternative is triggers:

  • Triggers are “set off” on specific events specified by the user
  • Often less overhead than constraints

We’ll hopefully learn about triggers later!

50