2 of 38

It’s not sufficient to query…

We also need to be able to modify the data in relations
A modification command doesn’t return the result in the same way a query does, but it changes the relational instance

Three kinds of modifications:

Insert a tuple or tuples
Delete an existing tuple (or tuples)
Update the value(s) of an existing tuple (or tuples)

3 of 38

Inserting One or More Tuples

INSERT INTO Relation VALUES (<list of values>);

Example:

Stops (id, race, age, location)
INSERT INTO Stops VALUES (5000, ‘Asian’, 24, ‘West Oakland’);
INSERT INTO Stops VALUES

(5001, ‘Hispanic’, 37, ‘Rockridge'),

(5002, ‘White’, 68, ‘MacArthur');

4 of 38

Specifying Attributes in Insert

INSERT INTO Stops VALUES (5000, ‘Asian’, 24, ‘West Oakland’);

The previous syntax requires you to remember order of attributes

Can be more explicit:

INSERT INTO Stops (id, race, age, location) VALUES

(5000, ‘Asian’, 24, ‘West Oakland’);

Another benefit beyond remembering the order of attributes:

We don’t have values for all attributes, and we want the system to fill in NULL values for the rest

5 of 38

Inserting Many Tuples

We may insert the entire result of a query into a relation

INSERT INTO Relation (<Subquery>);

Stops (id, race, age, location) and Zips (location, zips)

INSERT INTO Zips(location) VALUES (

SELECT DISTINCT S.location

FROM Stops S

WHERE NOT EXISTS

(SELECT * FROM Zips Z WHERE S.location = Z.location));

Q: What do you think this query does?

6 of 38

Deleting Tuples

To delete tuples satisfying a condition from some relation:

DELETE FROM Relation WHERE <condition>;

Stops (id, race, age, location)

Remove all juvenile stops

DELETE FROM Stops WHERE age <18;

To delete all tuples from a relation simply say:

DELETE FROM Stops;

7 of 38

Updates

To change certain attributes of certain tuples in a relation:

UPDATE Relation

SET <list of attribute assignments>

WHERE <condition>

8 of 38

Example: Update

Stops (id, race, age, location)

Set NULL ages to be 18:

UPDATE Stops

SET age = 18

WHERE age IS NULL;

9 of 38

A Tricky Issue of Semantics

Now, we know at a high level how INSERTS, DELETES, and UPDATES work
There’s a special issue that impacts both DELETEs and UPDATEs

Suppose we have a relation Employee (name, salary, address, title)
We run the following update to increase salaries of all database designers who have a salary < 100k by 20%

UPDATE Employee SET salary = salary * 1.2

WHERE title= ‘Database Designer’ AND salary <100000

10 of 38

A Tricky Issue of Semantics

UPDATE Employee SET salary = salary * 1.2

WHERE title= ‘Database Designer’ AND salary <100000

Suppose the Employee relation has a tuple corresponding to a Database Designer “Bob” with salary <100,000, say $50,000
When the UPDATE statement hits Bob’s tuple, it changes that salary to 120% of the original one, $60,000.
If we’re not careful, we could keep increasing Bob’s salary…
Q: when will we stop?
We will stop when salary hits >= $100,000.

11 of 38

A Tricky Issue of Semantics

UPDATE Employee SET Salary = Salary * 1.2

WHERE Title= ‘Database Designer’ AND Salary <100000

How do we deal with this issue?

We do UPDATE (and DELETE) in two phases:

Phase 1:

We look each tuple and evaluate their WHERE condition
If the tuple matches the WHERE condition, we mark it for an update/delete

Phase 2:

We actually update/delete all the marked tuples

Convince yourselves that this always returns the same answer independent of the order of processing of tuples!

12 of 38

Data Engineering

Relational Design

13 of 38

So far…

We considered

RA: a core set of primitives, supported across multiple data systems: databases, Spark (DF API), dataframes
SQL queries on one more more relations that builds on/extends RA
SQL updates that allow you to populate and manipulate the data in your relations

Next, we’ll consider how to set up the relations so that you can then query them effectively

14 of 38

Defining a Relation

Simplest form of adding a relation is:

CREATE TABLE <name> (

);

Recall that we created a table based on a subquery previously; this allows us to create an unrelated table with an entirely new schema

And removing a relation is:

DROP TABLE <name>;

15 of 38

List of Attributes and Domains

For each attribute in the relation we list the corresponding domain
Common types include:

INTEGER
REAL
CHAR (n) = fixed length string of n characters
VARCHAR (n) = variable length string of up to n characters
TEXT = variable length string
DATE = ‘yyyy-mm-dd’
TIME = ‘hr:min:sec’
TIMESTAMP = ‘yyyy-mm-dd hr:min:sec’
BOOLEAN

16 of 38

Example

Create a table called Zips

CREATE TABLE Zips (

location VARCHAR(20),

zipcode INTEGER

);

17 of 38

The Notion of Keys

It is very useful to be able to identify tuples in a relation
Also true in the real world:

CalnetID or email address to identify students
SSN or passport number to identify people

A set of attributes is said to form a key if no two tuples can have the same values for that combination of attributes

We do so using the PRIMARY KEY or UNIQUE keyword

(distinction later)

18 of 38

Example Primary Key Usage

CREATE TABLE Stops(

stopID INTEGER, race VARCHAR(10), location VARCHAR(20),

age INTEGER, arrest BOOLEAN,

PRIMARY KEY (stopID)

);

CREATE TABLE Zips (

location VARCHAR(20),zipcode INTEGER, area REAL,

PRIMARY KEY (location, zipcode)

);

Q: How do we interpret this?

A: combo of location and zipcode is unique across Zips tuples

19 of 38

UNIQUE vs. PRIMARY KEY

Difference 1:

There can be only one PRIMARY KEY (hence the name!)
There can be many UNIQUEs – aka keys

Difference 2:

No attribute of a PRIMARY KEY can be NULL
Attributes within UNIQUEs can be NULL, and there can be several tuples with NULL values for UNIQUE attributes

So NULL is excluded from enforcement of UNIQUE

20 of 38

Example Unique Usage

CREATE TABLE Stops(

stopID INTEGER, personID INTEGER, stopTime TIMESTAMP,

race VARCHAR(10), location VARCHAR(20),

age INTEGER, arrest BOOLEAN,

PRIMARY KEY (stopID),

UNIQUE (personID, stopTime)

);

21 of 38

Other Declarations

Two other declarations we can make for attributes:

NOT NULL means that the value for the attribute can never be NULL

Any changes that violate this declaration will be rejected

DEFAULT <value> means that if there is no specific value known for this attribute for a given tuple, use the stated <value> (as opposed to NULL)

22 of 38

Example Usage of Declarations

CREATE TABLE Stops(

stopID INTEGER, personID INTEGER, stopTime TIMESTAMP,

race VARCHAR(10), location VARCHAR(20) NOT NULL,

age INTEGER, arrest BOOLEAN DEFAULT False,

PRIMARY KEY (stopID),

UNIQUE (personID, stopTime)

);

23 of 38

Adding Attributes

We can add new columns to a relation by using the following:

ALTER TABLE <name>

ADD <attribute declaration>,

ADD <attribute declaration>, …;

ALTER TABLE Zips

ADD area REAL,

ADD timezone VARCHAR(5);

24 of 38

Deleting Attributes

We can delete columns from a relation by using the following:

ALTER TABLE <name>

DROP <attribute>,

DROP <attribute>, …;

ALTER TABLE Zips

DROP area,

DROP timezone;

25 of 38

Recap

So we’ve talked about how to set up and modify a relational schema
We’ve also seen a few examples of constraints

PRIMARY KEY, UNIQUE
NOT NULL

More generally, constraints are relationships across data elements that the data system is required to preserve or enforce
Next, a broader class of constraints

26 of 38

Kinds of Constraints

Key (via PRIMARY KEY or UNIQUE)
NON NULL constraints

Cross-table referential integrity or FOREIGN KEY constraints
Attribute based constraints
Tuple-based constraints but we won’t cover them
Other constraints (any SQL expression) but we won’t cover them

27 of 38

Example

Cast_info (person_id, movie_id),
Actor (id, name),
Movie (id, title)

We might expect that person_id in Cast_info must be present in the ids in Actor, and likewise, movie_id must be present in the ids in Movie

Such a constraint is called a foreign key constraint
Think of this as a pointer from a tuple in one table to a tuple in another table
Referenced attributes must be declared UNIQUE or PRIMARY KEY (i.e., a key), why?

person_id	movie_id
1	23
3	45

id	name
1	Ingrid Bergman
3	Samuel L Jackson

id	title
23	Casablanca
45	Snakes on a plane

28 of 38

Declaring a FK Constraint

Using REFERENCES

CREATE TABLE Actor (id INTEGER, name TEXT, PRIMARY KEY actor_id);

CREATE TABLE Cast_info (person_id INTEGER, movie_id INTEGER,

FOREIGN KEY (person_id) REFERENCES Actor (id),

FOREIGN KEY (movie_id) REFERENCES Movie (id));

CREATE TABLE Cast_info (

person_id INTEGER REFERENCES Actor (id),

movie_id INTEGER REFERENCES Movie (id));

Less verbose but doesn’t support multi-attribute FK constraint

person_id	movie_id
1	23
3	45

id	name
1	Ingrid Bergman
3	Samuel L Jackson

id	title
23	Casablanca
45	Snakes on a plane

29 of 38

Enforcing Foreign Key Constraints

If there is a FK constraint from attributes of R to the keys of S, two violations are possible:

An insert/update to R introduces values not found in S
A deletion from/update to S causes a tuple in R to “dangle”

Q: Why are other cases not important?

person_id	movie_id
1	23
3	45

id	name
1	Ingrid Bergman
3	Samuel L Jackson

id	title
23	Casablanca
45	Snakes on a plane

30 of 38

Enforcing Foreign Key Constraints

If there is a FK constraint from attributes of R to the keys of S, two violations are possible:

An insert/update to R introduces values not found in S
A deletion from/update to S causes a tuple in R to “dangle”

Q: Why are other cases not important?

A deletion from R or insertion to S introduces no new potential violations
Likewise for updates to R/S that don’t introduce new values or cause tuples to dangle

31 of 38

Enforcing Foreign Key Constraints

Here, R: Cast_info, S: Actor
Referencing relation changes:

An insert or update to Cast_info that introduces a non-existent actor will be rejected

Referenced relation changes:

A deletion or update to Actor that causes some tuples in Cast_info to “dangle” can be handled in three ways:

Default: reject the modification
CASCADE: make the same change to Cast_info as in Actor

Deleted Actor: delete the Cast_info tuples that refer to it
Updated Actor: update the Cast_info tuples that refer to it

SET NULL: change the person_id in Cast_info be NULL

person_id	movie_id
1	23
3	45
3	76

id	name
1	Ingrid Bergman
3	Samuel L Jackson

32 of 38

Example

CREATE TABLE Cast_info (person_id INTEGER, movie_id INTEGER,

FOREIGN KEY (person_id) REFERENCES Actor (id)

ON DELETE SET NULL ON UPDATE CASCADE,

FOREIGN KEY (movie_id) REFERENCES Movie (id)

ON DELETE SET NULL);

Q: what if we

Delete a tuple from Actor corresponding to id = 1?
Change a tuple in Actor from id = 1 to id = 2?
Delete a tuple from Movie corresponding to id = 23?
Change a tuple in Movie from id = 23 to id = 24?
Insert a tuple into Cast_info that adds a new person_id not found in Actor?

person_id	movie_id
1	23
3	45

id	name
1	Ingrid Bergman
3	Samuel L Jackson

id	title
23	Casablanca
45	Snakes on a plane

33 of 38

Attribute Based Constraints

NOT NULL is one such constraint
Example: ensure that actor_ids are four-digit integers.

CREATE TABLE Actor (

id INTEGER CHECK (id >= 1000 AND id <= 9999),

name TEXT);

Checked on changes to Actor

Either when a new id is added or an existing one is updated, not when deleted (why?)

For this reason, we can’t use CHECKs to enforce referential integrity

Even if we use within Cast_info

…, person_id INTEGER CHECK

(EXISTS (SELECT * FROM Actor A WHERE person_id = A.id)), …

We will miss changes to the referenced relation (in this case Actor)

34 of 38

Kinds of Constraints

Key (via PRIMARY KEY or UNIQUE)
Referential integrity constraints as FOREIGN KEY constraints
Attribute based constraints

Tuple-based constraints

Constraints across multiple attributes in a tuple
but we won’t cover them (still via CHECK)
Listed at the end of the list of attributes

Other constraints (any SQL expression)

Constraints across multiple relations (via CREATE ASSERTION)
but we won’t cover them

35 of 38

Why Constraints?

Helps encode domain knowledge that the system can enforce

E.g., age is between 0->120, zipcode is five digits

May appear to be a specification and performance (more on that next) burden for users, but acts as guardrails and provides safety
Prevents unintended downstream errors

E.g., when new inserts violate the constraints

36 of 38

Triggers, Briefly

Constraints introduce overhead to a data system

Specifically, it’s also not clear when to check constraints, especially those that are across tables (as CHECK ASSERTIONs)
Worst case behavior is to check all constraints after every edit

Systems are better than this, but not much better

An alternative is triggers

Triggers are “set off” on specific events specified by the user
Often less overhead than constraints
We’ll hopefully learn about triggers later!

37 of 38

Quick Demo

Insert into the Actor relation

INSERT INTO Actor VALUES (5000, 'Adam', 'Driver', 'Feb 3, 2020’);

INSERT INTO Actor VALUES (5001, 'Brenda', 'Driver', 'Feb 3, 2020'), (5002, 'Charles', 'Driver', 'Feb 3, 2020’);

INSERT INTO Actor (actor_id, first_name, last_name, last_update) VALUES (5003, 'Didi', 'Driver', 'Feb 3, 2020');

Updates to the Actor relation

UPDATE Actor

SET actor_id = 6000

WHERE first_name = ‘Adam’ AND last_name = ‘Driver’

Deletes from the Actor relation

DELETE FROM Actor WHERE last_name = ‘Driver';

38 of 38

Quick Demo

Creating the manager relation

CREATE TABLE manager (manager_name VARCHAR(20), age INTEGER DEFAULT 50, manager_id CHAR(4), PRIMARY KEY (manager_id), UNIQUE (manager_name));

Adding a tuple (Success!)

INSERT INTO Manager (age, manager_id)

VALUES (23, ‘sd45’);

Adding a tuple (Failure)

INSERT INTO Manager (manager_name, age)

VALUES ('John Smith', 123);

Changing the schema by adding attribs

ALTER TABLE Manager

ADD address VARCHAR (20),

ADD income REAL DEFAULT 10000.0;

Changing the schema by dropping attribs

ALTER TABLE Manager DROP address, DROP income;

Deleting the relation entirely

DROP TABLE Manager;