1 of 52

SQL Review

August 29, 2023

Data 101, Fall 2023 @ UC Berkeley

Lisa Yan https://fa23.data101.org/

1

LECTURE 02

2 of 52

Join at slido.com�#3737223

Click Present with Slido or install our Chrome extension to display joining instructions for participants while presenting.

3 of 52

PostgreSQL

PostgreSQL

SELECT FROM WHERE

Multi-Relation Queries

IMDb Exercise

[Bonus] IMDb Exercise 2

SFW as Relational Algebra

3

Lecture 02, Data 101 Fall 2023

4 of 52

Reminder: SQL: Structured Query Language (or “sequel”)

High level data transformation language

  • Supported by relational databases and other data systems (e.g., Spark).

Declarative rather than imperative

  • Describe what rather how.
  • Users issue SQL queries (Recall query-centric vs. code-centric).

The core functionality of SQL is restricted in comparison to, say, Python.

  • By design—the system can then automatically optimize the queries.
  • Still quite powerful: SQL can do most things you want to do with data.
  • …and extensions allow us to bridge the gap.

4

3737223

5 of 52

PostgreSQL: General Overview

PostgreSQL is a free, open-source database management system (DBMS) that is SQL-compliant (Note: SQL is an ISO standard [Oracle blog post]).

5

postgreSQL client

  • psql: command-line interface (CLI)
  • jupysql: Python package that supports postgres queries

titles

akas

postgreSQL server

postgresql://jovyan@127.0.0.1:5432/postgres

episodes

ratings

3737223

6 of 52

A bit of PostgreSQL history

PostgreSQL is an object-relational database management system (ORDBMS) based on POSTGRES, Version 4.2, developed at the University of California at Berkeley Computer Science Department. POSTGRES pioneered many concepts that only became available in some commercial database systems much later.

6

Jean-Auguste-Dominique Ingres, Napoléon Ier sur le trône impérial. Oil on canvas. 1806. [wikipedia]

  • 1974: INGRES (INteractive GRaphics Retrieval System) led by Professors Michael Stonebraker and Eugene Wong
  • 1986: The POSTGRES project, with custom data types and object-oriented data model
  • 1994: psql for interactive SQL queries
  • 1995: PostgreSQL 6.0, colloquially known as “postgres”

Michael Stonebraker, Professor Emeritus and Turing Award winner

Eugene Wong, Professor Emeritus, government technology advisory committee service in U.S., Hong Kong, and Taiwan

PostgreSQL has fantastic documentation that we will refer to throughout this semester:

https://www.postgresql.org/docs/current/index.html

3737223

7 of 52

Relation = Schema + Instance

Schema: The structure, format, or scaffolding defining a relation.

  • Relation name, attribute names, attribute domains
  • (“relational schema”: Schemas for many relations)

A relation is a collection of tuples (records) with a predefined collection of attributes.

  • Each attribute has a type, called domain.
  • Each domain is atomic (“non-decomposable by DBMS”): string, integer, object, etc.
  • Each tuple in the relation has values for each attribute.

Instance: Specific instantiation of the Relation

  • A Relation with “values filled in”

7

Schema = one type of metadata; Instance = data. More on this later!

Songs (name String, artist � String, album String,� peak Integer, …)

name

artist

album

peak

22

Taylor Swift

Red

20

Blinding Lights

The Weeknd

After Hours

1

Truly Madly Deeply

Savage Garden

Savage Garden

1

Shelter

Porter Robinson

Shelter (single)

16

Attribute/ Column

Tuple/Row/

Record

3737223

8 of 52

Today: Exploring the Internet Movie Database (IMDb)

This is a real subset of the IMDB dataset!

  • As messy as it comes
  • Six tables (i.e., relations)

Overall schema: https://www.imdb.com/interfaces/

Let’s explore this data using psql, the PostgreSQL interactive terminal program! (psql official documentation)

  • psql postgresql://127.0.0.1:5432/imdb_lecture
  • \d list all relations in this database
  • \d tablename list schema of tablename relation
  • \q quit psql
  • \? help
  • <ctrl>-c cancel

8

You can also test out queries in interactive mode. But all your submittable work should be in Jupyter Notebooks!

Note: If you are trying this demo by yourself, make sure to run the first few cells in the notebook to load in the imdb database into the postgres server. Then launch psql.

Demo

3737223

9 of 52

SELECT FROM WHERE

PostgreSQL

SELECT FROM WHERE

Multi-Relation Queries

IMDb Exercise

[Bonus] IMDb Exercise 2

SFW as Relational Algebra

9

Lecture 02, Data 101 Fall 2023

10 of 52

The Data 100 cheatsheet

SELECT [DISTINCT] <column expression list>

FROM <table>

[WHERE <predicate>]

[GROUP BY <column list>]

[HAVING <predicate>]

[ORDER BY <column list>]

[LIMIT <number of rows>]

[OFFSET <number of rows>];

10

  • By convention, use all caps for keywords in SQL statements.
  • Use newlines to make SQL code more readable.
  • AS keyword: rename columns during selection process.
  • WHERE: rows; HAVING: groups. WHERE precedes HAVING.
  • Column Expressions may include aggregation functions (MAX, MIN, etc.) and DISTINCT.

Summary So Far

11 of 52

The Data 100 cheatsheet

SELECT [DISTINCT] <column expression list>

FROM <table>

[WHERE <predicate>]

[GROUP BY <column list>]

[HAVING <predicate>]

[ORDER BY <column list>]

[LIMIT <number of rows>]

[OFFSET <number of rows>];

11

  • By convention, use all caps for keywords in SQL statements.
  • Use newlines to make SQL code more readable.
  • AS keyword: rename columns during selection process.
  • WHERE: rows; HAVING: groups. WHERE precedes HAVING.
  • Column Expressions may include aggregation functions (MAX, MIN, etc.) and DISTINCT.

Over the next two weeks, we will explore SQL queries in depth.

Our goal is not to reteach the keywords you learned from Data 100/SQLite3, but instead to provide you vocabulary and paradigms/abstractions that explain each query component.

Our second goal is to teach you some postgres!

Today:

(review) SQL queries,

(new) Relational Algebra Part I

Thursday:

(new) Relational Algebra Part II

Next Tuesday:

(review) Grouping,

(new) subqueries

Next Thursday:

(new) Window functions

Summary So Far

12 of 52

SFW Queries

A SELECT FROM WHERE (SFW) query is the most standard SQL query to extract records from a relation.

To read a SFW SQL query:

  1. FROM clause: lists the table(s).

  • WHERE clause: filters for the matching tuples.

  • SELECT clause: filters for the specified attributes.

12

(We’ll refine this overview slide in a bit.)

Convention:

  • SQL keywords all caps
  • Attributes lowercase
  • end with semicolon

NB: keywords are technically case-insensitive.

SELECT *

FROM titles

WHERE premiered = 2023;

SELECT premiered, genres

FROM titles;

3737223

13 of 52

More on SELECT, WHERE

SELECT can employ:

  • Renamed attributes with AS
  • Expressions
  • Case statements
  • Many more functions (string, date-time…)

WHERE

  • Logical operators (AND, OR, NOT)
  • Comparison operators (=, >=, <> i.e. !=, etc.)

13

SELECT primary_title, type,

premiered AS release_year,

runtime_minutes,

runtime_minutes/60 AS � runtime_hours�FROM titles

WHERE premiered >= 2020 AND� premiered <= 2023;

⚠️ runtime_hours doesn’t handle fractions!

Demo

3737223

14 of 52

CAST and Data Types

CAST: Convert one type to another for calculation.

  • If premiered was a string, we can cast to integer:�CAST(premiered AS INTEGER)

Postgres Data Types: Documentation Ch. 15

  • Types include TEXT, INTEGER, CHAR, VARCHAR, DATETIME, REAL, DOUBLE PRECISION, …

14

SELECT primary_title, type,

premiered AS release_year,

runtime_minutes,

CAST(runtime_minutes AS � DOUBLE PRECISION)/60 AS � runtime_hours�FROM titles

WHERE premiered >= 2020 AND� premiered <= 2023;

Demo

3737223

15 of 52

[Tip] Debugging

Postgres will return the tuples in an unspecified but deterministic order. Sometimes, seeing the first few rows aren’t enough to know whether your query is executing as expected.

To see 10 random rows from the result query:

SELECT <attributes>

FROM <tables>

WHERE <condition about tuples in tables>

ORDER BY RANDOM()

LIMIT 10;

15

(We’ll return to other notions of randomness in a few lectures.)

Return only the first 10 rows.

Randomly shuffle the rows. (How? See RANDOM() docs)

Demo

3737223

16 of 52

Let’s Kick It Up a Notch [reference]

Consider the people table.

How do we compute the age of all people? Define age as:

  • If they are still alive in 2023, then their current age.
  • Otherwise, the age at which they passed.

16

Which expression(s) should we use in the blank? Select all that may apply.

  1. CAST(_____ AS REAL)
  2. died - born
  3. born - died
  4. 2023 - born
  5. CASE� WHEN ____ THEN ____ � ELSE ____�END
  6. IS NULL
  7. Something else

🤔

SELECT

person_id, name, � died, born,

______ AS age

FROM people;

Which expression(s) should we use in the blank? Select all that may apply.

Exercise

3737223

17 of 52

Which expression(s) should we use in the blank? Select all that may apply.

Click Present with Slido or install our Chrome extension to activate this poll while presenting.

18 of 52

Let’s Kick It Up a Notch Solution

Solution

Consider the people table.

How do we compute the age of all people? Define age as:

  • If they are still alive in 2023, then their current age.
  • Otherwise, the age at which they passed.

18

SELECT

person_id, name,

died, born,

CASE WHEN died IS NULL

THEN 2023 - born

ELSE died - born

END AS age

FROM people;

Exercise

3737223

19 of 52

Null values are not like other values

Tuples can have “NULL” values for some attributes. Common reasons:

  • missing (exists but don’t know what it is), or
  • inapplicable (value doesn’t apply: occupation for a child).
  • unknown (the default value if not declared explicitly)

19

3737223

20 of 52

Null values are not like other values

Tuples can have “NULL” values for some attributes. Common reasons:

  • missing (exists but don’t know what it is), or
  • inapplicable (value doesn’t apply: occupation for a child).
  • unknown (the default value if not declared explicitly)

Recall Data 100: We need to be careful about how we drop/impute NULLs!

  • Operations: NULL op operandNULL
  • “Three-valued logic,” i.e., NULL does not satisfy boolean conditions (documentation).
    • If length is NULL:
      • length > 100FALSE
      • length <= 100FALSE

20

3737223

21 of 52

Null values are not like other values

Tuples can have “NULL” values for some attributes. Common reasons:

  • missing (exists but don’t know what it is), or
  • inapplicable (value doesn’t apply: occupation for a child).
  • unknown (the default value if not declared explicitly)

Recall Data 100: We need to be careful about how we drop/impute NULLs!

  • Operations: NULL op operandNULL
  • “Three-valued logic,” i.e., NULL does not satisfy boolean conditions (documentation).
    • If length is NULL:
      • length > 100FALSE
      • length <= 100FALSE

21

(We’ll return to NULL when we talk about aggregation.)

SELECT born

FROM people;

SELECT born

FROM people�WHERE born < 2023 OR

born IS NULL;

SELECT born

FROM people�WHERE born < 2023;

equivalent

not equivalent

3737223

22 of 52

String matching and COUNT(*)

Small primer on boolean expressions with strings:

Use LIKE or NOT LIKE:

  • % matches zero or more characters; and
  • _ matches a single char

Chapter 9.7 documentation: Pattern Matching

22

(More on string matching and aggregations next week!�Spoiler: Regular expressions return)

SELECT COUNT(*)

FROM people

WHERE name LIKE 'Chris%';

COUNT is an aggregation function. Here, it counts the number of tuples in the result.

'abc' LIKE 'abc' true

'abc' LIKE 'a%' true

'abc' LIKE '_b_' true

'abc' LIKE 'c' false

3737223

23 of 52

Multi-Relation Queries

PostgreSQL

SELECT FROM WHERE

Multi-Relation Queries

IMDb Exercise

[Bonus] IMDb Exercise 2

SFW as Relational Algebra

23

Lecture 02, Data 101 Fall 2023

24 of 52

SFW Queries with Multiple Relations

Update to the SELECT FROM WHERE (SFW) query:

To read a SFW SQL query:

  • FROM clause: lists the table(s).

  • WHERE clause: filters for the matching tuples.

  • SELECT clause: filters for the specified attributes.

24

SELECT *

FROM

akas, titles

WHERE

titles.title_id = � akas.title_id;

  • Take the cross-product of tables listed
  • Refer to attributes as <relation>.<attribute> if needed.
  1. cross-product (“cross join”): list all pairs of tuples from each relation
  2. then, eliminate tuples that don’t pass the WHERE filter

3737223

25 of 52

Omitting AS will still rename attributes

Update to the SELECT FROM WHERE (SFW) query:

To read a SFW SQL query:

  • FROM clause: lists the table(s).

  • WHERE clause: filters for the matching tuples.

  • SELECT clause: filters for the specified attributes.

25

SELECT *

FROM

akas, titles

WHERE

titles.title_id =

akas.title_id;

SELECT *

FROM

akas AS A,� titles T

WHERE

A.title_id = T.title_id;

  • Take the cross-product of tables listed
  • Refer to attributes as <relation>.<attribute> if needed.

Syntax short cut with AS: omitting the AS keyword is allow and will still rename!

3737223

26 of 52

FROM multiple relations + WHERE = Inner Join

Recall the equivalent inner join:

(Likely no performance difference; see this StackOverflow response.)

Cross join with WHERE expression

26

More next week! For now, see discussion handout to briefly review left join, right join, and outer join.

SELECT *

FROM

akas A,

titles T

WHERE

A.title_id = T.title_id

SELECT *

FROM akas A

INNER JOIN titles T

ON A.title_id = T.title_id

3737223

27 of 52

IMDb Exercise

PostgreSQL

SELECT FROM WHERE

Multi-Relation Queries

IMDb Exercise

[Bonus] IMDb Exercise 2

SFW as Relational Algebra

27

Lecture 02, Data 101 Fall 2023

28 of 52

Michelle Yeoh

How do we write a query that gets the�titles and IDs of Michelle Yeoh movies?

28

Michelle Yeoh (Malaysia)

https://www.imdb.com/name/nm0000706/

3737223

29 of 52

Now your turn!

How do we write a query that gets the�titles and IDs of Michelle Yeoh movies?

29

🤔

Strategy point 1: Look at the schemas. Which tables have the information that we need?�(Hint: use \d meta-command in psql)

Which tables should we join together? Select all that apply.

A. akas

B. crew

C. episodes

D. people

E. ratings

F. titles

Exercise

3737223

30 of 52

Which tables should we join together? Select all that apply.

Click Present with Slido or install our Chrome extension to activate this poll while presenting.

31 of 52

Now your turn!

How do we write a query that gets the�titles and IDs of Michelle Yeoh movies?

31

Strategy point 1: Look at the schemas. Which tables have the information that we need? (Hint: use \d meta-command in psql)

Strategy point 2: Think about joins as modifying records:

  • extending each record to include more attributes, or
  • expanding each table to include more records and attributes.
    • Doing so often changes what a record represents.

Exercise

3737223

32 of 52

Strategy Point 2 in action

32

SELECT *

FROM titles

INNER JOIN crew

ON crew.title_id = � titles.title_id

INNER JOIN people

ON people.person_id =

crew.person_id;

Strategy point 2: Think about joins as modifying records:

  • extending each record to include more attributes, or
  • expanding each table to include more records and attributes.
    • Doing so often changes what a record represents.

Row per movie

Row per crew member in a movie, with movie title, etc.

Row per crew member in a movie, with movie title, person name, etc.

expand

extend

Exercise

3737223

33 of 52

Now your turn!

How do we write a query that gets the�titles and IDs of Michelle Yeoh movies?

33

Strategy point 1: Look at the schemas. Which tables have the information that we need? (Hint: use \d meta-command in psql)

Strategy point 2: Think about joins as modifying records:

  • extending each record to include more attributes, or
  • expanding each table to include more records and attributes.
    • Doing so often changes what a record represents.

Strategy point 3: Think about WHERE, first, then SELECT.

Exercise

3737223

34 of 52

Strategy Point 3 in action

34

SELECT *

FROM titles

INNER JOIN crew

ON crew.title_id = � titles.title_id

INNER JOIN people

ON people.person_id =

crew.person_id

WHERE people.name = 'Michelle Yeoh'

AND titles.type = 'movie';

Row per crew member in a movie, with movie title, person name, etc.

Keep the rows with Michelle Yeoh as a crew member.

Exercise

3737223

35 of 52

Strategy Point 3 in action

35

SELECT DISTINCT titles.primary_title,

titles.title_id�

FROM titles

INNER JOIN crew

ON crew.title_id = � titles.title_id

INNER JOIN people

ON people.person_id =

crew.person_id

WHERE people.name = 'Michelle Yeoh'

AND titles.type = 'movie';

If Michelle Yeoh played multiple roles, there would be duplicates, so remove with DISTINCT.

Keep the rows with Michelle Yeoh as a crew member.

Got to here. Will cover the rest in Lecture 03

Exercise

3737223

36 of 52

[Bonus] IMDb Exercise 2

PostgreSQL

SELECT FROM WHERE

Multi-Relation Queries

IMDb Exercise

[Bonus] IMDb Exercise 2

SFW as Relational Algebra

36

Lecture 02, Data 101 Fall 2023

37 of 52

Extending the Query

How do we write a query that gets the names of Michelle Yeoh movies�that have a rating of at least 7.0?

  1. First, create a view�(i.e., a temporary relation)�of the previous query,�called yeoh_movies.

37

CREATE VIEW yeoh_movies AS (

SELECT DISTINCT

titles.primary_title,

titles.title_id

FROM titles

INNER JOIN crew

ON crew.title_id = titles.title_id

INNER JOIN people

ON people.person_id = crew.person_id

WHERE people.name = 'Michelle Yeoh'

AND titles.type = 'movie'

);

3737223

38 of 52

Extending the Query

How do we write a query that gets the names of Michelle Yeoh movies�that have a rating of at least 7.0?

  • First, create a view�(i.e., a temporary relation)�of the previous query,�called yeoh_movies.�
  • Then, join.

38

SELECT primary_title

FROM ratings

INNER JOIN yeoh_movies

ON ratings.title_id = yeoh_movies.title_id

WHERE rating >= 7.0;

3737223

39 of 52

Quick Peek: The Natural Join

39

SELECT *

FROM ratings

INNER JOIN yeoh_movies

ON ratings.title_id =

yeoh_movies.title_id

WHERE rating >= 7.0;

title_id | rating | votes | primary_title | title_id

-----------+--------+--------+--------------------------------+-----------

tt0190332 | 7.9 | 268227 | Crouching Tiger, Hidden Dragon | tt0190332

tt0397535 | 7.4 | 132457 | Memoirs of a Geisha | tt0397535

tt0190332 | 7.8 | 241690 | Crouching Tiger, Hidden Dragon | tt0190332

(3 rows)

SELECT *

FROM ratings

NATURAL JOIN yeoh_movies

WHERE rating >= 7.0;

title_id | rating | votes | primary_title

-----------+--------+--------+--------------------------------

tt0190332 | 7.9 | 268227 | Crouching Tiger, Hidden Dragon

tt0397535 | 7.4 | 132457 | Memoirs of a Geisha

tt0190332 | 7.8 | 241690 | Crouching Tiger, Hidden Dragon

(3 rows)

Natural Join “naturally” joins on same attributes and removes duplicates!

More next time!

3737223

40 of 52

[the rest is Lecture 03 material]

PostgreSQL

SELECT FROM WHERE

Multi-Relation Queries

IMDb Exercise

[Bonus] IMDb Exercise 2

SFW as Relational Algebra

40

Lecture 02, Data 101 Fall 2023

41 of 52

Now your turn!

How do we write a query that gets the�titles and IDs of Michelle Yeoh movies?

41

Strategy point 1: Look at the schemas. Which tables have the information that we need? (Hint: use \d meta-command in psql)

Strategy point 2: Think about joins as modifying records:

  • extending each record to include more attributes, or
  • expanding each table to include more records and attributes.
    • Doing so often changes what a record represents.

Strategy point 4: Consider what makes a query readable based on the originally posed question.

Strategy point 3: Think about WHERE, first, then SELECT.

(Lecture 03 material)

Exercise

3737223

42 of 52

Strategy 4 in action

How do we write a query that gets the�titles and IDs of Michelle Yeoh movies?

42

SELECT DISTINCT � titles.primary_title,

titles.title_id

FROM titles

INNER JOIN crew

ON crew.title_id = � titles.title_id

INNER JOIN people

ON people.person_id =

crew.person_id

WHERE people.name = 'Michelle Yeoh'

AND titles.type = 'movie';

Joins are commutative, so these queries are equivalent!

SELECT DISTINCT titles.primary_title,

titles.title_id

FROM crew

INNER JOIN titles

ON crew.title_id = � titles.title_id

INNER JOIN people

ON people.person_id =

crew.person_id

WHERE people.name = 'Michelle Yeoh'

AND titles.type = 'movie';

3737223

43 of 52

Strategy 4 in action

How do we write a query that gets the�titles and IDs of Michelle Yeoh movies?

43

SELECT DISTINCT � titles.primary_title,

titles.title_id

FROM titles

INNER JOIN crew

ON crew.title_id = � titles.title_id

INNER JOIN people

ON people.person_id =

crew.person_id

WHERE people.name = 'Michelle Yeoh'

AND titles.type = 'movie';

However, stylistically the left is more readable.

SELECT DISTINCT titles.primary_title,

crew.title_id

FROM crew

INNER JOIN titles

ON crew.title_id = � titles.title_id

INNER JOIN people

ON people.person_id =

crew.person_id

WHERE people.name = 'Michelle Yeoh'

AND titles.type = 'movie';

3737223

44 of 52

SFW as Relational Algebra

PostgreSQL

SELECT FROM WHERE

Multi-Relation Queries

IMDb Exercise

[Bonus] IMDb Exercise 2

SFW as Relational Algebra

44

Lecture 02, Data 101 Fall 2023

45 of 52

Preview of Next Time: Relational Data Model and Algebra

Algebra (i.e., formulaic operations on variables) is the core of many technical disciplines:

  • Elementary algebra is the foundation of arithmetic and calculus
  • Linear Algebra is the foundation of matrix manipulation, and thereby machine learning (ML)

45

RA is the core of query optimization, which we will cover later in the semester.

For now, let’s see how to use RA to represent SFW queries.

Similarly, Relational Algebra (RA) is the foundation of the relational data model.�A firm grounding in RA allows us:

  • To understand and quantify the transformational capabilities of various data systems.
  • To provide a “baseline” of functionality, and
  • To study the impact of when there are deviations from the data model and algebra.

3737223

46 of 52

Relational Schema & Instance The Relational Data Model

A relation is a collection set of tuples (records) with a predefined collection set of attributes.

  • Each attribute has a type, called domain.
  • Each domain is atomic (“simple as possible for DBMS”): string, integer, object, etc.
  • Each tuple in the relation has values for each attribute.

Schema: The structure, format, or scaffolding defining a relation.

  • Relation name, attribute names, attribute domains
  • (“overall schema” Schemas for many relations)

Instance: Specific instantiation of the Relation

  • A Relation with “values filled in”

46

Same slide as before!, But note that we are dealing with sets (more on bags later).

Songs (name String, artist � String, album String,� peak Integer, …)

name

artist

album

peak

22

Taylor Swift

Red

20

Blinding Lights

The Weeknd

After Hours

1

Truly Madly Deeply

Savage Garden

Savage Garden

1

Shelter

Porter Robinson

Shelter (single)

16

3737223

47 of 52

Set are Unordered: Equivalent Representations

Since a relation has a set of attributes and a set of tuples,�we can reorder both and not change the relation:

47

name

artist

album

peak

22

Taylor Swift

Red

20

Blinding Lights

The Weeknd

After Hours

1

Truly Madly Deeply

Savage Garden

Savage Garden

1

Shelter

Porter Robinson

Shelter (single)

16

name

peak

artist

album

Blinding Lights

1

The Weeknd

After Hours

Truly Madly Deeply

1

Savage Garden

Savage Garden

Shelter

16

Porter Robinson

Shelter (single)

22

20

Taylor Swift

Red

=

3737223

48 of 52

Relational Algebra

Relational Algebra (RA) is the theory of operations that help us transform relations.

  • In RA, relations are the operands!

Unary operations are those that transform a single relation R with schema R(B1, B2, …, Bm):

Basic unary operations:

  • Selection: Filter relation R for tuples that � match a condition C.

  • Projection: Filter relation R to only include � attributes Bi1, Bi2, …, Bin.

  • Renaming: Rename attributes Bi1, Bi2, …, Bin � to A1, A2, …, An.

48

No RA symbols for now. More next time!

Change metadata (schema)?

❌��

✅�

3737223

49 of 52

SFW as Relational Algebra

To read a SFW SQL query:

  • FROM clause: lists the table.

  • WHERE clause: filters for the matching tuples.

  • SELECT clause: filters for the specified attributes.

  • Rename attributes with AS.

A SELECT FROM WHERE (SFW) query is the most standard SQL query to extract records from a relation.

49

SELECT Bi1 AS A1,

Bi2 AS A2,

BiN AS AN

FROM R

WHERE C;

3737223

50 of 52

SFW as Relational Algebra

To read a SFW SQL query:

  • FROM clause: lists the table.

  • WHERE clause: filters for the matching tuples.

  • SELECT clause: filters for the specified attributes.

  • Rename attributes with AS.

A SELECT FROM WHERE (SFW) query is the most standard SQL query to extract records from a relation.

50

SELECT Bi1 AS A1,

Bi2 AS A2,

BiN AS AN

FROM R

WHERE C;

🤔

WHERE, SELECT, AS - Which RA operators do these respectively map to?

  1. WHERE: Projection, SELECT: Projection, AS: Renaming
  2. WHERE: Projection, SELECT: Selection, AS: Renaming
  3. WHERE: Selection, SELECT: Projection, AS: Renaming
  4. WHERE: Selection, SELECT: Selection, AS: Renaming
  5. Something else

3737223

51 of 52

SFW as Relational Algebra

A SELECT FROM WHERE (SFW) query is the most standard SQL query to extract records from a relation.

To read a SFW SQL query:

  • FROM clause: lists the table.

  • WHERE clause: filters for the matching tuples.

  • SELECT clause: filters for the specified attributes.

  • Rename attributes with AS.

51

SELECT Bi1 AS A1,

Bi2 AS A2,

BiN AS AN

FROM R

WHERE C;

Relational Algebra steps:

For the Relation R,

1. Select tuples that match condition C.

2. Project to include�attributes Bi1, Bi2, …, Bin .

3. Rename attributes �Bi1, Bi2, …, Bin to A1, A2, …, An.

3737223

52 of 52

SFW as Relational Algebra

A SELECT FROM WHERE (SFW) query is the most standard SQL query to extract records from a relation.

To read a SFW SQL query:

  • FROM clause: lists the table.

  • WHERE clause: filters for the matching tuples.

  • SELECT clause: filters for the specified attributes.

  • Rename attributes with AS.

52

Relational Algebra steps:

For the Relation R,

1. Select tuples that match condition C.

2. Project to include�attributes Bi1, Bi2, …, Bin .

3. Rename attributes �Bi1, Bi2, …, Bin to A1, A2, …, An.

SELECT Bi1 AS A1,

Bi2 AS A2,

BiN AS AN

FROM R

WHERE C;

⚠️SQL SELECT is Relational Algebra projection! ⚠️ This is a decades-old mismatch of terminology. More next time!

3737223