Writing Results Out
January 31, 2024
1
LECTURE 04
Warmup
What type of join is the below?
Describe what the query is doing in plain English.
2
SELECT P2.name, P2.born
FROM People AS P1,
People AS P2
WHERE P1.born > P2.born
AND P1.name = 'Tom Hanks';
🤔
Warmup
What type of join is the below?
Describe what the query is doing in plain English.
3
SELECT P2.name, P2.born
FROM People AS P1,
People AS P2
WHERE P1.born > P2.born
AND P1.name = 'Tom Hanks';
Names and birth dates for people who were born before Tom Hanks
This is called a self join.
Set/Multiset operations in SQL
Set/Multiset Operations in SQL
Creating Tables, Views, and CTEs
4
Lecture 04, Data 101/Info 258 Spring 2024
Bag operations and set operations
Bag (i.e., multiset) operations for union, difference, and intersection, respectively:
5
Example: Find members of Crew that are not in People:
(SELECT person_id FROM Crew)
EXCEPT ALL
(SELECT person_id FROM People)
A subquery is a parenthesized SQL query. More in a bit!
Bag operations and set operations
Bag (i.e., multiset) operations for union, difference, and intersection, respectively:
Enforce set behavior:
Reminder: DISTINCT keyword removes duplicates ,which effectively requires SQL query to return a set of tuples:
SELECT DISTINCT …
6
Example: Find members of Crew that are not in People:
(SELECT person_id FROM Crew)
EXCEPT ALL
(SELECT person_id FROM People)
A subquery is a parenthesized SQL query. More in a bit!
Today: The Stanford Open Policing Dataset
7
Today: The Stanford Open Policing Dataset
Partnership between Computational Journalism lab and Computational Policy lab
8
Today: The Stanford Open Policing Dataset
Partnership between Computational Journalism lab and Computational Policy lab
Statistical analysis on ~100 million traffic stops from sufficiently detailed data from 21 state patrol agencies, 29 municipal police departments
9
We work with a subset of this data for today’s lecture.
Also our collaborator! (Cheryl Philips)
Collaboration between Berkeley (Data Science) + Journalists + Public Defenders
Making sense of messy police misconduct data now becoming available!
10
Also our collaborator! (Cheryl Philips)
Collaboration between Berkeley (Data Science) + Journalists + Public Defenders
Making sense of messy police misconduct data now becoming available!
“ …developing new methodologies in AI, databases, human-computer interaction, and visualization to enable the work. They now routinely meet with data and investigative journalists to sort through the daunting technical challenges. Some of their PhD students worked with investigative reporting students on clustering algorithms, for example, to better identify themes in the policing documents… “
11
Creating Tables, Views, and CTEs
Set/Multiset Operations in SQL
Creating Tables, Views, and CTEs
12
Lecture 04, Data 101/Info 258 Spring 2024
[Summary] How do I save the results of a query?
Sometimes we would like to save the results of a query to use in other queries.
There are four common strategies:
13
[1/4] How do I save the results of a query?
Sometimes we would like to save the results of a query to use in other queries.
There are four common strategies:
14
CREATE TABLE CitationStops AS (
SELECT race, citation
FROM Stops
WHERE citation = True
);
note the parentheses!
no semicolon here!
So, we can create a new table if we do not expect much change to the data in the table.
[2/4] How do I save the results of a query?
Sometimes we would like to save the results of a query to use in other queries.
There are four common strategies:
15
Think of a view as a variable or as a virtual relation that is more convenient to query than the base table(s).
CREATE VIEW CitationStops AS (
SELECT race, citation
FROM Stops
WHERE citation = True
);
[4/4] How do I save the results of a query?
Sometimes we would like to save the results of a query to use in other queries.
There are four common strategies:
16
You’ll likely use this quite a bit for assigning a subquery result to a temporary variable.
Note scoping rules apply!
WITH CitationStops AS (
SELECT race, citation
FROM Stops
WHERE citation = True
)
SELECT *
FROM CitationStops;
[4/4] How do I save the results of a query?
Sometimes we would like to save the results of a query to use in other queries.
There are four common strategies:
17
Unlike SQL Server, Oracle, and BigQuery, PostgreSQL does not keep MVs up to date. Instead, the user (and not the system) must manually “refresh” MVs.
CREATE MATERIALIZED VIEW CitationStops AS (
SELECT race, citation
FROM Stops
WHERE citation = True
);
[Summary] How do I save the results of a query?
Sometimes we would like to save the results of a query to use in other queries.
There are four common strategies:
18
In this class, we will most commonly utilize views and CTEs. You will see some examples on Project 1.
Meta-commands: