part of http://philip.greenspun.com/teaching/three-day-rdbms/
Updated January 24, 2015
Create a single table, named "facebook_simple" that can store the above data (suggest four columns). Load in the first row (George W. Bush), either using http://localhost/phpmyadmin/ or an INSERT statement.
create table facebook_simple (
name varchar(100),
group_name varchar(100),
likes text,
friends text
);
INSERT INTO facebook_simple (
name,group_name,likes,friends
) VALUES (
'George W. Bush','Yale','starting wars, Gitmo','Pamela Anderson, Borat'
);
INSERT INTO facebook_simple (
name,group_name,likes,friends
) VALUES (
'Barack Hussein Obama II','Harvard','health insurance, unemployment','Bruno, Paula Abdul'
);
INSERT INTO facebook_simple (
name,group_name,likes,friends
) VALUES (
'Pamela Anderson','Hollywood','plastic surgery, home video','George W. Bush'
);
INSERT INTO facebook_simple (
name,group_name,likes,friends
) VALUES (
'Borat','Kazakstan','farming, travel, singing','Pamela Anderson, Obama'
);
SELECT COUNT(*)
FROM facebook_simple
WHERE name = 'George W. Bush'
AND friends LIKE '%Bruno%';
also works right after SELECT!
SELECT name, friends like '%Bruno%'
FROM facebook_simple;
-- Note use of INSTR instead of LIKE
SELECT COUNT(*)
FROM facebook_simple
WHERE name = 'George W. Bush'
AND INSTR(friends,'Borat') > 0;
Let's try this with CASE instead
SELECT name, CASE WHEN INSTR(friends,'Borat') > 0 THEN 1 ELSE 0 END as boratness
FROM facebook_simple;
-- we won't use "numbers" because it might be a reserved word in some DBMSes
-- e.g., "number" is a data type
CREATE TABLE some_numbers (
one_number int
);
INSERT INTO some_numbers VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10);
-- the function SUBSTRING_INDEX returns the part of the string that precedes
-- n occurrences of a delimiter.
-- See http://dev.mysql.com/doc/refman/5.5/en/string-functions.html#function_substring-index
SELECT
TRIM(LEADING ' ' FROM SUBSTR(SUBSTRING_INDEX(friends,',',one_number),CASE WHEN one_number = 1 THEN 1 ELSE LENGTH(SUBSTRING_INDEX(friends,',',one_number-1))+2 END)) as friend
FROM facebook_simple s
JOIN some_numbers n
WHERE name = 'George W. Bush';
SELECT name, group_name, likes, friends
FROM facebook_simple
ORDER BY SUBSTRING(name, LENGTH(SUBSTRING(REVERSE(name), LOCATE(" ", REVERSE(name))-1)));
Now that you've developed a working data model... throw it out and use ours. That way everyone in the class will be developing queries against the same tables and solutions can be compared easily.
Let's have queries for the following reports:
SELECT u.user_id, concat(u.first_names, ' ', u.last_name) as name, g.title
FROM users u
JOIN group_memberships m ON (m.user_id = u.user_id)
JOIN groups g ON (g.group_id = m.group_id)
ORDER BY UPPER(u.last_name), UPPER(u.first_names)
LIMIT 20;
the last clause tells MySQL to return only the first 20 rows (for easy display)
What if we get sick of that CONCAT to get the full name? We could define a view
CREATE VIEW users_one_name
AS
SELECT user_id, concat(first_names, ' ', last_name) as name, email, password_hash, creation_date
FROM users;
and then query from that view.
-- Let's start by generating a list of all of the unique friendship relationships.
-- We do this by using friendships as the driving table:
SELECT CONCAT(u1.first_names, ' ', u1.last_name) AS user_name, CONCAT(u2.first_names, ' ', u2.last_name) AS friend_name
FROM friendships f
JOIN users u1 ON (u1.user_id = f.to_user_id)
JOIN users u2 ON (u2.user_id = f.from_user_id)
WHERE f.accepted IS NOT NULL
ORDER BY user_name
LIMIT 20;
There are two problems with this report. First, if we have two users with the same name, say "John Smith", we will not be able to tell them apart in the report. Solve this by adding the unique user id into the user_name column:
SELECT CONCAT(u1.user_id, ' ', u1.first_names, ' ', u1.last_name) AS user_name, CONCAT(u2.first_names, ' ', u2.last_name) AS friend_name
FROM friendships f
JOIN users u1 ON (u1.user_id = f.to_user_id)
JOIN users u2 ON (u2.user_id = f.from_user_id)
WHERE f.accepted IS NOT NULL
ORDER BY user_name
LIMIT 20;
Second, only the to->from relationships are grouped together by this query. We can see all of the friends invited by Barbara Harris grouped together, while the friends who invited her are scattered throughout the result set. To see all of a user's friendships grouped together regardless of direction, run the same query in the reverse direction and UNION
(SELECT CONCAT(u1.user_id, ' ', u1.first_names, ' ', u1.last_name) AS user_name, CONCAT(u2.first_names, ' ', u2.last_name) AS friend_name
FROM friendships f
JOIN users u1 ON (u1.user_id = f.to_user_id)
JOIN users u2 ON (u2.user_id = f.from_user_id)
WHERE f.accepted IS NOT NULL)
UNION
(SELECT CONCAT(u1.user_id, ' ', u1.first_names, ' ', u1.last_name) AS user_name, CONCAT(u2.first_names, ' ', u2.last_name) AS friend_name
FROM friendships f
JOIN users u1 ON (u1.user_id = f.from_user_id)
JOIN users u2 ON (u2.user_id = f.to_user_id)
WHERE f.accepted IS NOT NULL)
ORDER BY user_name
LIMIT 20;
Alternatively, simply add an OR clause to the WHERE. Here's Kathleen Alexander's solution:
SELECT u1.first_names, u1.last_name, u2.first_names, u2.last_name
FROM users u1, users u2, friendships fs
WHERE (u1.user_id=fs.from_user_id AND u2.user_id=fs.to_user_id)
OR (u1.user_id=fs.to_user_id AND u2.user_id=fs.from_user_id)
ORDER BY u1.last_name, u1.first_names
SELECT CONCAT(u.first_names, ' ', u.last_name) AS name
FROM user_like_map m
JOIN likes l ON (m.like_id = l.like_id)
JOIN users u ON (u.user_id = m.user_id) WHERE l.topic = 'Citizen Kane'
ORDER BY UPPER(name)
LIMIT 20;
How about a report showing all of the groups in the system and their associated members? It seems simple, but what about groups that don't have any members yet? We want them in our report.
-- Not all groups in the test data have members **
-- So let's insert a group with none
INSERT INTO groups (title) VALUES ('Hogwarts');
SELECT g.title, CONCAT(u.first_names, ' ', u.last_name) AS name
FROM groups g
LEFT OUTER JOIN group_memberships gm ON (gm.group_id = g.group_id)
LEFT OUTER JOIN users u ON (u.user_id = gm.user_id)
LIMIT 20;
-- The second LEFT OUTER JOIN is required because groups with no
-- members will have a NULL value for gm.user_id.
-- Remove the LIMIT to check if the group having no users comes through in the result, just be ready for a lot of output! (Did you do the INSERT first?)
Please write a query giving a report of all of the names of the groups in the system that have at least one member next to the number of members in that group. You'll be using GROUP BY and probably count(*).
SELECT g.title, COUNT(*) AS member_count
FROM groups g
JOIN group_memberships gm ON (gm.group_id = g.group_id)
GROUP BY g.group_id;
Write a second query that also includes the names of groups with no members.
SELECT g.title, COUNT(gm.group_id) as member_count
FROM groups g
LEFT OUTER JOIN group_memberships gm ON (gm.group_id = g.group_id)
GROUP BY g.group_id;
-- Bonus question: Why COUNT(gm.group_id) and not COUNT(*)?
SELECT l.topic, COUNT(*) AS num_likers
FROM likes l
JOIN user_like_map m ON (m.like_id = l.like_id)
GROUP BY l.like_id
ORDER BY COUNT(*) DESC
LIMIT 1;
Extra credit: What if there are two or three "likes", each with the same number of users mapped to it? Can you generate a report showing all of the likes whose popularity is at the mode? You might need a subquery and/or a view.
The LIMIT can be increased in the solution above, but leaves no way to be sure that we are extracting all of the most-liked "likes", nor a way to restrict the report to just those items. SQL does not have a means of expressing "fetch rows until the value in column 2 changes", but we can take the following approach
Maximum number of users liking a like:
SELECT COUNT(*) AS num_likers
FROM likes l
JOIN user_like_map m ON (m.like_id = l.like_id)
GROUP BY l.like_id
ORDER BY num_likers DESC
LIMIT 1;
Find all likes with that number:
SELECT l.topic, COUNT(*) AS num_likers
FROM likes l
JOIN user_like_map m ON (m.like_id = l.like_id)
GROUP BY l.like_id
HAVING num_likers =
(
SELECT COUNT(*) AS num_likers
FROM likes l
JOIN user_like_map m ON (m.like_id = l.like_id)
GROUP BY l.like_id
ORDER BY num_likers DESC
LIMIT 1
)
ORDER BY l.topic;
One of the class members used a view to simplify the query:
create view like_report as
select l.topic, count(lm.user_id) as Members
from user_like_map as lm
join likes as l on lm.like_id = l.like_id
group by lm.like_id
order by Members desc;
select topic, Members
from like_report
where Members = (select max(Members) from like_report);
Please write a query giving a report of all of the names of the groups in the system that have at least 30 members.
SELECT g.title, COUNT(*) AS member_count
FROM groups g
JOIN group_memberships gm ON (gm.group_id = g.group_id)
GROUP BY g.group_id
HAVING member_count > 30;