Three-Day RDBMS Day 1, Session 1 Solutions

part of http://philip.greenspun.com/teaching/three-day-rdbms/

Exercise: 1-table data model

Create a single table, named "facebook_simple" that can store the above data (suggest four columns). Load in the first row (George W. Bush), either using http://localhost/phpmyadmin/ or an INSERT statement.

create table facebook_simple (

       name              varchar(100),

       group_name        varchar(100),

       likes             text,

       friends           text

);

INSERT INTO facebook_simple (

  name,group_name,likes,friends

) VALUES (

  'George W. Bush','Yale','starting wars, Gitmo','Pamela Anderson, Borat'

);

INSERT INTO facebook_simple (

  name,group_name,likes,friends

) VALUES (

'Barack Hussein Obama II','Harvard','health insurance, unemployment','Bruno, Paula Abdul'

);

INSERT INTO facebook_simple (

  name,group_name,likes,friends

) VALUES (

  'Pamela Anderson','Hollywood','plastic surgery, home video','George W. Bush'

);

INSERT INTO facebook_simple (

  name,group_name,likes,friends

) VALUES (

  'Borat','Kazakstan','farming, travel, singing','Pamela Anderson, Obama'

);

Exercise: King Bush II's friends

Write a query to determine if "Bruno" is among George W. Bush's friends (should return one row with "1" or "t")

SELECT COUNT(*)

FROM facebook_simple

WHERE name = 'George W. Bush'

     AND friends LIKE '%Bruno%';

Write a query to determine if "Borat" is among George W. Bush's friends

-- Note use of INSTR instead of LIKE

SELECT COUNT(*)

FROM facebook_simple

WHERE name = 'George W. Bush'

     AND INSTR(friends,'Borat') > 0;

Let's try this with CASE instead

SELECT name, CASE WHEN INSTR(friends,'Borat') > 0 THEN 1 ELSE 0 END as boratness

FROM facebook_simple;

Write a query to print out a list, one row per friend, of George W. Bush's friends.

-- we won't use "numbers" because it might be a reserved word in some DBMSes

-- e.g., "number" is a data type

CREATE TABLE some_numbers (

   one_number int

);

INSERT INTO some_numbers VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10);

-- the function SUBSTRING_INDEX returns the part of the string that precedes

-- n occurrences of a delimiter.

-- See http://dev.mysql.com/doc/refman/5.5/en/string-functions.html#function_substring-index

SELECT

  TRIM(LEADING ' ' FROM SUBSTR(SUBSTRING_INDEX(friends,',',one_number),CASE WHEN one_number = 1 THEN 1 ELSE LENGTH(SUBSTRING_INDEX(friends,',',one_number-1))+2 END)) as friend

FROM facebook_simple s

  JOIN some_numbers n

WHERE name = 'George W. Bush';

Write a query to retrieve the table, sorted by last name.

SELECT name, group_name, likes, friends

FROM facebook_simple

ORDER BY SUBSTRING(name, LENGTH(SUBSTRING(REVERSE(name), LOCATE(" ", REVERSE(name))-1)));

Exercise: multi-table data model

Now that you've developed a working data model... throw it out and use ours. That way everyone in the class will be developing queries against the same tables and solutions can be compared easily.

Exercise: JOINs

Let's have queries for the following reports:

SELECT u.user_id, concat(u.first_names, ' ', u.last_name) as name, g.title

FROM users u

  JOIN group_memberships m ON (m.user_id = u.user_id)

  JOIN groups g ON (g.group_id = m.group_id)

ORDER BY UPPER(u.last_name), UPPER(u.first_names)

LIMIT 20;

the last clause tells MySQL to return only the first 20 rows (for easy display)

What if we get sick of that CONCAT to get the full name? We could define a view

CREATE VIEW users_one_name

AS

SELECT user_id, concat(first_names, ' ', last_name) as name, email, password_hash, creation_date

FROM users;

and then query from that view.

-- Let's start by generating a list of all of the unique friendship relationships.

-- We do this by using friendships as the driving table:

SELECT CONCAT(u1.first_names, ' ', u1.last_name) AS user_name, CONCAT(u2.first_names, ' ', u2.last_name) AS friend_name

FROM friendships f

  JOIN users u1 ON (u1.user_id = f.to_user_id)

  JOIN users u2 ON (u2.user_id = f.from_user_id)

WHERE f.accepted IS NOT NULL

ORDER BY user_name

LIMIT 20;

There are two problems with this report.  First, if we have two users with the same name, say "John Smith", we will not be able to tell them apart in the report.  Solve this by adding the unique user id into the user_name column:

 

SELECT CONCAT(u1.user_id, ' ', u1.first_names, ' ', u1.last_name) AS user_name, CONCAT(u2.first_names, ' ', u2.last_name) AS friend_name

FROM friendships f

  JOIN users u1 ON (u1.user_id = f.to_user_id)

  JOIN users u2 ON (u2.user_id = f.from_user_id)

WHERE f.accepted IS NOT NULL

ORDER BY user_name

LIMIT 20;

Second, only the to->from relationships are grouped together by this query.  We can see all of the friends invited by Barbara Harris grouped together, while the friends who invited her are scattered throughout the result set.  To see all of a user's friendships group together regardless of direction, run the same query in the reverse direction and UNION

(SELECT CONCAT(u1.user_id, ' ', u1.first_names, ' ', u1.last_name) AS user_name, CONCAT(u2.first_names, ' ', u2.last_name) AS friend_name

FROM friendships f

  JOIN users u1 ON (u1.user_id = f.to_user_id)

  JOIN users u2 ON (u2.user_id = f.from_user_id)

WHERE f.accepted IS NOT NULL)

UNION

(SELECT CONCAT(u1.user_id, ' ', u1.first_names, ' ', u1.last_name) AS user_name, CONCAT(u2.first_names, ' ', u2.last_name) AS friend_name

FROM friendships f

  JOIN users u1 ON (u1.user_id = f.from_user_id)

  JOIN users u2 ON (u2.user_id = f.to_user_id)

WHERE f.accepted IS NOT NULL)

ORDER BY user_name

LIMIT 20;

Alternatively, simply add an OR clause to the WHERE. Here's Kathleen Alexander's solution:

SELECT u1.first_names, u1.last_name, u2.first_names, u2.last_name

FROM users u1, users u2, friendships  fs

WHERE (u1.user_id=fs.from_user_id AND u2.user_id=fs.to_user_id)

OR (u1.user_id=fs.to_user_id AND u2.user_id=fs.from_user_id)

ORDER BY u1.last_name, u1.first_names

SELECT CONCAT(u.first_names, ' ', u.last_name) AS name

FROM user_like_map m

  JOIN likes l ON (m.like_id = l.like_id)

  JOIN users u ON (u.user_id = m.user_id) WHERE l.topic = 'Citizen Kane'

ORDER BY UPPER(name)

LIMIT 20;

Exercise: OUTER JOINs

How about a report showing all of the groups in the system and their associated members? It seems simple, but what about groups that don't have any members yet? We want them in our report.

-- Not all groups in the test data have members **

-- So let's insert a group with none

INSERT INTO groups (title) VALUES ('Hogwarts');

SELECT g.title, CONCAT(u.first_names, ' ', u.last_name) AS name

FROM groups g

  LEFT OUTER JOIN group_memberships gm ON (gm.group_id = g.group_id)

  LEFT OUTER JOIN users u ON (u.user_id = gm.user_id)

LIMIT 20;

-- The second LEFT OUTER JOIN is required because groups with no

-- members will have a NULL value for gm.user_id.

-- Remove the LIMIT to check if the group having no users comes through in the result, just be ready for a lot of output!  (Did you do the INSERT first?)

Exercise: GROUP BY

Please write a query giving a report of all of the names of the groups in the system that have at least one member next to the number of members in that group. You'll be using GROUP BY and probably count(*).

SELECT g.title, COUNT(*) AS member_count

FROM groups g

  JOIN group_memberships gm ON (gm.group_id = g.group_id)

GROUP BY g.group_id;

Write a second query that also includes the names of groups with no members.

SELECT g.title, COUNT(gm.group_id) as member_count

FROM groups g

  LEFT OUTER JOIN group_memberships gm ON (gm.group_id = g.group_id)

GROUP BY g.group_id;

-- Bonus question: Why COUNT(gm.group_id) and not COUNT(*)?

Exercise: find the mode

Use your new GROUP BY skills to write a query that will return a single row containing the most-liked "like" (human-readable topic, not the like_id)  and the number of users who liked it.

SELECT l.topic, COUNT(*) AS num_likers

FROM likes l

  JOIN user_like_map m ON (m.like_id = l.like_id)

GROUP BY l.like_id

ORDER BY COUNT(*) DESC

LIMIT 1;          

Extra credit: What if there are two or three "likes", each with the same number of users mapped to it? Can you generate a report showing all of the likes whose popularity is at the mode? You might need a subquery and/or a view.

The LIMIT can be increased in the solution above, but leaves no way to be sure that we are extracting all of the most-liked "likes", nor a way to restrict the report to just those items.  SQL does not have a means of expressing "fetch rows until the value in column 2 changes", but we can take the following approach

  1. Determine the maximum number of users liking a like.
  2. Find all likes with that number.

Maximum number of users liking a like:

SELECT COUNT(*) AS num_likers

FROM likes l

  JOIN user_like_map m ON (m.like_id = l.like_id)

GROUP BY l.like_id

ORDER BY num_likers DESC

LIMIT 1;      

Find all likes with that number:

SELECT l.topic, COUNT(*) AS num_likers

FROM likes l

  JOIN user_like_map m ON (m.like_id = l.like_id)

GROUP BY l.like_id

HAVING num_likers =

   (

    SELECT COUNT(*) AS num_likers

    FROM likes l

      JOIN user_like_map m ON (m.like_id = l.like_id)

    GROUP BY l.like_id

    ORDER BY num_likers DESC

    LIMIT 1

   )

ORDER BY l.topic;

One of the class members used a view to simplify the query:

create view like_report as

select l.topic, count(lm.user_id) as Members

from user_like_map as lm

join likes as l on lm.like_id = l.like_id

group by lm.like_id

order by Members desc;

select topic, Members

from like_report

where Members = (select max(Members) from like_report);

Exercise: HAVING

Please write a query giving a report of all of the names of the groups in the system that have at least 30 members.

SELECT g.title, COUNT(*) AS member_count

FROM groups g

  JOIN group_memberships gm ON (gm.group_id = g.group_id)

GROUP BY g.group_id

HAVING member_count > 30;

Exercise: bulkloading some data

http://dev.mysql.com/doc/refman/5.5/en/load-data.html

http://dev.mysql.com/doc/refman/5.5/en/load-xml.html

CREATE TABLE load_infile_test (col1 int, col2 int, col3 boolean, col4 boolean, col5 varchar(32));

LOAD DATA INFILE '/usr/local/mysql-5.5.8-linux2.6-i686/mysql-test/std_data/intersect-bug50389.tsv' INTO TABLE load_infile_test FIELDS TERMINATED BY '\t';

Verify: SELECT COUNT(*) FROM load_infile_test;