Three-Day RDBMS Day 3, Session 1 Solutions

part of http://philip.greenspun.com/teaching/three-day-rdbms/

Exercise: analyzing visit intervals (time series data)

Please write a query that will show, for 'moe', the time intervals in days between his visits (this report should have N-1 rows where N is the number of visits by 'moe') along with the date-times of the two visits that generated that interval, ordered by ascending date.

select v1.visit_time as start_time, v2.visit_time as end_time,

(unix_timestamp(v2.visit_time)-

unix_timestamp(v1.visit_time))/(60*60*24) as duration

from ts_visits v1, ts_visits v2, ts_users

where v1.visit_time =        (select max(v3.visit_time)

from ts_visits v3

where v3.user_id = v1.user_id

and v3.visit_time < v2.visit_time)

and v1.user_id = v2.user_id

and v1.user_id = ts_users.user_id

and name='moe';

You can also achieve the same result using aggregate functions instead of a subquery:

select v1.visit_time as start_time,

       min(v2.visit_time) as end_time,

       (unix_timestamp(min(v2.visit_time)) -

        unix_timestamp(v1.visit_time))/(60*60*24) as duration

from ts_visits v1

join ts_visits v2 on (v2.user_id = v1.user_id

                      and v2.visit_time > v1.visit_time)

join ts_users u on (u.user_id = v1.user_id) where u.name = 'moe'

group by v1.visit_id, v1.visit_time

order by v1.visit_time;

Performance note: the mysql client program gives you a quick summary of how long each query took, e.g.

39 rows in set (0.27 sec)

Which approach is faster, the subquery or group by? Why?

Please write a query that gives a report of the average interval between visits for each user and then an overall average for the site of time between visits for a particular user.

select user_id,

avg(unix_timestamp(visit_time)-

unix_timestamp((select max(visit_time) from

ts_visits where visit_time<tsv.visit_time and

user_id=tsv.user_id)))/(60*60*24) as visit_interval

from ts_visits tsv group by user_id with rollup;

Now let's have a report with three columns:

Hint: This can be done with GROUP BY, HAVING, MIN, MAX and UNION or with the CASE statement.

Philip's solution:

select name, min(visit_time), max(visit_time)

from ts_users, ts_visits

where ts_users.user_id = ts_visits.user_id

group by ts_users.user_id

having count(*) = 2

UNION

select name, min(visit_time), concat(count(visit_time)-1, " more")

from ts_users, ts_visits

where ts_users.user_id = ts_visits.user_id

group by ts_users.user_id

having count(*) > 2

ORDER BY name;

Note that the ORDER BY applies to the entire set.

Alternative: let's try this with CASE

select

  name,

  min(visit_time) as first,

  CASE count(*)

WHEN 1 THEN NULL

WHEN 2 THEN max(visit_time)

ELSE count(*)-1 END as last

from ts_users, ts_visits

where ts_users.user_id = ts_visits.user_id

group by ts_users.user_id

ORDER BY name;