Three-Day RDBMS Day 3, Session 1

part of http://philip.greenspun.com/teaching/three-day-rdbms/

Lecture: Materialized Views and Event Scheduler

When queries are running too slowly, sometimes programmers create materialized views to hold intermediate computations. These end up being especially valuable in data warehouses, where the database cannot fit into RAM.

In Oracle:

CREATE MATERIALIZED VIEW sales_by_month_by_state
    REFRESH START WITH SYSDATE NEXT
TRUNC(SYSDATE) + 1 + 3/24
   
ENABLE QUERY REWRITE
    AS SELECT t.calendar_month_desc, c.cust_state_province,
       SUM(s.amount_sold) AS sum_sales
       FROM times t, sales s, customers c
       WHERE s.time_id = t.time_id AND s.cust_id = c.cust_id
       GROUP BY t.calendar_month_desc, c.cust_state_province;

Should be rebuilt every morning at 3:00 am (sysdate is fractional after midnight, so the TRUNC function bashes it back to midnight, then we add 3/24 because 1 = one day (Oracle's native date-time handling is not ANSI standard SQL))

The part of the Oracle system that cannot be accomplished in MySQL is "enable query rewrite". The above materialized view will speed up queries against the TIMES, SALES, and CUSTOMERS table even though the queries don't reference the view.

Oracle has options to update the view periodically, e.g,. once every night. This can be accomplished with a MySQL Event, which can be scheduled to run periodically, much like Unix cron or the Microsoft Windows Task Scheduler. See http://dev.mysql.com/doc/refman/5.5/en/create-event.html

Oracle offers the option that the materialized view is updated whenever a change is made to the underlying tables, e.g., a row being inserted. This can be accomplished in MySQL with triggers on the underlying tables.

Exercise: de-duplication

Suppose that we are concerned that there are some duplicate users in our Facebooklet application. Someone registers, for example, as "Thomas Jones" and a Yahoo email address, but then comes back a few months later, forgets that he already has an account, and signs up as "Tom Jones" with a Gmail address. The UNIQUE integrity constraint on email address won't help here. Write a query to find potential dupes. Let's start with a report that shows the following

user_id1, name1, email1, user_id2, name2, email2

(6 columns) where rows have the same last name and the same first character of the first name.

Hints: this will require a self-join of the users table; LEFT(input_string, 1) will return the leftmost character of input_string, using the built-in MySQL string function LEFT.

Exercise: analyzing visit intervals (time series data)

We've supplied you with a table of simulated data on user visits. Here's the data model:

create table ts_users (

        user_id        int not null primary key auto_increment,

        -- cannot be null, no numbers please

      -- in Oracle 11g we'd add CHECK ( regexp_like(name, '^([a-zA-Z ])+$') )

        name                varchar(100) not null,

        -- can be null, but only numbers please (except in MySQL

        -- which doesn't respect CHECK constraints)

        telephone        varchar(20) check (upper(telephone) = lower(telephone))

);

create table ts_visits (

        visit_id                int not null primary key auto_increment,

        user_id        int not null,

        -- store visit time precise to the second

        visit_time        timestamp default current_timestamp,

        foreign key (user_id) references ts_users(user_id) on delete cascade

);

-- note: we use the prefix "ts_" to avoid interference with the

-- real Facebooklet data model

-- the ON DELETE CASCADE causes rows in this table to disappear when the

-- corresponding user is deleted

Note that timestamp(0) = timestamp in MySQL because precision down to the second is the best that it can do. The typical RDBMS can store timestamps with precision down to the microsecond (TIMESTAMP(6)) and Oracle supports nanosecond precision (TIMESTAMP(9)), which could come in handy for physics and electronics data! (Sometimes folks will use those last digits of a TIMESTAMP(6) as a random number.)

We've inserted users with the following command:

insert into ts_users (name) values ('moe'),('larry'),('curly');

-- let's find out what the last USER_ID was

select last_insert_id();

and used the script at http://philip.greenspun.com/teaching/three-day-rdbms/populate-visits.sql to populate the table with more or less random data.

Please write a query that will show, for 'moe', the time intervals in days between his visits (this report should have N-1 rows where N is the number of visits by 'moe') along with the date-times of the two visits that generated that interval, ordered by ascending date.

Hint: a self-join, a subquery, and the use of the MAX function may be required.

Please write a single query that gives a report of the average interval between visits for each user and then an overall average for the site of time between visits for a particular user. For clarity: it should be a single SELECT statement that returns N+1 rows where N is the number of users.

Hint: the ROLLUP feature of GROUP BY may be useful. (*** I'm actually not sure if it will work to give the average per user or if it will mash all of the visit intervals together ***)

Please cut and paste to add a new user and a couple of visits:

insert into ts_users (name) values ('joe newbie');

insert into ts_visits

(user_id, visit_time)

values

(last_insert_id(), '2010-02-01'), (last_insert_id(), '2010-04-01');

Now let's have a report with three columns:

Hint: This can be done with GROUP BY, HAVING, MIN, MAX and UNION or with the CASE statement.

Exercise: A basic iPhone application

With a few extra tags you can transform a Web page into a "native-like" iPhone application.  Start by creating a copy of your server side Android scripts to

http://localhost/php/iphone/status-updates.php?for_user_id=45 (example) and

http://localhost/php/iphone/post-new-status.php

If necessary, adjust the html layout of status-updates.php to 320x460 pixels.  Next, create or locate a 57x57 pixel icon in either png or gif format (a Google image search on "57x57 icon" is fruitful) on your server at

http://localhost/php/iphone/basic.png (or basic.gif as appropriate)

and add the following link element to the <HEAD> section of your page:

<link rel="apple-touch-icon" href="/php/iphone/basic.png"/>

To hide the various Safari controls when you page launches, also add the following to the <HEAD>:

<meta name="apple-mobile-web-app-capable" content="yes" />

<meta name="apple-mobile-web-app-status-bar-style" content="black" />

Test your application by launching it in Chrome.

Bonus: copy your scripts and install your data model on a publically viewable, database-backed web server, and invite your friends to try it on their iPhones.  They will first need to point their browser to your page, and then select "Add to Home Page" from the menu.  The page will then appear like a native iPhone app.

More:

Discussion: Is everything that we learned pointless/worthless?

The last question from a class member in January 2011 was "What about this NoSQL stuff I keep hearing about?" For him and everyone else, we've prepared http://philip.greenspun.com/teaching/three-day-rdbms/beyond-the-rdbms