1 of 45

Extensions to SQL

©Silberschatz, Korth and Sudarshan

5.1

Database System Concepts - 6th Edition

2 of 45

Functions and Procedures

  • SQL:1999 supports functions and procedures
    • Functions/procedures can be written in SQL itself, or in an external programming language (e.g., C, Java).
    • Functions written in an external languages are particularly useful with specialized data types such as images and geometric objects.
      • Example: functions to check if polygons overlap, or to compare images for similarity.
    • Some database systems support table-valued functions, which can return a relation as a result.
  • SQL:1999 also supports a rich set of imperative constructs, including
    • Loops, if-then-else, assignment
  • Many databases have proprietary procedural extensions to SQL that differ from SQL:1999.

©Silberschatz, Korth and Sudarshan

5.2

Database System Concepts - 6th Edition

3 of 45

SQL Functions

  • Define a function that, given the diseases, returns the count of the number of patients diagnosed with it.

delimiter //

create function disease_count (disease_name varchar(45)) returns int deterministic

begin

declare d_count int default 0;

select count(1) into d_count

from Diagnosis

where disease = disease_name;

return d_count;

end//

delimiter ;

  • The function disease_count can be used to find the patient names and diagnosis dates of all diseases with more that 5 patients.

select pname, diag_datefrom Diagnosis as Dwhere disease_count (D.disease ) > 5;

©Silberschatz, Korth and Sudarshan

5.3

Database System Concepts - 6th Edition

4 of 45

SQL functions (Cont.)

  • Compound statement: begin … end
    • May contain multiple SQL statements between begin and end.
  • returns -- indicates the variable-type that is returned (e.g., integer)
  • return -- specifies the values that are to be returned as result of invoking the function

©Silberschatz, Korth and Sudarshan

5.4

Database System Concepts - 6th Edition

5 of 45

Table Functions

  • SQL:2003 added functions that return a relation as a result
  • Example: Return all patients with a certain disease
  • create function diagnosed_with(disease varchar(45))

returns table (

pname varchar(45),

disease varchar(45),

diagnosis_date date

)

return table� (select pname, disease, diagnosisfrom Diagnosis where Diagnosis.disease= diagnosed_with.disease)

  • Usage

select *� from table (diagnosed_with(‘Disease’))

©Silberschatz, Korth and Sudarshan

5.5

Database System Concepts - 6th Edition

6 of 45

SQL Procedures

  • The disease_count function could instead be written as procedure:

delimiter //

create procedure disease_count_proc (in disease_name varchar(45), out d_count int)

begin

select count(1) into d_count

from Diagnosis

where Diagnosis.disease = disease_name;

end//

delimiter ;

  • Procedures can be invoked either from an SQL procedure or from embedded SQL, using the call statement.

set @d_count := 0 ;

call disease_count_proc('COVID', @d_count);

  • SQL:1999 allows more than one function/procedure of the same name (called name overloading), as long as the number of �arguments differ, or at least the types of the arguments differ

©Silberschatz, Korth and Sudarshan

5.6

Database System Concepts - 6th Edition

7 of 45

Language Constructs for Procedures & Functions

  • SQL supports constructs that gives it almost all the power of a general-purpose programming language.
    • Warning: most database systems implement their own variant of the standard syntax below.
  • Compound statement: begin … end,
    • May contain multiple SQL statements between begin and end.
    • Local variables can be declared within a compound statements
  • While and repeat statements:
    • while boolean expression do

sequence of statements ;

end while

    • repeat

sequence of statements ;

until boolean expression

end repeat

©Silberschatz, Korth and Sudarshan

5.7

Database System Concepts - 6th Edition

8 of 45

Language Constructs (Cont.)

  • For loop
    • Permits iteration over all results of a query
  • Example: Find the sum of ages of all patients�� declare n integer default 0;� for r as� select age from Patientsdo� set n = n + r.ageend for

©Silberschatz, Korth and Sudarshan

5.8

Database System Concepts - 6th Edition

9 of 45

Language Constructs – if-then-else

  • Conditional statements (if-then-else)

if boolean expression � then statement or compound statement � elseif boolean expression then statement or compound statement else statement or compound statement end if

©Silberschatz, Korth and Sudarshan

5.9

Database System Concepts - 6th Edition

10 of 45

External Language Routines

  • SQL:1999 allows the definition of procedures in an imperative programming language, (Java, C#, C or C++) which can be invoked from SQL queries.
  • Functions defined in this fashion can be more efficient than functions defined in SQL, and computations that cannot be carried out in SQL can be executed by these functions.
  • Declaring external language procedures and functions�

create procedure disease_count_proc(in disease_name varchar(20),� out count integer)�language C�external name ’ /usr/avi/bin/disease_count_proc’��create function disease_count(disease_name varchar(20))�returns integer�language C�external name ‘/usr/avi/bin/disease_count’

©Silberschatz, Korth and Sudarshan

5.10

Database System Concepts - 6th Edition

11 of 45

External Language Routines (Cont.)

  • Benefits of external language functions/procedures:
    • more efficient for many operations, and more expressive power.
  • Drawbacks
    • Code to implement function may need to be loaded into database system and executed in the database system’s address space.
      • risk of accidental corruption of database structures
      • security risk, allowing users access to unauthorized data
    • There are alternatives, which give good security at the cost of potentially worse performance.
    • Direct execution in the database system’s space is used when efficiency is more important than security.

©Silberschatz, Korth and Sudarshan

5.11

Database System Concepts - 6th Edition

12 of 45

Security with External Language Routines

  • To deal with security problems, we can do one of the following:
    • Use sandbox techniques
      • That is, use a safe language like Java, which cannot be used to access/damage other parts of the database code.
    • Run external language functions/procedures in a separate process, with no access to the database process’ memory.
      • Parameters and results communicated via inter-process communication
  • Both have performance overheads
  • Many database systems support both of above approaches as well as direct executing in database system address space.

©Silberschatz, Korth and Sudarshan

5.12

Database System Concepts - 6th Edition

13 of 45

Recursive Queries

©Silberschatz, Korth and Sudarshan

5.13

Database System Concepts - 6th Edition

14 of 45

Recursion in SQL

  • SQL:1999 permits recursive view definition
  • Example: find which courses are a prerequisite, whether directly or indirectly, for a specific course �with recursive rec_prereq(course_id, prereq_id) as (� select course_id, prereq_id� from prereq� union� select rec_prereq.course_id, prereq.prereq_id, from rec_prereq, prereq� where rec_prereq.prereq_id = prereq.course_id� )�select ∗�from rec_prereq;

This example view, rec_prereq, is called the transitive closure of the prereq relation

©Silberschatz, Korth and Sudarshan

5.14

Database System Concepts - 6th Edition

15 of 45

Example – fixed point computation

rec_prereq table is initially empty. So rec_prereq table becomes the same as

course_id

prereq_id

102

101

102

100

103

102

Prereq table

Iteration 1: req_prereq table

course_id

prereq_id

102

101

102

100

103

102

Iteration 2: req_prereq table

course_id

prereq_id

102

101

102

100

103

102

103

101

103

100

Two new tuples added due to join.

Iteration 3: req_prereq table

course_id

prereq_id

102

101

102

100

103

102

103

101

103

100

No new tuples added. Reached fixed point

©Silberschatz, Korth and Sudarshan

5.15

Database System Concepts - 6th Edition

16 of 45

The Power of Recursion

  • Recursive views make it possible to write queries, such as transitive closure queries, that cannot be written without recursion or iteration.
    • Intuition: Without recursion, a non-recursive non-iterative program can perform only a fixed number of joins of prereq with itself
      • This can give only a fixed number of levels of managers
      • Given a fixed non-recursive query, we can construct a database with a greater number of levels of prerequisites on which the query will not work
      • Alternative: write a procedure to iterate as many times as required
        • See procedure findAllPrereqs in book

©Silberschatz, Korth and Sudarshan

5.16

Database System Concepts - 6th Edition

17 of 45

The Power of Recursion

  • Computing transitive closure using iteration, adding successive tuples to rec_prereq
    • Each step of the iterative process constructs an extended version of rec_prereq from its recursive definition.
    • The final result is called the fixed point of the recursive view definition.
  • Recursive views are required to be monotonic. That is, if we add tuples to prereq the view rec_prereq contains all of the tuples it contained before, plus possibly more

©Silberschatz, Korth and Sudarshan

5.17

Database System Concepts - 6th Edition

18 of 45

Advanced Aggregation Features

©Silberschatz, Korth and Sudarshan

5.18

Database System Concepts - 6th Edition

19 of 45

Ranking

  • Suppose we wish to write the rank tuples based on a sort order.
  • E.g, rank patients based on the lexicographic ordering of their names.
    • Say patients are {Mary, John, Alex} then sort order is Alex, John, Mary> and the ranked output would be <Alex,1>, <John, 2>, <Mary, 3>

select name, (1 + (select count(*)

from Patients as P2

where P2.name < P1.name)) as p_rank

from Patients as P1

order by p_rank;

©Silberschatz, Korth and Sudarshan

5.19

Database System Concepts - 6th Edition

20 of 45

Ranking Operator in SQL

  • Instead, we would use the ranking operator

select name, rank() over (order by name) as p_rank

from Patients

order by p_rank

  • Ranking may leave gaps: e.g. if 2 students have the same top GPA, both have rank 1, and the next rank is 3
    • dense_rank does not leave gaps, so next dense rank would be 2

©Silberschatz, Korth and Sudarshan

5.20

Database System Concepts - 6th Edition

21 of 45

Ranking (Cont.)

  • Ranking can be done within partition of the data.

select name, rank() over (partition by age order by name ) as p_rank

from Patients

order by p_rank

  • Multiple rank clauses can occur in a single select clause.
  • Ranking is done after applying group by clause/aggregation
  • Can be used to find top-n results
    • More general than the limit n clause supported by many databases, since it allows top-n within each partition

©Silberschatz, Korth and Sudarshan

5.21

Database System Concepts - 6th Edition

22 of 45

name

age

Alice

30

Bob

40

Carol

30

David

40

Evan

30

Frank

50

select name, rank() over (partition by age order by name ) as p_rank

from Patients

order by p_rank

©Silberschatz, Korth and Sudarshan

5.22

Database System Concepts - 6th Edition

23 of 45

Windowing

  • Used to smooth out random variations.
  • E.g., moving average: “Given sales values for each date, calculate for each date the average of the sales on that day, the previous day, and the next day”
  • Window specification in SQL:
    • Given relation sales(date, value)

select date, sum(value) over � (order by date between rows 1 preceding and 1 following)� from sales

©Silberschatz, Korth and Sudarshan

5.23

Database System Concepts - 6th Edition

24 of 45

Windowing

  • Examples of other window specifications:
    • between rows unbounded preceding and current
    • rows unbounded preceding
    • range between 10 preceding and current row
      • All rows with values between current row value –10 to current value
    • range interval 10 day preceding
      • Not including current row

©Silberschatz, Korth and Sudarshan

5.24

Database System Concepts - 6th Edition

25 of 45

Windowing (Cont.)

  • Can do windowing within partitions
  • E.g., Given a relation transaction (account_number, date_time, value), where value is positive for a deposit and negative for a withdrawal
    • “Find total balance of each account after each transaction on the account”

select account_number, date_time,� sum (value) over� (partition by account_number order by date_timerows unbounded preceding)� as balancefrom transactionorder by account_number, date_time

'

©Silberschatz, Korth and Sudarshan

5.25

Database System Concepts - 6th Edition

26 of 45

OLAP

©Silberschatz, Korth and Sudarshan

5.26

Database System Concepts - 6th Edition

27 of 45

Advanced Aggregations for Data Analysis

©Silberschatz, Korth and Sudarshan

5.27

Database System Concepts - 6th Edition

28 of 45

Multidimensional Data & Schema

  • Database systems support several aggregation operators for OLAP in data warehouses.

  • Data Warehouse -- large integrated collection of real time transaction data from different databases into a single database for analysis.

  • Often data stored for analysis in warehouses consists of two types of tables:
    • Fact tables, which are large
      • E.g, sales(item_id, store_id, customer_id, date, number, price)
    • Dimension tables, which are relatively small
      • Store extra information about stores, items, etc.

©Silberschatz, Korth and Sudarshan

5.28

Database System Concepts - 6th Edition

29 of 45

Multidimensional Data and�Schemas

  • Attributes of fact tables can be usually viewed as
    • Measure attributes
      • measure some value, and can be aggregated upon
      • e.g., the attributes number or price of the sales relation
    • Dimension attributes
      • dimensions on which measure attributes are viewed
      • e.g., attributes item_id, color, and size of the sales relation
      • Usually small ids that are foreign keys to dimension tables

©Silberschatz, Korth and Sudarshan

5.29

Database System Concepts - 6th Edition

30 of 45

Data Warehouse Schema

©Silberschatz, Korth and Sudarshan

5.30

Database System Concepts - 6th Edition

31 of 45

Multidimensional Data and�Warehouse Schemas

  • Resultant schema is called a star schema
    • More complicated schema structures
      • Snowflake schema: multiple levels of dimension tables
      • May have multiple fact tables
  • Typically
    • fact table joined with dimension tables and then
    • group-by on dimension table attributes, and then
    • aggregation on measure attributes of fact table
  • Some applications do not find it worthwhile to bring data to a common schema
    • Data lakes are repositories which allow data to be stored in multiple formats, without schema integration
    • Less upfront effort, but more effort during querying

©Silberschatz, Korth and Sudarshan

5.31

Database System Concepts - 6th Edition

32 of 45

Database Support for Data Warehouses

  • Data in warehouses usually append only, not updated
    • Can avoid concurrency control overheads
  • Data warehouses often use column-oriented storage
    • E.g., a sequence of sales tuples is stored as follows
      • Values of item_id attribute are stored as an array
      • Values of store_id attribute are stored as an array,
      • And so on
    • Arrays are compressed, reducing storage, IO and memory costs significantly
    • Queries can fetch only attributes that they care about, reducing IO and memory cost
    • See Section 13.6 for more detauls
  • Data warehouses often use parallel storage and query processing infrastructure
    • Distributed file systems, Map-Reduce, Hive, spark …

©Silberschatz, Korth and Sudarshan

5.32

Database System Concepts - 6th Edition

33 of 45

OLAP Support In SQL

  • Online Analytical Processing (OLAP)
    • Interactive analysis of data, allowing data to be summarized and viewed in different ways in an online fashion (with negligible delay)

  • SQL provides variety of advanced operators to support OLAP

©Silberschatz, Korth and Sudarshan

5.33

Database System Concepts - 6th Edition

34 of 45

Example sales relation

...

...

...

...

...

...

...

...

    • sales (item_name, color, clothes_size, quantity)

This is a simplified version of the sales fact table joined with the dimension tables, and many attributes removed (and some renamed)�

©Silberschatz, Korth and Sudarshan

5.34

Database System Concepts - 6th Edition

35 of 45

Cross Tabulation of sales by item_name and color

  • The table above is an example of a cross-tabulation (cross-tab), also referred to as a pivot-table.
    • Values for one of the dimension attributes form the row headers
    • Values for another dimension attribute form the column headers
    • Other dimension attributes are listed on top
    • Values in individual cells are (aggregates of) the values of the �dimension attributes that specify the cell.

©Silberschatz, Korth and Sudarshan

5.35

Database System Concepts - 6th Edition

36 of 45

Relational Representation of Cross-tabs

  • Cross-tabs can be represented as relations
    • We use the value all is used to represent aggregates.
    • The SQL standard actually uses null values in place of all despite confusion with regular null values.

©Silberschatz, Korth and Sudarshan

5.36

Database System Concepts - 6th Edition

37 of 45

Data Cube

  • A data cube is a multidimensional generalization of a cross-tab
  • Can have n dimensions; we show 3 below
  • Cross-tabs can be used as views on a data cube

©Silberschatz, Korth and Sudarshan

5.37

Database System Concepts - 6th Edition

38 of 45

Cube Operator

  • The cube operation computes union of group by’s on every subset of the specified attributes
  • Example relation. sales(item_name, color, clothes_size, number)
  • E.g. consider the query

select item_name, color, size, sum(number)� from sales� group by cube(item_name, color, size)

This computes the union of eight different groupings of the sales relation:

{ (item_name, color, size), (item_name, color), � (item_name, size), (color, size), � (item_name), (color), � (size), ( ) }

where ( ) denotes an empty group by list.

  • For each grouping, the result contains the null value �for attributes not present in the grouping.

CUBE operator is not directly

Supported by Mysql

©Silberschatz, Korth and Sudarshan

5.38

Database System Concepts - 6th Edition

39 of 45

Roll Up Operator

  • The rollup construct generates union on every prefix of specified list of attributes
  • E.g.,

select item_name, color, size, sum(number)� from sales� group by rollup(item_name, color, size)

Generates union of four groupings:

{ (item_name, color, size), (item_name, color), (item_name), ( ) }

.

©Silberschatz, Korth and Sudarshan

5.39

Database System Concepts - 6th Edition

40 of 45

Example in Mysql

'G','1','Al'

'G','1','Alex'

'G','0','Barbara'

'F','0','Jane'

'E','0','Mehdi'

'E','1','Mehdi'

'D','1','Mary'

'C','1','Bob'

'C','0','Jane'

'B','1','Jane'

'B','0','Mary'

'A','1','Alex'

'A','1','Mary'

'A','1','Mary'

select pname, test, result from Outcomes;

select test, result, count(*) from Outcomes

group by test, result

order by test desc;

'G','0','1'

'G','1','2'

'F','0','1'

'E','0','1'

'E','1','1'

'D','1','1'

'C','0','1'

'C','1','1'

'B','0','1'

'B','1','1'

'A','1','3'

select test, result, count(*) from Outcomes

group by test, result with Rollup

order by test desc;

'G','0','1'

'G','1','2'

'G',NULL,'3'

'F','0','1'

'F',NULL,'1'

'E','0','1'

'E','1','1'

'E',NULL,'2'

'D','1','1'

'D',NULL,'1'

'C','0','1'

'C','1','1'

'C',NULL,'2'

'B','0','1'

'B','1','1'

'B',NULL,'2'

'A','1','3'

'A',NULL,'3'

NULL,NULL,'14'

select test, result, count(*) from Outcomes

group by result, test with Rollup

order by result desc;

'A','1','3'

'B','1','1'

'C','1','1'

'D','1','1'

'E','1','1'

'G','1','2'

NULL,'1','9'

'B','0','1'

'C','0','1'

'E','0','1'

'F','0','1'

'G','0','1'

NULL,'0','5'

NULL,NULL,'14'

©Silberschatz, Korth and Sudarshan

5.40

Database System Concepts - 6th Edition

41 of 45

Hierarchies on Dimensions

  • Hierarchy on dimension attributes: lets dimensions to be viewed at different levels of detail
    • E.g., the dimension DateTime can be used to aggregate by hour of day, date, day of week, month, quarter or year

©Silberschatz, Korth and Sudarshan

5.41

Database System Concepts - 6th Edition

42 of 45

Cross Tabulation With Hierarchy

  • Cross-tabs can be easily extended to deal with hierarchies
    • Can drill down or roll up on a hierarchy

©Silberschatz, Korth and Sudarshan

5.42

Database System Concepts - 6th Edition

43 of 45

Additional Operators

  • Pivoting: convert row values to columns
    • take values of a given column, and convert them into headers
    • E.g., turning colors to separate columns
  • Slicing: creating a cube after fixing one dimension value (slice of cube)
    • E.g., fixing color to white.
    • called dicing, when values for multiple dimensions are fixed. (e.g., color = white, and size = small)
  • Rollup: moving from finer-granularity data to a coarser granularity
    • E.g., aggregating away an attribute, or
    • E.g., moving from aggregates by day to aggregates by month or year
  • Drill down: The opposite operation - that of moving from coarser-granularity data to finer-granularity data

©Silberschatz, Korth and Sudarshan

5.43

Database System Concepts - 6th Edition

44 of 45

Extended Aggregation (Cont.)

  • Multiple rollups and cubes can be used in a single group by clause
    • Each generates set of group by lists, cross product of sets gives overall set of group by lists
  • E.g.,

select item_name, color, size, sum(number)� from salesgroup by rollup(item_name), rollup(color, size)

generates the groupings

{item_name, ()} X {(color, size), (color), ()}

= { (item_name, color, size), (item_name, color), (item_name), � (color, size), (color), ( ) }

©Silberschatz, Korth and Sudarshan

5.44

Database System Concepts - 6th Edition

45 of 45

OLAP Implementation (Cont.)

  • Early OLAP systems precomputed all possible aggregates in order to provide online response
    • Space and time requirements for doing so can be very high
      • 2n combinations of group by
    • It suffices to precompute some aggregates, and compute others on demand from one of the precomputed aggregates
      • Can compute aggregate on (item_name, color) from an aggregate on (item_name, color, size)
        • For all but a few “non-decomposable” aggregates such as median
        • is cheaper than computing it from scratch
  • Several optimizations available for computing multiple aggregates
    • Can compute aggregate on (item_name, color) from an aggregate on (item_name, color, size)
    • Can compute aggregates on (item_name, color, size), �(item_name, color) and (item_name) using a single sorting �of the base data

©Silberschatz, Korth and Sudarshan

5.45

Database System Concepts - 6th Edition