1 of 18

CSE 414: Section 4

Relational Algebra

Datalog

April 25th, 2019

1

2 of 18

Datalog

3 of 18

Datalog Terminology

Head - Body - Atom/Subgoal/Relational predicate

Base Relations (Extensional Database or EDB)

vs Derived Relations (Intentional Database or IDB)

  • Negation + Aggregate

Wildcard

Helper(a,b):-Base1(a,b,_)

NonAns(j):-Base2(j,k),!Base3(k)

Ans(x):-Helper(x,y),!NonAns(y)

3

4 of 18

Souffle Grouping and Aggregation

Vanilla aggregates example:

Get the minimum actor ID� Q(minId) :- minId = min x : { Actor(x, _, _) }��Aggregates over groups example:

For each year, count the number of movies in that year� Q(y,c) :- Movie(_,_,y), c = count : { Movie(_,_,y) }

4

5 of 18

Query Safety

Need a positive relational atom of every variable

What’s wrong with this query?

Find all of Alice’s children without children:

U(x) :- ParentChild(“Alice”,x), !ParentChild(x,y)

5

6 of 18

Datalog Rules

What does it actually mean to have a datalog rule?

Head(...) :- Atom_1(...), Atom_2(...), ..., Atom_n(...)

Implication semantics (body -> head)

6

7 of 18

Query Safety

U(x) :- ParentChild(“Alice”,x), !ParentChild(x,y)

It is domain dependent! Unsafe!

Double negation to the rescue. Why does this work?

NonAns(x) :- ParentChild(“Alice”,x), ParentChild(x,y)

# All of Alice’s children with children

U(x) :- ParentChild(“Alice”,x), !NonAns(x)

# All of Alice’s children without children (safe!)

But we can do better...

7

8 of 18

Query Safety

But we can do better...

hasChild(x) :- ParentChild(x,_)

# People with children

U(x) :- ParentChild(“Alice”,x), !hasChild(x)

# All of Alice’s children without children (safe!)

8

9 of 18

Datalog with Recursion

Able to write complicated queries in a few lines

Graph analysis

Done with query once output does not change.

9

10 of 18

Stratified Datalog

Recursion might not work well with negation

E.g.

A(x):- Table(x), !B(x)

B(x):- Table(x), !A(x)

Solution: Don’t negate or aggregate on an IDB predicate until it is defined

Stratified Datalog Query

10

11 of 18

Stratified Datalog

11

Only IDB predicates defined in strata 1, 2, ..., n may appear under ! or agg in stratum n+1

12 of 18

Relational Algebra (RA)

13 of 18

RA Operators

Standard:

⋃ - Union

⎼ - Diff.

σ - Select

π - Project

⍴ - Rename

Extended:

δ - Duplicate Elim.

ɣ - Group/Agg.

τ - Sorting

Joins:

- Nat. Join

- L.O. Join

- R.O. Join

- F.O. Join

- Cross Product

⋂ - Intersect

R1⋂R2 = R1–(R1–R2)

R1⋂R2 = R1R2

13

14 of 18

A Few More SQL Keywords

(<sub>) INTERSECT (<sub>)

(<sub>) UNION (<sub>)

(<sub>) EXCEPT (<sub>)

14

15 of 18

Ɣ Notation

Grouping and aggregation on group:

ɣattr_1, …, attr_k, count/sum/max/min(attr) -> alias

Aggregation on the entire table:

ɣcount/sum/max/min(attr) -> alias

15

16 of 18

Query Plans (Example SQL -> RA)

Select-Join-Project structure

Make this SQL query into RA (remember FWGHOS):

SELECT R.b, T.c, max(T.a) AS T_max

FROM Table_R AS R, Table_T AS T

WHERE R.b = T.b

GROUP BY R.b, T.c

HAVING max(T.a) > 99

16

17 of 18

Query Plans (Example SQL -> RA)

Select-Join-Project structure

Make this SQL query into RA (remember FWGHOS):

SELECT R.b, T.c, max(T.a) AS T_max

FROM Table_R AS R, Table_T AS T

WHERE R.b = T.b

GROUP BY R.b, T.c

HAVING max(T.a) > 99

πR.b, T.c, T_maxT_max>99R.b, T.c, max(T.a)->T_max(R R.b=T.b T)))

17

18 of 18

DataGrip Query Plan Demo

18