1 of 25

CSE 414: Section 8

BCNF and Views

May 23rd, 2019

2 of 25

Functional Dependencies

A functional dependency (FD) for relation R is a formula of the form A → B

where A and B are sets or individual attributes of R.

The meaning of the functional dependency is that for every value of A, there is a unique value of B. We say the FD holds for R if, for any instance of R, whenever two tuples agree in value for all attributes of A, they also agree in value for all attributes of B.

3 of 25

Logistics

Homework:

  • HW6 is due Friday (tomorrow)!
  • Please don’t add Spark package and input data in the git repos, it causes huge download times for us when we grade

Webquizzes:

  • Webquiz 6 is due Saturday night! Relies on content from Friday’s lecture and today’s section!

4 of 25

FD Example

For the relation Student(studentID, name, DateOfBirth, phoneNumber), assuming each student has only one name, then the following functional dependency holds

{studentID} → {name, DateOfBirth}

However, assuming a student may have multiple phone numbers, then the FD

{studentID} → {phoneNumber} does not hold for the table.

5 of 25

Inference Rules and Trivial FDs

Given attributes a, b, c for a relation R, they following inference rules hold for FDs

  • a → a (trivial FD)
  • If a → b & b → c then a → c
  • If a → b & c → d then a,c → b,d

A trivial FD is one where the RHS is a subset of the LHS. That is, every attribute on the RHS is also on the LHS. For example, the two FDs below are trivial:

{name} → {name}

{studentID,name} → {name}

Note :- Trivial FDs do not provide real restrictions on the data in the relation, and they are usually ignored.

6 of 25

Outline

BCNF decomposition

  • Check whether chosen FD violates BCNF
  • Use any FD that violates BCNF to decompose.

View construction and query processing� 1) From vertically partitioned tables� 2) From horizontally partitioned tables

7 of 25

Keys

We call an attribute (or a set of attributes) that determines all other attributes in a schema to be a superkey.

If it is the smallest set of attributes (in terms of cardinality) that does this we call that set a minimal key or just key

By definition, all minimal keys are superkeys… however a superkey is not necessarily a minimal key.

8 of 25

Superkeys and Minimal Keys

R(A, B, C, D)

A -> B

C -> D

Question: is {A, C} a superkey, minimal key, or both?

Question: is {A, B, C, D} a superkey, minimal key, or both?

Question: is {A, C, D} a superkey, minimal key, or both?

9 of 25

Superkeys and Minimal Keys

R(A, B, C, D)

A -> B

B -> C, D

Question: is {A, B} a superkey, minimal key, neither, or both?

Question: is {B} a superkey, minimal key, neither, or both?

Question: is {A} a superkey, minimal key, neither or both?

10 of 25

Closure Algorithm

Goal:

We want everything that an attribute/set of attributes determine

Observation:

  • If we have A -> B and B -> C, then A -> C
  • So really, A -> B and C
  • Formal notation is {A}+ = {A, B, C}
  • Since the closure of A is all attributes, A is a key

11 of 25

Conceptual Design

Name

SSN

PhoneNumber

City

Fred

123-45-6789

206-555-1234

Seattle

Fred

123-45-6789

206-555-6543

Seattle

Joe

987-65-4321

908-555-2121

Westfield

SSN Name, City

12 of 25

Conceptual Design

Anomalies:

• Redundancy = repeat data

• Update anomalies = what if Fred moves to “Bellevue”?

• Deletion anomalies = what if Joe deletes his phone number?

13 of 25

Conceptual Design

The BCNF (Boyce-Codd Normal Form) ---- A relation R is in BCNF if every set of attributes is either a superkey or its closure is the same set

14 of 25

BCNF Decomposition Algorithm

Normalize(R)

find X s.t.: X ≠ X+ and X+ ≠ [all attributes]

if (not found) then R is in BCNF

let Z = [all attributes] - X+

decompose R into R1(X+) and R2(X Z)

Normalize(R1); Normalize(R2);

X +

15 of 25

Example

The relation is R (A, B, C, D, E)

FDs : A → E, BC → A, and DE → B

Question : Decompose R into BCNF.

16 of 25

Solution

Notice that {A}+ = {A,E}, which violates the BCNF condition.

We split R to R1(A,E) and R2(A,B,C,D).

R1 satisfies BCNF now, but R2 does not because: {B,C}+ = {B,C,A}.

Notice that there is no E in R2 table so we don't need to consider the FD DE → B!

Split R2 to: R21(B,C,A) and R22(B,C,D)

17 of 25

Lossless Decomposition

Consider the relation R(A,B,C,D,E)

FDs: {AB → C, BC → D, AD → E}

S1 = ΠABC(R), S2 = Π BCD(R), S3 = Π ADE(R)

We need to show that R= S1 ⋈ S2 ⋈ S3

S1(ABC)

S2(BCD)

S3(ADE)

18 of 25

Vertical Partitioning

19 of 25

Vertical Partitioning

20 of 25

Vertical Partitioning

SELECT address

FROM Resumes

WHERE name = ‘Sue’

21 of 25

Vertical Partitioning Applications

  • Advantages
    • Speeds up queries that touch only a small fraction of columns
    • Single column can be compressed effectively, reducing disk I/O
  • Disadvantages
    • Updates are very expensive!
    • Need many joins to access many columns
    • Repeated key columns add overhead

22 of 25

Horizontal Partitioning

23 of 25

Horizontal Partitioning

24 of 25

Horizontal Partitioning

25 of 25

Horizontal Partitioning Applications

  • Performance optimization
    • Especially for data warehousing
    • E.g., one partition per month
    • E.g., archived applications and active applications
  • Distributed and parallel databases