1 of 25

CSE 414: Section 8

BCNF and Views

May 23rd, 2019

2 of 25

Functional Dependencies

A functional dependency (FD) for relation R is a formula of the form A → B

where A and B are sets or individual attributes of R.

The meaning of the functional dependency is that for every value of A, there is a unique value of B. We say the FD holds for R if, for any instance of R, whenever two tuples agree in value for all attributes of A, they also agree in value for all attributes of B.

3 of 25

Logistics

Homework:

HW6 is due Friday (tomorrow)!
Please don’t add Spark package and input data in the git repos, it causes huge download times for us when we grade

Webquizzes:

Webquiz 6 is due Saturday night! Relies on content from Friday’s lecture and today’s section!

4 of 25

FD Example

For the relation Student(studentID, name, DateOfBirth, phoneNumber), assuming each student has only one name, then the following functional dependency holds

{studentID} → {name, DateOfBirth}

However, assuming a student may have multiple phone numbers, then the FD

{studentID} → {phoneNumber} does not hold for the table.

5 of 25

Inference Rules and Trivial FDs

Given attributes a, b, c for a relation R, they following inference rules hold for FDs

a → a (trivial FD)
If a → b & b → c then a → c
If a → b & c → d then a,c → b,d

A trivial FD is one where the RHS is a subset of the LHS. That is, every attribute on the RHS is also on the LHS. For example, the two FDs below are trivial:

{name} → {name}

{studentID,name} → {name}

Note :- Trivial FDs do not provide real restrictions on the data in the relation, and they are usually ignored.

6 of 25

Outline

BCNF decomposition

Check whether chosen FD violates BCNF
Use any FD that violates BCNF to decompose.

View construction and query processing� 1) From vertically partitioned tables� 2) From horizontally partitioned tables

7 of 25

Keys

We call an attribute (or a set of attributes) that determines all other attributes in a schema to be a superkey.

If it is the smallest set of attributes (in terms of cardinality) that does this we call that set a minimal key or just key

By definition, all minimal keys are superkeys… however a superkey is not necessarily a minimal key.

8 of 25

Superkeys and Minimal Keys

R(A, B, C, D)

A -> B

C -> D

Question: is {A, C} a superkey, minimal key, or both?

Question: is {A, B, C, D} a superkey, minimal key, or both?

Question: is {A, C, D} a superkey, minimal key, or both?

9 of 25

Superkeys and Minimal Keys

R(A, B, C, D)

A -> B

B -> C, D

Question: is {A, B} a superkey, minimal key, neither, or both?

Question: is {B} a superkey, minimal key, neither, or both?

Question: is {A} a superkey, minimal key, neither or both?

10 of 25

Closure Algorithm

Goal:

We want everything that an attribute/set of attributes determine

Observation:

If we have A -> B and B -> C, then A -> C
So really, A -> B and C
Formal notation is {A}⁺ = {A, B, C}
Since the closure of A is all attributes, A is a key

11 of 25

Conceptual Design

Name	SSN	PhoneNumber	City
Fred	123-45-6789	206-555-1234	Seattle
Fred	123-45-6789	206-555-6543	Seattle
Joe	987-65-4321	908-555-2121	Westfield

SSN Name, City

12 of 25

Conceptual Design

Anomalies:

• Redundancy = repeat data

• Update anomalies = what if Fred moves to “Bellevue”?

• Deletion anomalies = what if Joe deletes his phone number?

13 of 25

Conceptual Design

The BCNF (Boyce-Codd Normal Form) ---- A relation R is in BCNF if every set of attributes is either a superkey or its closure is the same set

14 of 25

BCNF Decomposition Algorithm

Normalize(R)

find X s.t.: X ≠ X⁺ and X⁺≠ [all attributes]

if (not found) then R is in BCNF

let Z = [all attributes] - X⁺

decompose R into R1(X⁺) and R2(X ∪ Z)

Normalize(R1); Normalize(R2);

X +

15 of 25

Example

The relation is R (A, B, C, D, E)

FDs : A → E, BC → A, and DE → B

Question : Decompose R into BCNF.

16 of 25

Solution

Notice that {A}⁺ = {A,E}, which violates the BCNF condition.

We split R to R1(A,E) and R2(A,B,C,D).

R1 satisfies BCNF now, but R2 does not because: {B,C}⁺ = {B,C,A}.

Notice that there is no E in R2 table so we don't need to consider the FD DE → B!

Split R2 to: R21(B,C,A) and R22(B,C,D)

17 of 25

Lossless Decomposition

Consider the relation R(A,B,C,D,E)

FDs: {AB → C, BC → D, AD → E}

S1 = Π_ABC(R), S2 = Π_BCD(R), S3 = Π_ADE(R)

We need to show that R= S1 ⋈ S2 ⋈ S3

S1(ABC)

S2(BCD)

S3(ADE)

18 of 25

Vertical Partitioning

19 of 25

Vertical Partitioning

20 of 25

Vertical Partitioning

SELECT address

FROM Resumes

WHERE name = ‘Sue’

21 of 25

Vertical Partitioning Applications

Advantages

Speeds up queries that touch only a small fraction of columns
Single column can be compressed effectively, reducing disk I/O

Disadvantages

Updates are very expensive!
Need many joins to access many columns
Repeated key columns add overhead

The performance advantages of vertical partitioning come into play if a join uses only the fields in one of the vertically partitioned tables.

If, however, the operation required the entire original table (or at least columns not stored in the same partition), the RDBMS would need to join the partitions into a single table, a relatively slow operation.

• Virtual views – Computed only on-demand – slow at runtime – Always up to date

• Materialized views – Pre-computed offline – fast at runtime – May have stale data (must recompute or update)

• A key component of database performance tuning is the selection of materialized and virtual views

contrast the difference between virtual views and materialized views. ��If virtual then there will be costs in query rewriting, but if the dbms is smart enough then it can optimize the queries on the views as the examples show.��If materialized then query processing is trivial but then update / insert / delete will be troublesome�

22 of 25

Horizontal Partitioning

23 of 25

Horizontal Partitioning

24 of 25

Horizontal Partitioning

25 of 25

Horizontal Partitioning Applications

Performance optimization

Especially for data warehousing
E.g., one partition per month
E.g., archived applications and active applications

Distributed and parallel databases