1 of 28

Designing a Relational Database - Part 1

Mark Fontenot, PhD

Northeastern University

2 of 28

What we’ve done so far…

  • Given an implemented relational database:
    • Retrieve information from it (SELECT)
    • Add new data (INSERT)
    • Change data (UPDATE)
    • Remove data (DELETE)
  • Given a structure for a database table, implement the DDL statements to create the table and constraints to ensure integrity of data

3 of 28

Hypothetical #1

You’re hired for your next co-op and the project is to build a piece of software for internal use to help manage the companies many co-ops and interns (126 this year alone). The company needs to keep track of employee information, supervisor info, projects they’re working on, etc. as well as output from goal setting, check-ins, exit interviews, and surveys.

Ready… Set….. Go!

Wait… what do you do? 🤔

4 of 28

Hypothetical #2

You just got your first job as a data scientist and your first project is to work with a developer on updating their database model to support real-time monitoring and analytics of customer behavior for an upcoming dynamic pricing model the CEO wants to implement.

Ready… Set….. Go!

Wait… what do you do? 🤔

5 of 28

DATA MODELING

5

6 of 28

Some Review

  • Database: collection of interrelated data
    • serves the needs of multiple users
    • multiple different types of data
  • RDBMS: software system for managing a relational database
    • supports DML, DDL, transaction mgmt and concurrency control, data integrity, crash recovery, security, etc.

6

7 of 28

Data Independence

7

Physical Model

Logical Model

User Interface

Physical Data Independence

Logical Data Independence

Basic Idea: We want to keep the storage independent from the usage of the data

8 of 28

Data Independence

  • Physical Data Independence: Changes to the physical level of the system (how data is stored) must not require a change to the application structure
  • Logical Data Independence: Changes to the logical level (tables, columns, etc.) must not require a change to an application based on the structure.
    • more difficult to achieve than physical data independence, but not impossible

8

9 of 28

Database Design Lifecycle

9

Requirements Analysis

Logical Model Design

Physical Model Design

Implementation/Maintenance

10 of 28

Requirements

  • Typical System Requirements Analysis
  • For a new product, think about what data is needed to support implementation of various MVP features
  • For existing/manual system, talk to the users, review current system
    • Understand how they use the current “system”
    • What workflows are in place?
    • What reports do they regularly generate?

10

Requirements Analysis

Logical Model Design

Physical Model Design

Implementation/

Maintenance

11 of 28

Abstraction in Database Design

External Schema 1

External Schema 2

External Schema 3

Conceptual Database Schema

Physical Database Schema

Disk

12 of 28

Logical Design

  • Conceptually understand the data requirements and usage, and produce (possibly multiple) Conceptual Data Model(s).
    • Think about the data needs of each archetypical user (aka. user personas)
    • Analysis of the data needs of each persona can result in a localized conceptual data model

12

Requirements Analysis

Logical Model Design

Physical Model Design

Implementation/

Maintenance

13 of 28

Localized Data Models - Visual Example

Customers

Orders

Products

Sales Person

Products

Orders

Suppliers

Warehouse Worker

Products

Orders

Suppliers

Global

Customers

14 of 28

Logical Design

  • How do you arrive at the localized conceptual models?
    • Requirements analysis…
    • Critically examining current systems for each persona
    • Critically review/reverse engineer any paper-based systems �
    • Contextual Inquiry - observing individuals in their normal environments and asking them questions as they perform their regular duties.

14

Requirements Analysis

Logical Model Design

Physical Model Design

Implementation/

Maintenance

15 of 28

Physical Design

  • Purpose: optimize performance
  • Considerations:
    • indexing
    • CAP Principle
      • Consistency
      • Availability
      • Partition tolerance

  • Joins are expensive… Do we need to combine tables to reduce the server load?

15

Requirements Analysis

Logical Model Design

Physical Model Design

Implementation/

Maintenance

16 of 28

Physical Design

  • Disk-based storage is SLOW…

16

Requirements Analysis

Logical Model Design

Physical Model Design

Implementation/

Maintenance

17 of 28

Implementation/Maintenance

  • Make it happen
  • DDL/DML implementation
  • Throughout operational life, monitor performance
    • Is it as performant as needed? Are users having to wait?

17

Requirements Analysis

Logical Model Design

Physical Model Design

Implementation/

Maintenance

18 of 28

ENTITY-RELATIONSHIP MODEL

18

19 of 28

Definitions (some review)

  • Entity - represent physical or conceptual “things” about which we are storing data
    • examples: Employees, Department, Performance Reviews, Product Defect Lists, etc.
    • Technically, described as an entity type (the schema; description of the data).
    • Entity set is a collection of instances of an entity type (the actual data)
  • Keys
    • Super Key - set of attributes whose values are unique for every element of an entity set.
    • Candidate Key - set of attributes whose values are unique for every element of an entity set AND for which no subset of attributes also determines uniqueness
    • Primary Key - the candidate key selected by the database designer as the attribute or set of attributes to serve as unique identifier for each element of an entity set.
    • Surrogate Primary Key - a single, small attribute chosen for unique identification but that doesn’t actually describe a characteristic of the entity type.
      • Example might be your NUID number, an order number, or a product ID number.

20 of 28

Definitions (some review)

  • Attribute - a characteristic of an entity that we are actually storing
    • examples: employee id #, perf. review date, etc.
    • composite attribute: attribute can be decomposed into smaller attributes … think mailing address.
    • multivalued attribute: more than one value allowed for one instance of the entity… think how many email addresses you have.
    • derived attribute: value of attribute can be determined from value of another attribute … think age as derived from birth date.
  • Relationship between two entities when one of entity is referring to information stored in the other entity.
    • Degree of Relationship:
      • when two entities are involved → Binary Relationship
      • can technically involve more than two entities.

21 of 28

Entities

21

Customer

Video

Rents

cust-id

cust-name

video-id

copy-no

title

due-date

N

N

Modified Chen Diagramming Technique

22 of 28

Attributes

22

Customer

Video

Rents

cust-id

cust-name

video-id

copy-no

title

due-date

N

N

Attributes - characteristics of the entities that provide detail about them.

Modified Chen Diagramming Technique

23 of 28

Relationships

23

Customer

Video

Rents

cust-id

cust-name

video-id

copy-no

title

due-date

N

N

Modified Chen Diagramming Technique

24 of 28

Multivalued Attributes

24

Customer

cust-id

cust-name

N

. . .

phone

Professor

idNum

name

N

. . .

degrees

Modified Chen Diagramming Technique

25 of 28

Composite Attributes

25

Customer

cust-id

cust-name

N

. . .

address

zip

city

street

state

Modified Chen Diagramming Technique

26 of 28

Relationship Degree

26

Employee

manages

N

1

manager

subordinate

Recursive

Department

is sub-unit

Division

Binary

Employee

uses

Project

Skill

Ternary

N

1

N

N

N

Modified Chen Diagramming Technique

27 of 28

Relationship Cardinality

27

Department

Employee

is managed by

1

1

Department

Employee

has

1

M

Employee

Project

works on

M

N

1:1

1:M

M:N

Modified Chen Diagramming Technique

28 of 28

Relationship Participation

28

Department

Employee

is managed by

1

1

Dependent

Employee

Dependent

has

Mandatory

Optional

Modified Chen Diagramming Technique