1 of 46

Describing data using the SQL language.

Ensuring data integrity in SQL. Creating data tables. Working with subqueries. Creating database objects. Data definition language (DDL) commands.

2 of 46

Describing Data using SQL

  • Today's topic: Describing data using the SQL language.
  • We will also cover ensuring data integrity in SQL, creating data tables, working with subqueries, creating database objects, and Data Definition Language (DDL) commands.
  • In previous lectures, we focused on database design concepts.
  • Now, we are transitioning to database implementation using SQL.
  • This lecture builds upon your understanding of relational models and sets the stage for practical database creation and manipulation.

3 of 46

Recap: Relational Model and Tables

  • Question: What is a Relational Model?
    • A data model based on the concept of relations (tables).
    • Organizes data into structured tables with rows and columns.
  • Question: Why do we use tables in databases?
    • Tables provide a clear and organized way to store and manage data.
    • They allow for efficient data retrieval and manipulation through relationships between tables.
  • Key Components of a Relational Table:
    • Table: A collection of data organized into rows and columns. Represents an entity or relationship.
    • Row (Tuple or Record): Represents a single instance of the entity. Each row is unique.
    • Column (Attribute or Field): Represents a characteristic or property of the entity. Each column has a name and a data type.

4 of 46

Recap: SQL Basics

  • SQL: Structured Query Language.
  • The standard language for interacting with relational databases.
  • Used for:
    • Defining database structure (DDL - Data Definition Language).
    • Manipulating data (DML - Data Manipulation Language).
    • Controlling access to data (DCL - Data Control Language).
  • In this lecture, we will focus on using SQL for describing data structure using DDL commands, specifically for creating tables.
  • We will use SQL to define the columns, data types, and constraints for our tables.

5 of 46

Data Definition Language (DDL)

  • Data Definition Language (DDL) is a subset of SQL.
  • Purpose: To define the structure of the database and its objects.
  • DDL statements are used to create, alter, and delete database objects like tables, indexes, views, and users.
  • Think of DDL as the commands that shape the database itself, not the data within it.
  • Examples of DDL statements: CREATE, ALTER, DROP, TRUNCATE, RENAME.

6 of 46

DDL vs. DML

  • DDL (Data Definition Language): Deals with database schemas and descriptions, defining the structure.
  • DML (Data Manipulation Language): Deals with data manipulation; used to store, modify, retrieve, delete, and update data in the database.
  • Analogy:
    • DDL is like designing the architecture of a building (foundation, walls, rooms).
    • DML is like furnishing and decorating the building, moving things around, and adding/removing items.
  • Examples of DML statements: SELECT, INSERT, UPDATE, DELETE.
  • Today's Focus: DDL, specifically creating tables using the CREATE TABLE statement.

7 of 46

Importance of DDL

  • DDL sets the foundation for the entire database.
  • A well-defined database structure (through DDL) is crucial for several reasons:
    • Data Integrity: Ensures data is accurate, consistent, and valid.
    • Efficient Querying: A well-structured database allows for more efficient data retrieval.
    • Application Development: Provides a stable and reliable data structure for building applications.
    • Scalability: A good design allows the database to grow and adapt to changing needs.
    • Maintainability: Easier to manage, modify, and troubleshoot.
  • DDL statements define the schema, data types, constraints, and relationships within the database.

8 of 46

DDL: The Blueprint of Your Database

  • Analogy: DDL is like the blueprint of a house.
    • The blueprint defines the rooms, walls, doors, windows, and their relationships.
    • You cannot build the house correctly (or easily place furniture) without a proper blueprint.
  • Similarly, DDL defines the tables, columns, data types, and relationships in your database.
  • Without a well-defined DDL, you cannot store or retrieve data effectively.
  • The "furniture" in our analogy is the actual data (DML), which we'll cover in later lectures.
  • A good blueprint (DDL) is essential for a strong, functional, and maintainable house (database).

9 of 46

Creating Tables: The CREATE TABLE Statement

  • The fundamental DDL statement for creating tables is CREATE TABLE.
  • This statement defines the structure of a new table in the database.
  • It specifies the table's name, columns, data types for each column, and any constraints.
  • We use CREATE TABLE to build the foundation of our data storage.

10 of 46

CREATE TABLE Syntax - Basic Structure

  • Basic Syntax:

  • Explanation:
    • CREATE TABLE: Keyword indicating the creation of a new table.
    • table_name: The name you choose for your table (follow naming conventions: descriptive, avoid spaces, start with a letter).
    • Parentheses (): Enclose the column definitions.
    • Comma ,: Separates the definitions of different columns.
  • The statement ends with a semicolon ;.

11 of 46

CREATE TABLE Syntax - Components

Inside the parentheses, we define each column:

  • column_name: The name of the column (e.g., student_id, first_name, enrollment_date). Choose meaningful names.
  • data_type: Specifies the type of data the column will store (e.g., INT, VARCHAR, DATE). We'll explore data types in detail shortly.
  • [constraints]: Optional rules that enforce data integrity (e.g., NOT NULL, UNIQUE, PRIMARY KEY). We will also look at this in details soon.
  • Each column definition is separated by a comma.
  • The order of columns in the CREATE TABLE statement defines the order in which they will appear in the table.

12 of 46

CREATE TABLE - Simple Example

  • Let's create a simple table called Employees:

CREATE TABLE Employees (

employee_id INT,

first_name VARCHAR(50),

last_name VARCHAR(50)

);

  • This statement creates a table named Employees.
  • It has three columns:
    • employee_id: Stores integer values.
    • first_name: Stores text strings up to 50 characters long.
    • last_name: Stores text strings up to 50 characters long.
  • Notice the commas separating the column definitions and the semicolon at the end.
  • We have not yet added any constraints in this basic example.

13 of 46

Data Types in SQL: Overview

  • Data types define the kind of values a column can hold.
  • Choosing the correct data type is crucial for:
    • Data Integrity: Ensuring the data stored is valid and consistent.
    • Storage Efficiency: Using the appropriate amount of storage space.
    • Query Performance: Optimizing how quickly data can be retrieved and processed.
  • SQL offers a variety of data types to represent different kinds of information.

14 of 46

Data Types: Integer (INT)

  • INT (or INTEGER): Represents whole numbers (without decimal points).
  • Examples:
    • Student ID: 12345
    • Number of courses: 3
    • Product quantity: 100
  • Can be positive, negative, or zero.
  • There are variations of INT for different ranges of numbers (e.g., TINYINT, SMALLINT, BIGINT), but INT is generally sufficient for most common cases.

15 of 46

Data Types: Variable-Length String (VARCHAR)

  • VARCHAR(size): Represents variable-length strings (text).
  • size: Specifies the maximum number of characters the string can hold.
  • Examples:
    • Student name: VARCHAR(50) (allows names up to 50 characters).
    • Address: VARCHAR(255) (allows addresses up to 255 characters).
    • Product description: VARCHAR(1000)
  • Importance of size:
    • Prevents excessively long strings from being stored, saving storage space.
    • Helps maintain data consistency.
    • If a string exceeds the specified size, it will be truncated (cut off).

16 of 46

Data Types: Fixed-Length String (CHAR)

  • CHAR(size): Represents fixed-length strings (text).
  • size: Specifies the exact number of characters the string must hold.
  • Examples:
    • Country code: CHAR(2) (e.g., "US", "UK", "CA")
    • State abbreviation: CHAR(2) (e.g., "CA", "NY", "TX")
  • Difference from VARCHAR:
    • CHAR always uses the specified amount of storage, even if the string is shorter (padded with spaces).
    • VARCHAR only uses the space needed for the actual string.
  • CHAR is less commonly used than VARCHAR, typically for short, fixed-length codes.

17 of 46

Data Types: Date and Time

  • SQL provides several data types for storing dates and times:
  • DATE: Represents a date (year, month, day). Example: 2024-07-26
  • DATETIME: Represents a date and time. Example: 2024-07-26 10:30:00
  • TIMESTAMP: Similar to DATETIME, but often used for tracking when a record was created or modified (automatically updated).
  • These data types allow for date and time arithmetic (e.g., calculating the difference between two dates).

18 of 46

Data Types: Decimal (DECIMAL / NUMERIC)

  • DECIMAL(precision, scale) or NUMERIC(precision, scale): Represent precise decimal numbers (with a fixed number of digits before and after the decimal point).
  • precision: The total number of digits (both before and after the decimal point).
  • scale: The number of digits after the decimal point.
  • Examples:
    • Grade: DECIMAL(3, 2) (e.g., 95.75, 3 total digits, 2 after the decimal)
    • Price: DECIMAL(10, 2) (e.g., 12345678.99, up to 10 total digits, 2 after the decimal)
    • Currency: DECIMAL(19, 4) (for high-precision financial calculations)
  • Use DECIMAL or NUMERIC when accuracy is critical, especially for financial data.

19 of 46

Data Types: Other Types

SQL supports other data types, including:

  • BOOLEAN: Stores true/false values (often represented as 1 or 0).
  • TEXT: For very long text strings (larger than VARCHAR can handle).
  • BLOB: Binary Large Object - for storing binary data like images, audio, or video files.

The specific data types available and their syntax may vary slightly depending on the database system (MySQL, PostgreSQL, SQL Server, Oracle, etc.). Refer to the documentation for your specific database system for details.

20 of 46

Data Types: Summary

  • Choosing the right data is essential.
  • Consider the range of possible values and the level of precision required.
  • Using an incorrect datatype can lead to errors, data loss, and reduced efficiency.
  • For example, don't store an integer in a VARCHAR column. Use the most appropriate, specific data type for each column.

21 of 46

Column Constraints: Enforcing Data Integrity

  • Constraints are rules enforced on the data within a table's columns.
  • They ensure data accuracy, consistency, and validity.
  • Constraints are defined during table creation (using CREATE TABLE) or can be added later (using ALTER TABLE).
  • They prevent invalid data from being entered into the database.
  • Think of constraints as "guards" that protect the integrity of your data.

22 of 46

Constraint: NOT NULL

  • NOT NULL: Specifies that a column cannot contain a NULL value.
  • A NULL value represents missing or unknown data.
  • Example:

CREATE TABLE Students (

student_id INT NOT NULL,

first_name VARCHAR(50),

last_name VARCHAR(50)

);

  • In this example, student_id is required for every student record. An attempt to insert a student without a student_id will result in an error.
  • Use NOT NULL for columns that are essential for identifying or describing the entity.

23 of 46

Constraint: UNIQUE

  • UNIQUE: Ensures that all values in a column (or a combination of columns) are unique.
  • No two rows can have the same value in a UNIQUE column.
  • Example:

CREATE TABLE Students (

student_id INT,

email VARCHAR(255) UNIQUE,

first_name VARCHAR(50),

last_name VARCHAR(50)

);

  • In this example, each student must have a unique email address.
  • Note: UNIQUE constraint allows NULL values (unless NOT NULL is also specified). Multiple NULL are commonly allowed.

24 of 46

Constraint: PRIMARY KEY

  • PRIMARY KEY: Uniquely identifies each row in a table.
  • It is a combination of NOT NULL and UNIQUE.
  • Only one primary key is allowed per table (but it can consist of multiple columns – a composite key).
  • Example:

CREATE TABLE Students (

student_id INT PRIMARY KEY,

email VARCHAR(255),

first_name VARCHAR(50),

last_name VARCHAR(50)

);

  • student_id is the primary key, guaranteeing each student has a unique ID and that the ID cannot be missing.
  • Importance:
  • Essential for establishing relationships between tables (using foreign keys).
  • Used for indexing, which significantly speeds up data retrieval.

25 of 46

Constraint: FOREIGN KEY

  • FOREIGN KEY: Establishes a relationship between two tables.
  • A foreign key in one table refers to the primary key of another table.
  • Enforces referential integrity: ensures that relationships between tables remain consistent.
  • Example (Conceptual):

CREATE TABLE Courses (

course_id INT PRIMARY KEY,

course_name VARCHAR(100)

);

CREATE TABLE Enrollments (

enrollment_id INT PRIMARY KEY,

student_id INT,

course_id INT,

FOREIGN KEY (student_id) REFERENCES Students(student_id),

FOREIGN KEY (course_id) REFERENCES Courses(course_id)

);

  • Enrollments table has foreign keys referencing the Students and Courses tables. This ensures that you can only enroll existing students in existing courses.

26 of 46

Constraint: CHECK

  • CHECK (condition): Defines a custom rule that the data in a column must satisfy.
  • The condition is a Boolean expression that must evaluate to TRUE for the data to be valid.
  • Example:

CREATE TABLE Students (

student_id INT PRIMARY KEY,

age INT,

CHECK (age >= 18)

);

  • This ensures that the age of a student is always 18 or greater. An attempt to insert a student with an age less than 18 will result in an error.
  • CHECK constraints are powerful for enforcing domain-specific rules.

27 of 46

CREATE TABLE Example: Students Table

  • Putting it all together:

  • This creates a Students table with columns, data types, and constraints.
  • It demonstrates the use of PRIMARY KEY, NOT NULL, UNIQUE, VARCHAR, DATE, DECIMAL, and CHECK.
  • This table provides a well-defined structure for storing student information, enforcing data integrity rules from the outset.

28 of 46

Ensuring Data Integrity

  • Data integrity refers to the accuracy, consistency, and validity of data stored in a database.
  • It means that the data is reliable, trustworthy, and reflects the real-world entities it represents.
  • Maintaining data integrity is crucial throughout the entire lifecycle of a database.

29 of 46

Why is Data Integrity Important?

  • Reliable Decision-Making: Accurate data is essential for making informed decisions, whether in business, research, or any other field.
  • Prevents Errors: Inconsistent or incorrect data can lead to errors in applications, reports, and calculations.
  • Maintains Data Quality: Ensures that data remains accurate and consistent over time, even as the database grows and changes.
  • Application Functionality: Applications rely on consistent and valid data to operate correctly.
  • Legal and Regulatory Compliance: In many industries, maintaining data integrity is a legal or regulatory requirement (e.g., financial data, medical records).

30 of 46

Data Integrity: Real-World Examples

  • Question: Can you think of real-world examples where poor data integrity could cause problems?
  • Possible Examples:
    • Banking: Incorrect account balances, lost transactions.
    • Healthcare: Incorrect patient records, misdiagnosis.
    • E-commerce: Incorrect inventory levels, wrong product shipments.
    • Airline Reservations: Overbooking flights, lost reservations.
    • Scientific Research: Invalid experimental results, incorrect conclusions.

31 of 46

SQL Constraints and Data Integrity

  • SQL constraints are the primary mechanism for enforcing data integrity within a database.
  • The constraints we discussed earlier (NOT NULL, UNIQUE, PRIMARY KEY, FOREIGN KEY, CHECK) play a direct role in maintaining data integrity.
  • They prevent invalid data from being entered and ensure consistency across the database.

32 of 46

NOT NULL and Data Integrity

  • NOT NULL constraint:
  • Prevents missing required information.
  • Ensures that essential columns always have a value.
  • Example: A customer record must have a customer ID.
  • Violation: Trying to insert a customer without a customer ID would result in an error.

33 of 46

UNIQUE and Data Integrity

  • UNIQUE constraint:
    • Prevents duplicate entries where they shouldn't exist.
    • Ensures that values in a column (or a combination of columns) are unique across all rows.
  • Example: Each customer must have a unique email address.
  • Violation: Trying to insert two customers with the same email address would result in an error.

34 of 46

PRIMARY KEY and Data Integrity (Entity Integrity)

  • PRIMARY KEY constraint:
    • Ensures entity integrity: Each entity (row) in a table is uniquely identifiable.
    • Combines NOT NULL and UNIQUE.
    • Provides a reliable way to reference individual records.
  • Example: Each product in a Products table has a unique product_id.
  • Violation: Trying to insert two products with the same product_id (or no product_id) would result in an error.

35 of 46

FOREIGN KEY and Data Integrity (Referential Integrity)

  • FOREIGN KEY constraint:
    • Enforces referential integrity: Relationships between tables are maintained.
    • Ensures that a foreign key value in one table must correspond to a primary key value in the related table (or be NULL, if allowed).
  • Example: An Order record must reference a valid customer_id from the Customers table.
  • Violation: Trying to insert an order with a non-existent customer_id would result in an error.

36 of 46

CHECK and Data Integrity (Domain Integrity)

  • CHECK constraint:
    • Enforces domain integrity: Data values are within valid ranges or meet specific criteria.
    • Allows defining custom validation rules.
  • Example: A product_price must be greater than zero.
    • CHECK (product_price > 0)
  • Violation: Trying to insert a product with a negative or zero price would result in an error.

37 of 46

Data Integrity: Quick Quiz

  • Scenario: We are creating a table for products.
  • We need to ensure each product has a unique product ID and a name.
  • Question: Which constraints would you use for the product_id and product_name columns?
  • Answer:
    • product_id: PRIMARY KEY (or INT NOT NULL UNIQUE)
    • product_name: NOT NULL (assuming a product must have a name)
  • This ensures uniqueness for IDs and prevents missing names.

38 of 46

Introduction to Subqueries

  • A subquery (also known as an inner query or nested query) is a query embedded within another SQL query (the outer query).
  • It's like a query within a query.
  • Subqueries are used to perform more complex data retrieval and filtering.
  • They allow you to use the results of one query as input to another query.

39 of 46

Purpose of Subqueries

  • Subqueries help break down complex queries into smaller, more manageable parts.
  • They are used when the criteria for selecting data depend on the results of another query.
  • They enhance the expressiveness of SQL, allowing for more sophisticated data analysis.
  • Example: Find all products whose price is higher than the average price of all products. (The average price needs to be calculated first using a subquery.)
  • Today's focus: A basic introduction to subqueries. We'll explore more advanced uses later.

40 of 46

Subquery Syntax: SELECT in WHERE Clause

  • One of the simplest and most common forms of subqueries is using a SELECT statement within the WHERE clause of another SELECT statement.
  • Basic Syntax:

SELECT column1, column2

FROM table_name

WHERE column_name IN (SELECT column_name FROM another_table WHERE condition);

  • Explanation:
    • The inner SELECT query (the subquery) is enclosed in parentheses.
    • The inner query is executed first.
    • The outer query then uses the results of the inner query in its WHERE clause.
    • The IN operator checks if a value exists within the set of values returned by the subquery.

41 of 46

Subquery Execution Order

  • Step 1: Inner Query Execution:
    • The database system first executes the subquery (the query inside the parentheses).
    • This subquery produces a result set (typically a single column of values).
  • Step 2: Outer Query Execution:
    • The database system then executes the outer query.
    • The outer query uses the result set from the subquery in its WHERE clause (or other clauses).
  • The subquery acts as a filter or condition for the outer query.

42 of 46

Subquery Example: Students and Instructors

  • Scenario: "Find all students who are enrolled in courses taught by instructors in the 'Computer Science' department."
  • Conceptual Approach (using a subquery):
  • Inner Query: Find the instructor_id of all instructors in the 'Computer Science' department.

  • Outer Query: Find all students enrolled in courses where the instructor_id is in the result set of the inner query.

43 of 46

Subquery: Complete example

  • Combining the previous example into a single query:

SELECT student_name

FROM Students

WHERE course_id IN (

SELECT course_id

FROM Courses

WHERE instructor_id IN (

SELECT instructor_id

FROM Instructors

WHERE department = 'Computer Science'

)

);

  • The innermost query finds instructor IDs in Computer Science.
  • The next query finds course IDs taught by those instructors.
  • The outermost query finds student names enrolled in those courses.

44 of 46

Subqueries: Beyond the Basics

  • Subqueries can be more complex than the simple example we've seen:
    • Subqueries in FROM clause: Used to create temporary, derived tables.
    • Subqueries in SELECT clause: Used to retrieve single values for each row.
    • Correlated Subqueries: The inner query depends on the outer query and is executed for each row of the outer query.
  • These more advanced types of subqueries will be covered in future lectures.

45 of 46

Database Objects: Tables

  • Tables are the fundamental database objects.
  • They store the actual data in a structured format (rows and columns).
  • We've spent most of this lecture focusing on creating tables using CREATE TABLE.
  • However, "database object" is a broader term.

46 of 46

Summary

  • Data Definition Language (DDL): The part of SQL used to define database structure.
  • CREATE TABLE statement: How to create tables, define columns, specify data types, and apply constraints.
  • Data Types: INT, VARCHAR, CHAR, DATE, DATETIME, DECIMAL, etc.
  • Constraints: NOT NULL, UNIQUE, PRIMARY KEY, FOREIGN KEY, CHECK – used to enforce data integrity.
  • Data Integrity: The accuracy, consistency, and validity of data.
  • Subqueries (Introduction): Queries nested within other queries.