1 of 42

CS 186 Exam Prep Section 2:

Disks and Files

2 of 42

Disks & Files

3 of 42

Files, Pages, Records

  • Tables stored as logical files consisting of pages each containing a collection of records
  • File (corresponds to a table)
    • Page (many per file)
      • Record (many per page)
  • The unit of access to physical disk is the page
    • 1 I/O = read or write 1 page

4 of 42

Page Basics:

The page header keeps track of the records in the page.

The page header may contain fields such as:

  • Number of records in the page
  • Pointer to segment of free space in the page
  • Bitmap indicating which parts of the page are in use

5 of 42

Fixed Length Records

Fixed length records are when record lengths are fixed and field lengths are consistent

Packed Records: no gaps between records, record ID is location in page

Unpacked Records: allow gaps between records, use a bitmap to keep track of where the gaps are

6 of 42

Variable Length Records

Variable length records are when either record lengths are not fixed or field lengths are not consistent

We can store variable length length records with an array of field offsets:

  • Each record contains a record header
  • Variable length fields are placed after fixed length fields
  • Record header stores field offset (where variable length field ends)

7 of 42

Variable Length Records: Slotted Pages

  • Move page header to end of page (footer) - to allow for growth
  • Store length and pointer to start of each record in footer
  • Store a pointer to free space
  • Deleting records may cause fragmentation if we use an unpacked layout

8 of 42

Heap Files and Sorted Files

A heap file is just a file with no order enforced.

  • Within a heap file, we keep track of pages
    • Within a page, keep track of records (and free space)
      • Records placed arbitrarily across pages

A sorted file is similar to a heap file, except we require it be sorted on a key (a subset of the fields).

  • Implemented using page directories, data pages ordered based on how records are sorted
    • We can use binary search on the key the file is sorted on to find records!

9 of 42

Heap Files: List Implementation

One approach to implementing a heap file is as a list.

  • Each page has two pointers (previous and next pages), free space, and data.
  • We have two linked lists of pages, both connected to a header page
    • List of full pages
    • List of pages with some empty space

10 of 42

Heap Files: Page Directory Implementation

A different approach to implementing a heap file is with a page directory.

We have a linked list of header pages, which are each responsible for a set of data pages

  • Stores the amount of free space per data page

11 of 42

Operations: Heap Files vs Sorted Files

  • Scan: read all of the pages of the file
    • Same for heap files and sorted files
  • Equality Search: find all records in the file matching a certain key
    • Heap files: complicated equation, but expect to touch about half the pages in the file
    • Sorted files: do binary search to find the key
  • Range Search: find all records in the file whose key is in a given range
    • Heap files: scan whole file
    • Sorted files: do binary search to find the lower bound key, then scan for all records with keys in the range
  • Insert: add a record to the file
    • Heap files: add it to the end of a page with free space (or write a new page if all pages are full)
    • Sorted files: do binary search to find where record should go, then shift all of the records after it
  • Delete: remove a record from the file
    • Heap files: touch about half of the pages in the file to find the record (see equality search), then write to the page to remove
    • Sorted files: do binary search to find the record, remove, then shift all of the records after it

12 of 42

Operations: Heap Files vs Sorted Files

13 of 42

Worksheet

14 of 42

Question 1

15 of 42

Question 1a

As the records are variable length, we will need a record header in the record. How big is the record header? You may assume pointers are 4 bytes long, and that the record header only contains pointers.

Header

collar_id

age

name

color

16 of 42

Question 1a

As the records are variable length, we will need a record header in the record. How big is the record header? You may assume pointers are 4 bytes long, and that the record header only contains pointers.

8 bytes. In the record header, we need one pointer for each variable length value. In this schema, those are just the two VARCHARs, so we need 2 pointers, each 4 bytes.

17 of 42

Question 1b

Including the record header, what is the smallest possible record size (in bytes) in this schema?

Header

collar_id

age

name

color

18 of 42

Question 1b

Including the record header, what is the smallest possible record size (in bytes) in this schema?

Header

collar_id

age

19 of 42

Question 1b

Including the record header, what is the smallest possible record size (in bytes) in this schema?

16 bytes (= 8 + 4 + 4 + 0 + 0). 8 for the record header, 4 for each of integers, and 0 for each of the VARCHARs.

20 of 42

Question 1c

Including the record header, what is the largest possible record size (in bytes) in this schema?

Header

collar_id

age

name

color

21 of 42

Question 1c

Including the record header, what is the largest possible record size (in bytes) in this schema?

46 bytes (= 8 + 4 + 4 + 20 + 10)

22 of 42

Question 1d

Suppose we are storing these records using a slotted page layout with variable length records. The page footer contains an integer storing the record count and a pointer to free space, as well as a slot directory storing, for each record, a pointer and length. What is the maximum number of records that we can fit on a 8KB page?

Record 1

Record 2

Record 3

Record 4

Record 5

Footer

20

24

30

28

30

20

Record Count

Free space pointer

23 of 42

Question 1d

Suppose we are storing these records using a slotted page layout with variable length records. The page footer contains an integer storing the record count and a pointer to free space, as well as a slot directory storing, for each record, a pointer and length. What is the maximum number of records that we can fit on a 8KB page?

We start out with 8 * 1024 = 8192 bytes of space on the page. We subtract 4 bytes that are used for the record count, and another 4 for the pointer to free space.

This leaves us with 8192 - 4 - 4 bytes that we can use to store records and their slots.

A record takes up 16 bytes of space at minimum (from the previous questions), and for each record we also need to store a slot with a pointer (4 bytes) and a length (4 bytes). Thus, we need 16 + 4 + 4 bytes of space for each record and its slot. We divide the available number of bytes by the number of bytes per record to get the maximum value of 341 records.

341 records (= (8192 - 4 - 4) / (16 + 4 + 4))

24 of 42

Question 1e

Suppose we stored the maximum number of records on a page, and then deleted one record. Now we want to insert another record. Are we guaranteed to be able to do this? Explain why or why not.

No, we deleted 16 bytes but the record we want to insert may be up to 46 bytes.

25 of 42

Question 1f

Now suppose we deleted 3 records. Without reorganizing any of the records on the page, we would like to insert another record. Are we guaranteed to be able to do this? Explain why or why not.

No; there are 48 free bytes but they may be fragmented - there might not be 46 contiguous bytes.

26 of 42

Question 2

27 of 42

Question 2a

Are Student records represented as fixed or variable length?

Fixed. There are only fixed-length fields in the schema (integers) so the record will be fixed length.

28 of 42

Question 2b

To store these records, we will use an unpacked representation with a page header. This page header will contain nothing but a bitmap, rounded up to the nearest byte. How many records can we fit on a 4KB page? (Recall that one KB is 1024 bytes.)

29 of 42

Question 2b

To store these records, we will use an unpacked representation with a page header. This page header will contain nothing but a bitmap, rounded up to the nearest byte. How many records can we fit on a 4KB page? (Recall that one KB is 1024 bytes.)

337 records

We start with 4 * 1024 = 4096 bytes on the page. Next, our fixed-length record is made up of 3 integers for a total of 4 + 4 + 4 = 12 bytes. Lastly, we need one bit (⅛ of a byte) in the bitmap for each record. When we divide 4096/12.125, we get a bit over 337 records. It’s important to note that the bitmap size must be rounded to the nearest byte, so with 337 records, the bitmap must be 344 bits (closest multiple of 8 bits), or 43 bytes. This makes the total number of bytes used 337 * 12 + 43 = 4087, which fits within our page.

30 of 42

Question 2c

Suppose there are 7 pages worth of records. We would like to execute:�SELECT * FROM Student WHERE student_id = 3034213355

Suppose these pages are stored in a heap file implemented as a linked list. What is the minimum and maximum number of I/O’s required to answer the query?

31 of 42

Question 2c

Suppose there are 7 pages worth of records. We would like to execute:�SELECT * FROM Student WHERE student_id = 3034213355

Suppose these pages are stored in a heap file implemented as a linked list. What is the minimum and maximum number of I/O’s required to answer the query?

Minimum: 2, Maximum: 8

First we need to read the header page. In the best case, we find the record on the first page we search, for a total of 2 I/O’s (header page + 1 data page). In the worst case, we find the record on the last page we search, for a total of 8 I/O’s (header page + 7 data pages)

32 of 42

Question 2d

Now suppose these pages are stored in a sorted file, sorted on student id. What is the minimum and maximum number of I/O’s required to answer the query? You can assume sorted files do not have header pages.

33 of 42

Question 2d

Now suppose these pages are stored in a sorted file, sorted on student id. What is the minimum and maximum number of I/O’s required to answer the query? You can assume sorted files do not have header pages.

Minimum: 1, Maximum: 3

We use binary search to find records in a sorted file.

34 of 42

Question 3

35 of 42

Question 3a

Suppose we are storing variable length records in a linked list heap file. In the ”pages with space” list, suppose there happens to be 5 pages. What is the maximum number of page IOs required in order to insert a record?

You may assume that at least one of these pages contains enough space, and additionally that it will not become full after insertion.

36 of 42

Question 3a

Suppose we are storing variable length records in a linked list heap file. In the ”pages with space” list, suppose there happens to be 5 pages. What is the maximum number of page IOs required in order to insert a record?

You may assume that at least one of these pages contains enough space, and additionally that it will not become full after insertion.

7 I/O’s = 1 + 5 + 1

First, we read the header page (1 I/O). Next, in the worst case, we read all 5 data pages (5 I/O’s). Lastly, we insert the record on the final data page (1 I/O)

37 of 42

Question 3b

Continuing from part (a), suppose that the page does become full after insertion. Now, we need to move that page to the ”full pages” list.

Assume we have already done all necessary page reads for part (a)’s worst case (and that those pages are still in memory), but have not yet done any page writes.

How many additional page I/Os do we need to move the page to the ”full pages” list?

38 of 42

Question 3b

Continuing from part (a), suppose that the page does become full after insertion. Now, we need to move that page to the ”full pages” list.

Assume we have already done all necessary page reads for part (a)’s worst case (and that those pages are still in memory), but have not yet done any page writes.

How many additional page I/Os do we need to move the page to the ”full pages” list?

5 Additional I/O’s

First, we insert the record into the final data page, and update its “previous” to point to the header page, and its “next” pointer to point to the first page in the “full pages” list (1 I/O). Next, we update the “next” pointer of the second-to-last data page to no longer point to the final record (1 I/O). Then, we read the first record of the “full pages” list, and update its “previous” pointer to point to the page that just became full (1 Read + 1 Write = 2 I/O’s). Lastly, we update the header page’s “next” pointer to point to the page that just became full (1 I/O because we already read the header into the cache in part a). 1 + 1 + 2 + 1 = 5 I/O’s

39 of 42

Question 3c

Now suppose records are fixed length; what is the maximum number of page I/Os to insert a record? Assume that the page we insert into does not fill up after the insertion.

40 of 42

Question 3c

Now suppose records are fixed length; what is the maximum number of page I/Os to insert a record? Assume that the page we insert into does not fill up after the insertion.

3 I/O’s

First we read the header page (1 I/O). Next, we read the first non-full page (1 I/O). Because the record is fixed-length, all pages in the “pages with space” list have space for at least one record, so we do not need to read more than one. Lastly, we insert the record (1 I/O). 1 + 1 + 1 = 3 I/O’s

41 of 42

Question 3d

Now suppose we are using a page directory, with one directory page. What is the maximum number of I/Os required to insert a record?

4 I/O’s

First, we read the page directory (1 I/O). Next, we read a data page that the page directory said had enough space for our record (1 I/O). Then, we insert the record into the data page (1 I/O). Lastly, we update the page directory with the new amount of free space on the data page (1 I/O).

42 of 42

Attendance Link