CS 186 Exam Prep Section 2:
Disks and Files
Disks & Files
Files, Pages, Records
Page Basics:
The page header keeps track of the records in the page.
The page header may contain fields such as:
Fixed Length Records
Fixed length records are when record lengths are fixed and field lengths are consistent
Packed Records: no gaps between records, record ID is location in page
Unpacked Records: allow gaps between records, use a bitmap to keep track of where the gaps are
Variable Length Records
Variable length records are when either record lengths are not fixed or field lengths are not consistent
We can store variable length length records with an array of field offsets:
Variable Length Records: Slotted Pages
Heap Files and Sorted Files
A heap file is just a file with no order enforced.
A sorted file is similar to a heap file, except we require it be sorted on a key (a subset of the fields).
Heap Files: List Implementation
One approach to implementing a heap file is as a list.
Heap Files: Page Directory Implementation
A different approach to implementing a heap file is with a page directory.
We have a linked list of header pages, which are each responsible for a set of data pages
Operations: Heap Files vs Sorted Files
Operations: Heap Files vs Sorted Files
Worksheet
Question 1
Question 1a
As the records are variable length, we will need a record header in the record. How big is the record header? You may assume pointers are 4 bytes long, and that the record header only contains pointers.
Header
collar_id
age
name
color
Question 1a
As the records are variable length, we will need a record header in the record. How big is the record header? You may assume pointers are 4 bytes long, and that the record header only contains pointers.
8 bytes. In the record header, we need one pointer for each variable length value. In this schema, those are just the two VARCHARs, so we need 2 pointers, each 4 bytes.
Question 1b
Including the record header, what is the smallest possible record size (in bytes) in this schema?
Header
collar_id
age
name
color
Question 1b
Including the record header, what is the smallest possible record size (in bytes) in this schema?
Header
collar_id
age
Question 1b
Including the record header, what is the smallest possible record size (in bytes) in this schema?
16 bytes (= 8 + 4 + 4 + 0 + 0). 8 for the record header, 4 for each of integers, and 0 for each of the VARCHARs.
Question 1c
Including the record header, what is the largest possible record size (in bytes) in this schema?
Header
collar_id
age
name
color
Question 1c
Including the record header, what is the largest possible record size (in bytes) in this schema?
46 bytes (= 8 + 4 + 4 + 20 + 10)
Question 1d
Suppose we are storing these records using a slotted page layout with variable length records. The page footer contains an integer storing the record count and a pointer to free space, as well as a slot directory storing, for each record, a pointer and length. What is the maximum number of records that we can fit on a 8KB page?
Record 1
Record 2
Record 3
Record 4
Record 5
Footer
20
24
30
28
30
20
Record Count
Free space pointer
Question 1d
Suppose we are storing these records using a slotted page layout with variable length records. The page footer contains an integer storing the record count and a pointer to free space, as well as a slot directory storing, for each record, a pointer and length. What is the maximum number of records that we can fit on a 8KB page?
We start out with 8 * 1024 = 8192 bytes of space on the page. We subtract 4 bytes that are used for the record count, and another 4 for the pointer to free space.
This leaves us with 8192 - 4 - 4 bytes that we can use to store records and their slots.
A record takes up 16 bytes of space at minimum (from the previous questions), and for each record we also need to store a slot with a pointer (4 bytes) and a length (4 bytes). Thus, we need 16 + 4 + 4 bytes of space for each record and its slot. We divide the available number of bytes by the number of bytes per record to get the maximum value of 341 records.
341 records (= (8192 - 4 - 4) / (16 + 4 + 4))
Question 1e
Suppose we stored the maximum number of records on a page, and then deleted one record. Now we want to insert another record. Are we guaranteed to be able to do this? Explain why or why not.
No, we deleted 16 bytes but the record we want to insert may be up to 46 bytes.
Question 1f
Now suppose we deleted 3 records. Without reorganizing any of the records on the page, we would like to insert another record. Are we guaranteed to be able to do this? Explain why or why not.
No; there are 48 free bytes but they may be fragmented - there might not be 46 contiguous bytes.
Question 2
Question 2a
Are Student records represented as fixed or variable length?
Fixed. There are only fixed-length fields in the schema (integers) so the record will be fixed length.
Question 2b
To store these records, we will use an unpacked representation with a page header. This page header will contain nothing but a bitmap, rounded up to the nearest byte. How many records can we fit on a 4KB page? (Recall that one KB is 1024 bytes.)
Question 2b
To store these records, we will use an unpacked representation with a page header. This page header will contain nothing but a bitmap, rounded up to the nearest byte. How many records can we fit on a 4KB page? (Recall that one KB is 1024 bytes.)
337 records
We start with 4 * 1024 = 4096 bytes on the page. Next, our fixed-length record is made up of 3 integers for a total of 4 + 4 + 4 = 12 bytes. Lastly, we need one bit (⅛ of a byte) in the bitmap for each record. When we divide 4096/12.125, we get a bit over 337 records. It’s important to note that the bitmap size must be rounded to the nearest byte, so with 337 records, the bitmap must be 344 bits (closest multiple of 8 bits), or 43 bytes. This makes the total number of bytes used 337 * 12 + 43 = 4087, which fits within our page.
Question 2c
Suppose there are 7 pages worth of records. We would like to execute:�SELECT * FROM Student WHERE student_id = 3034213355
Suppose these pages are stored in a heap file implemented as a linked list. What is the minimum and maximum number of I/O’s required to answer the query?
Question 2c
Suppose there are 7 pages worth of records. We would like to execute:�SELECT * FROM Student WHERE student_id = 3034213355
Suppose these pages are stored in a heap file implemented as a linked list. What is the minimum and maximum number of I/O’s required to answer the query?
Minimum: 2, Maximum: 8
First we need to read the header page. In the best case, we find the record on the first page we search, for a total of 2 I/O’s (header page + 1 data page). In the worst case, we find the record on the last page we search, for a total of 8 I/O’s (header page + 7 data pages)
Question 2d
Now suppose these pages are stored in a sorted file, sorted on student id. What is the minimum and maximum number of I/O’s required to answer the query? You can assume sorted files do not have header pages.
Question 2d
Now suppose these pages are stored in a sorted file, sorted on student id. What is the minimum and maximum number of I/O’s required to answer the query? You can assume sorted files do not have header pages.
Minimum: 1, Maximum: 3
We use binary search to find records in a sorted file.
Question 3
Question 3a
Suppose we are storing variable length records in a linked list heap file. In the ”pages with space” list, suppose there happens to be 5 pages. What is the maximum number of page IOs required in order to insert a record?
You may assume that at least one of these pages contains enough space, and additionally that it will not become full after insertion.
Question 3a
Suppose we are storing variable length records in a linked list heap file. In the ”pages with space” list, suppose there happens to be 5 pages. What is the maximum number of page IOs required in order to insert a record?
You may assume that at least one of these pages contains enough space, and additionally that it will not become full after insertion.
7 I/O’s = 1 + 5 + 1
First, we read the header page (1 I/O). Next, in the worst case, we read all 5 data pages (5 I/O’s). Lastly, we insert the record on the final data page (1 I/O)
Question 3b
Continuing from part (a), suppose that the page does become full after insertion. Now, we need to move that page to the ”full pages” list.
Assume we have already done all necessary page reads for part (a)’s worst case (and that those pages are still in memory), but have not yet done any page writes.
How many additional page I/Os do we need to move the page to the ”full pages” list?
Question 3b
Continuing from part (a), suppose that the page does become full after insertion. Now, we need to move that page to the ”full pages” list.
Assume we have already done all necessary page reads for part (a)’s worst case (and that those pages are still in memory), but have not yet done any page writes.
How many additional page I/Os do we need to move the page to the ”full pages” list?
5 Additional I/O’s
First, we insert the record into the final data page, and update its “previous” to point to the header page, and its “next” pointer to point to the first page in the “full pages” list (1 I/O). Next, we update the “next” pointer of the second-to-last data page to no longer point to the final record (1 I/O). Then, we read the first record of the “full pages” list, and update its “previous” pointer to point to the page that just became full (1 Read + 1 Write = 2 I/O’s). Lastly, we update the header page’s “next” pointer to point to the page that just became full (1 I/O because we already read the header into the cache in part a). 1 + 1 + 2 + 1 = 5 I/O’s
Question 3c
Now suppose records are fixed length; what is the maximum number of page I/Os to insert a record? Assume that the page we insert into does not fill up after the insertion.
Question 3c
Now suppose records are fixed length; what is the maximum number of page I/Os to insert a record? Assume that the page we insert into does not fill up after the insertion.
3 I/O’s
First we read the header page (1 I/O). Next, we read the first non-full page (1 I/O). Because the record is fixed-length, all pages in the “pages with space” list have space for at least one record, so we do not need to read more than one. Lastly, we insert the record (1 I/O). 1 + 1 + 1 = 3 I/O’s
Question 3d
Now suppose we are using a page directory, with one directory page. What is the maximum number of I/Os required to insert a record?
4 I/O’s
First, we read the page directory (1 I/O). Next, we read a data page that the page directory said had enough space for our record (1 I/O). Then, we insert the record into the data page (1 I/O). Lastly, we update the page directory with the new amount of free space on the data page (1 I/O).
Attendance Link