1 of 50

Chapter 12: File System Implementation

Silberschatz, Galvin and Gagne ©2013

Operating System Concepts – 9th Edition


Chapter 12: File System Implementation

  • File-System Structure
  • File-System Implementation
  • Directory Implementation
  • Allocation Methods
  • Free-Space Management
  • Efficiency and Performance
  • Recovery


File-System Structure

  • Disks provide most of the secondary storage on which file systems are maintained. Two characteristics make them convenient for this purpose:

1. A disk can be rewritten in place; it is possible to read a block from the disk, modify the block, and write it back into the same place.

2. A disk can access directly any block of information it contains. Thus, it is simple to access any file either sequentially or randomly, and switching from one file to another requires only moving the read–write heads and waiting for the disk to rotate.

  • To improve I/O efficiency, I/O transfers between memory and disk are performed in units of blocks. Each block is one or more sectors.


File-System Structure (Cont.)

  • Depending on the disk drive, sector size varies from 32 bytes to 4,096 bytes; the usual size is 512 bytes.
  • File systems provide efficient and convenient access to the disk by allowing data to be stored, located, and retrieved easily.
  • A file system poses two quite different design problems.
  • The first problem is defining how the file system should look to the user. This task involves defining a file and its attributes, the operations allowed on a file, and the directory structure for organizing files.
  • The second problem is creating algorithms and data structures to map the logical file system onto the physical secondary-storage devices.
  • The file system itself is generally composed of many different levels. The structure shown in Figure is an example of a layered design.


File-System Structure (Cont.)

  • Each level in the design uses the features of lower levels to create new features for use by higher levels.
  • The I/O control level consists of device drivers and interrupt handlers to transfer information between the main memory and the disk system.
  • A device driver can be thought of as a translator. Its input consists of high level commands such as “retrieve block 123.” Its output consists of low-level, hardware-specific instructions that are used by the hardware controller, which interfaces the I/O device to the rest of the system.
  • The device driver usually writes specific bit patterns to special locations in the I/O controller’s memory to tell the controller which device location to act on and what actions to take.


Layered File System


File-System Structure

  • The basic file system needs only to issue generic commands to the appropriate device driver to read and write physical blocks on the disk. Each physical block is identified by its numeric disk address.
  • This layer also manages the memory buffers and caches that hold various file-system, directory, and data blocks.
  • The file-organization module knows about files and their logical blocks, as well as physical blocks. By knowing the type of file allocation used and the location of the file, the file-organization module can translate logical block addresses to physical block addresses for the basic file system to transfer.
  • A block in the buffer is allocated before the transfer of a disk block can occur. When the buffer is full, the buffer manager must find more buffer memory or free up buffer space to allow a requested I/O to complete.


File-System Structure (Cont.)

  • Finally, the logical file system manages metadata information. Metadata includes all of the file-system structure except the actual data (or contents of the files). The logical file system manages the directory structure to provide the file-organization module with the information the latter needs, given a symbolic file name. It maintains file structure via file-control blocks. A file-control block (FCB) contains information about the file, including ownership, permissions, and location of the file contents.
  • When a layered structure is used for file-system implementation, duplication of code is minimized.
  • The use of layering, including the decision about how many layers to use and what each layer should do, is a major challenge in designing new systems.


File-System Implementation

  • Operating systems implement open() and close() system calls for processes to request access to file contents.
  • Several on-disk and in-memory structures are used to implement a file system. These structures vary depending on the operating system and the file system, but some general principles apply.
  • On disk, the file system may contain information about how to boot an operating system stored there, the total number of blocks, the number and location of free blocks, the directory structure, and individual files.
  • Here, we describe them briefly:
  • A boot control block (per volume) can contain information needed by the system to boot an operating system from that volume. If the disk does not contain an operating system, this block can be empty. It is typically the first block of a volume. In UFS, it is called the boot block. In NTFS, it is the partition boot sector.


File-System Implementation

  • A volume control block (per volume) contains volume (or partition) details, such as the number of blocks in the partition, the size of the blocks, a free-block count and free-block pointers, and a free-FCB count and FCB pointers. In UFS, this is called a superblock. In NTFS, it is stored in the master file table.
  • A directory structure (per file system) is used to organize the files. In UFS, this includes file names and associated inode numbers. In NTFS, it is stored in the master file table.
  • A per-file FCB contains many details about the file. It has a unique identifier number to allow association with a directory entry. In NTFS, this information is actually stored within the master file table, which uses a relational database structure, with a row per file.
  • The in-memory information is used for both file-system management and performance improvement via caching. The data are loaded at mount time, updated during file-system operations, and discarded at dismount.
  • Several types of structures may be included.
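The on-disk records described above can be sketched as plain data structures. The following is a hypothetical, minimal sketch in Python; the field names are illustrative only and do not match any real file system's on-disk layout:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class FCB:
    # A hypothetical file-control block; field names are illustrative.
    file_id: int                  # unique identifier linking to a directory entry
    owner: str
    permissions: int              # e.g. 0o644
    size: int = 0
    data_blocks: List[int] = field(default_factory=list)  # location of file contents

@dataclass
class VolumeControlBlock:
    # Plays the role of the UFS superblock: per-volume bookkeeping.
    total_blocks: int
    block_size: int
    free_block_count: int
    free_fcb_count: int

vcb = VolumeControlBlock(total_blocks=1024, block_size=512,
                         free_block_count=1024, free_fcb_count=64)
fcb = FCB(file_id=1, owner="alice", permissions=0o644)
```

A real implementation would pack these into fixed-size binary records at known disk offsets; the sketch only shows which information lives in which structure.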


File-System Implementation

  • An in-memory mount table contains information about each mounted volume.
  • An in-memory directory-structure cache holds the directory information of recently accessed directories. (For directories at which volumes are mounted, it can contain a pointer to the volume table.)
  • The system-wide open-file table contains a copy of the FCB of each open file, as well as other information.
  • The per-process open-file table contains a pointer to the appropriate entry in the system-wide open-file table, as well as other information.
  • Buffers hold file-system blocks when they are being read from disk or written to disk.
  • To create a new file, an application program calls the logical file system. The logical file system knows the format of the directory structures. To create a new file, it allocates a new FCB.


File-System Implementation (Cont.)

  • The system then reads the appropriate directory into memory, updates it with the new file name and FCB, and writes it back to the disk. A typical FCB is shown in Figure 12.2.

Figure 12.2 A typical file-control block.


In-Memory File System Structures

  • Mount table storing file system mounts, mount points, file system types
  • The following figure illustrates the necessary file system structures provided by the operating system
  • Figure 12-3(a) refers to opening a file
  • Figure 12-3(b) refers to reading a file
  • Plus buffers hold data blocks from secondary storage
  • Open returns a file handle for subsequent use
  • Data from read eventually copied to specified user process memory address


In-Memory File System Structures


Partitions and Mounting

  • Partition can be a volume containing a file system (“cooked”) or raw – just a sequence of blocks with no file system
  • Boot block can point to boot volume or boot loader set of blocks that contain enough code to know how to load the kernel from the file system
    • Or a boot management program for multi-OS booting
  • Root partition contains the OS; other partitions can hold other OSs, other file systems, or be raw
    • Mounted at boot time
    • Other partitions can mount automatically or manually
  • At mount time, file system consistency checked
    • Is all metadata correct?
      • If not, fix it, try again
      • If yes, add to mount table, allow access


Virtual File Systems

  • Virtual File Systems (VFS) on Unix provide an object-oriented way of implementing file systems
  • VFS allows the same system call interface (the API) to be used for different types of file systems
    • Separates file-system generic operations from implementation details
    • Implementation can be one of many file systems types, or network file system
      • Implements vnodes which hold inodes or network file details
    • Then dispatches operation to appropriate file system implementation routines


Virtual File Systems (Cont.)

  • The API is to the VFS interface, rather than any specific type of file system


Virtual File System Implementation

  • For example, Linux has four object types:
    • inode, file, superblock, dentry
  • VFS defines set of operations on the objects that must be implemented
    • Every object has a pointer to a function table
      • Function table has addresses of routines to implement that function on that object
      • For example:
      • int open(. . .)—Open a file
      • int close(. . .)—Close an already-open file
      • ssize_t read(. . .)—Read from a file
      • ssize_t write(. . .)—Write to a file
      • int mmap(. . .)—Memory-map a file
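The dispatch idea behind the function table can be sketched as follows. This is a hedged illustration, not Linux's actual VFS API: each file-system type registers a table of operation routines, and the generic layer calls through the table without knowing the concrete implementation. All names are invented for the example:

```python
# Illustrative stand-ins for two file-system types' read routines.
def ext_read(path):
    return f"ext data from {path}"

def nfs_read(path):
    return f"nfs data from {path}"

# One function table per file-system type, keyed by operation name
# (the analogue of an object's pointer to its function table).
file_ops = {
    "ext": {"read": ext_read},
    "nfs": {"read": nfs_read},
}

def vfs_read(fs_type, path):
    # The VFS layer dispatches to whichever implementation is registered.
    return file_ops[fs_type]["read"](path)
```

The point of the indirection is that `vfs_read` never changes when a new file-system type is added; only a new entry in the table is needed.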


Directory Implementation

  • The selection of directory-allocation and directory-management algorithms significantly affects the efficiency, performance, and reliability of the file system.

Linear List

  • The simplest method of implementing a directory is to use a linear list of file names with pointers to the data blocks. This method is simple to program but time-consuming to execute.
  • To create a new file, we must first search the directory to be sure that no existing file has the same name. Then, we add a new entry at the end of the directory. To delete a file, we search the directory for the named file and then release the space allocated to it.
  • To reuse the directory entry, we can do one of several things. We can mark the entry as unused or we can attach it to a list of free directory entries. A third alternative is to copy the last entry in the directory into the freed location and to decrease the length of the directory. A linked list can also be used to decrease the time required to delete a file.
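The linear-list operations above can be sketched directly. This minimal example uses the third reuse strategy mentioned: on delete, the last entry is copied into the freed slot and the directory shrinks by one:

```python
# A minimal linear-list directory: each entry is (name, first_block).
directory = []

def create(name, first_block):
    # Linear search to ensure no existing file has the same name.
    for entry_name, _ in directory:
        if entry_name == name:
            raise FileExistsError(name)
    directory.append((name, first_block))   # new entry at the end

def delete(name):
    for i, (entry_name, _) in enumerate(directory):
        if entry_name == name:
            directory[i] = directory[-1]    # copy last entry into the hole
            directory.pop()                 # decrease the directory length
            return
    raise FileNotFoundError(name)

create("a.txt", 9)
create("b.txt", 16)
delete("a.txt")
```

Both `create` and `delete` are O(n) in the number of entries, which is exactly the cost the next slide identifies as the method's real disadvantage.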


Directory Implementation

  • The real disadvantage of a linear list of directory entries is that finding a file requires a linear search. Directory information is used frequently, and users will notice if access to it is slow.
  • In fact, many operating systems implement a software cache to store the most recently used directory information. A cache hit avoids the need to constantly reread the information from disk.
  • A sorted list allows a binary search and decreases the average search time. However, the requirement that the list be kept sorted may complicate creating and deleting files, since we may have to move substantial amounts of directory information to maintain a sorted directory.
  • A more sophisticated tree data structure, such as a balanced tree, might help here. An advantage of the sorted list is that a sorted directory listing can be produced without a separate sort step.


Directory Implementation

Hash Table

  • Another data structure used for a file directory is a hash table. Here, a linear list stores the directory entries, but a hash data structure is also used. The hash table takes a value computed from the file name and returns a pointer to the file name in the linear list.
  • Therefore, it can greatly decrease the directory search time. Insertion and deletion are also fairly straightforward, although some provision must be made for collisions—situations in which two file names hash to the same location.
  • The major difficulties with a hash table are its generally fixed size and the dependence of the hash function on that size.
  • Alternatively, we can use a chained-overflow hash table. Each hash entry can be a linked list instead of an individual value, and we can resolve collisions by adding the new entry to the linked list.
  • Lookups may be somewhat slowed, because searching for a name might require stepping through a linked list of colliding table entries. Still, this method is likely to be much faster than a linear search through the entire directory.
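A chained-overflow hash table over directory entries can be sketched as follows (the bucket count is an arbitrary choice for the example):

```python
# Chained-overflow hash table: the hash value selects a bucket, and
# colliding names simply go onto that bucket's linked list.
NBUCKETS = 8
buckets = [[] for _ in range(NBUCKETS)]

def insert(name, first_block):
    buckets[hash(name) % NBUCKETS].append((name, first_block))

def lookup(name):
    # Only the chain of colliding entries is searched, not the
    # whole directory.
    for entry_name, block in buckets[hash(name) % NBUCKETS]:
        if entry_name == name:
            return block
    return None

insert("a.txt", 9)
insert("b.txt", 16)
```

The fixed `NBUCKETS` illustrates the size-dependence problem mentioned above: growing the table means rehashing every entry with a new hash function.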


Allocation Methods

  • The direct-access nature of disks gives us flexibility in the implementation of files. In almost every case, many files are stored on the same disk. The main problem is how to allocate space to these files so that disk space is utilized effectively and files can be accessed quickly.
  • Three major methods of allocating disk space are in wide use: contiguous, linked, and indexed.
  • Each method has advantages and disadvantages. Although some systems support all three, it is more common for a system to use one method for all files within a file-system type.


Contiguous Allocation

  • Contiguous allocation requires that each file occupy a set of contiguous blocks on the disk. Disk addresses define a linear ordering on the disk.
  • With this ordering, assuming that only one job is accessing the disk, accessing block b + 1 after block b normally requires no head movement. When head movement is needed, the head need only move from one track to the next. Thus, the number of disk seeks required for accessing contiguously allocated files is minimal, as is seek time when a seek is finally needed.
  • Contiguous allocation of a file is defined by the disk address and length (in block units) of the first block.


Contiguous Allocation (Cont.)

  • If the file is n blocks long and starts at location b, then it occupies blocks b, b + 1, b + 2, ..., b + n − 1. The directory entry for each file indicates the address of the starting block and the length of the area allocated for this file (Figure).
  • Accessing a file that has been allocated contiguously is easy. For sequential access, the file system remembers the disk address of the last block referenced and, when necessary, reads the next block. For direct access to block i of a file that starts at block b, we can immediately access block b + i.
  • Thus, both sequential and direct access can be supported by contiguous allocation.
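The direct-access rule above is pure arithmetic, as this small sketch shows:

```python
# Contiguous allocation: a directory entry is just (start, length),
# and logical block i of the file lives at physical block start + i.
def physical_block(start, length, i):
    if not 0 <= i < length:
        raise IndexError("block outside file")
    return start + i

# A file of 5 blocks starting at block 14 occupies blocks 14..18.
assert physical_block(14, 5, 0) == 14
assert physical_block(14, 5, 4) == 18
```

No pointer chasing or index lookup is needed, which is why contiguous allocation supports direct access so cheaply.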


Contiguous Allocation of Disk Space


Contiguous Allocation (Cont.)

  • Contiguous allocation has some problems, however. One difficulty is finding space for a new file. The system chosen to manage free space determines how this task is accomplished.
  • As files are allocated and deleted, the free disk space is broken into little pieces. External fragmentation exists whenever free space is broken into chunks. It becomes a problem when the largest contiguous chunk is insufficient for a request; storage is fragmented into a number of holes, none of which is large enough to store the data.
  • Depending on the total amount of disk storage and the average file size, external fragmentation may be a minor or a major problem.


Contiguous Allocation (Cont.)

  • One strategy for preventing loss of significant amounts of disk space to external fragmentation is to copy an entire file system onto another disk. The original disk is then freed completely, creating one large contiguous free space. We then copy the files back onto the original disk by allocating contiguous space from this one large hole. This scheme effectively compacts all free space into one contiguous space, solving the fragmentation problem.
  • The cost of this compaction is time, however, and the cost can be particularly high for large hard disks. Compacting these disks may take hours and may be necessary on a weekly basis.


Contiguous Allocation (Cont.)

  • Another problem with contiguous allocation is determining how much space is needed for a file. When the file is created, the total amount of space it will need must be found and allocated. How does the creator (program or person) know the size of the file to be created? In some cases, this determination may be fairly simple (for example, copying an existing file); in general, however, the size of an output file may be difficult to estimate.
  • If we allocate too little space to a file, we may find that the file cannot be extended.
  • To minimize these drawbacks, some operating systems use a modified contiguous-allocation scheme. Here, a contiguous chunk of space is allocated initially. Then, if that amount proves not to be large enough, another chunk of contiguous space, known as an extent, is added.


Linked Allocation

  • Linked allocation solves all problems of contiguous allocation. With linked allocation, each file is a linked list of disk blocks; the disk blocks may be scattered anywhere on the disk.
  • The directory contains a pointer to the first and last blocks of the file. For example, a file of five blocks might start at block 9 and continue at block 16, then block 1, then block 10, and finally block 25 (Figure). Each block contains a pointer to the next block.
  • A write to the file causes the free-space management system to find a free block, and this new block is written to and is linked to the end of the file. To read a file, we simply read blocks by following the pointers from block to block.
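Reading such a file means chasing pointers from the first block listed in the directory, as in this sketch (block numbers follow the five-block example above):

```python
# Linked allocation: each block stores the number of the next block.
# Here the per-block pointers are modeled with a dictionary.
END = -1
next_ptr = {9: 16, 16: 1, 1: 10, 10: 25, 25: END}

def read_file(first_block):
    blocks, b = [], first_block
    while b != END:            # follow the chain block by block
        blocks.append(b)
        b = next_ptr[b]
    return blocks

assert read_file(9) == [9, 16, 1, 10, 25]
```

Note that reaching block i requires reading the i blocks before it, which is why the next slide observes that linked allocation cannot support efficient direct access.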


Linked Allocation


Indexed Allocation

  • Linked allocation solves the external-fragmentation and size-declaration problems of contiguous allocation. However, in the absence of a FAT, linked allocation cannot support efficient direct access, since the pointers to the blocks are scattered with the blocks themselves all over the disk and must be retrieved in order. Indexed allocation solves this problem by bringing all the pointers together into one location: the index block.
  • Each file has its own index block, which is an array of disk-block addresses. The ith entry in the index block points to the ith block of the file. The directory contains the address of the index block (Figure).


Example of Indexed Allocation


Indexed Allocation (Cont.)

  • To find and read the ith block, we use the pointer in the ith index-block entry. This scheme is similar to the paging scheme.
  • Indexed allocation supports direct access, without suffering from external fragmentation, because any free block on the disk can satisfy a request for more space. Indexed allocation does suffer from wasted space, however.
  • The pointer overhead of the index block is generally greater than the pointer overhead of linked allocation.
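The lookup is a single table access, as this sketch shows (the block addresses are arbitrary illustrative values):

```python
# Indexed allocation: the directory stores only the index block's
# address; entry i of the index block is the physical address of
# logical block i of the file.
index_block = [19, 1, 10, 25, 8]

def block_of(i):
    return index_block[i]      # direct access: one lookup, no chain

assert block_of(0) == 19
assert block_of(3) == 25
```

The wasted space mentioned above is visible here too: a one-block file still needs an entire index block, most of whose entries are unused.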


Free-Space Management

  • Since disk space is limited, we need to reuse the space from deleted files for new files, if possible. To keep track of free disk space, the system maintains a free-space list.
  • The free-space list records all free disk blocks, those not allocated to any file or directory.
  • To create a file, we search the free-space list for the required amount of space and allocate that space to the new file. This space is then removed from the free-space list. When a file is deleted, its disk space is added to the free-space list.
Bit Vector

  • Frequently, the free-space list is implemented as a bit vector or bit map. Each block is represented by 1 bit. If the block is free, the bit is 1; if the block is allocated, the bit is 0.
  • For example, consider a disk where blocks 2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 17, 18, 25, 26, and 27 are free and the rest of the blocks are allocated. The free-space bit map would be 001111001111110001100000011100000 ...
  • The main advantage of this approach is its relative simplicity and its efficiency in finding the first free block or n consecutive free blocks on the disk.


Free-Space Management (Cont.)

  • One technique for finding the first free block on a system that uses a bit-vector to allocate disk space is to sequentially check each word in the bit map to see whether that value is not 0, since a 0-valued word contains only 0 bits and represents a set of allocated blocks.
  • The first non-0 word is scanned for the first 1 bit, which is the location of the first free block. The calculation of the block number is
  • (number of bits per word) × (number of 0-value words) + offset of first 1 bit.
  • Again, we see hardware features driving software functionality. Unfortunately, bit vectors are inefficient unless the entire vector is kept in main memory.
  • Keeping it in main memory is possible for smaller disks but not necessarily for larger ones.
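The first-free-block calculation above can be sketched with the bit map packed into 8-bit words (the word size is an arbitrary choice for the example; bit i of word w, low-order bit first, covers block w × 8 + i, with 1 meaning free):

```python
BITS_PER_WORD = 8

def first_free(words):
    for w, word in enumerate(words):
        if word != 0:                              # a 0 word is all allocated
            offset = (word & -word).bit_length() - 1   # first 1 bit in the word
            # (bits per word) x (number of 0-value words) + offset of first 1 bit
            return BITS_PER_WORD * w + offset
    return None                                    # no free block on the volume

# Blocks 0..7 allocated; block 10 is the first free block.
words = [0b00000000, 0b00000100]
assert first_free(words) == 10
```

Real implementations do the same scan with word-at-a-time hardware instructions (compare-with-zero, find-first-set), which is the hardware support the slide alludes to.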


Linked List (Free-Space Management)

  • Another approach to free-space management is to link together all the free disk blocks, keeping a pointer to the first free block in a special location on the disk and caching it in memory. This first block contains a pointer to the next free disk block, and so on.
  • Recall our earlier example, in which blocks 2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 17, 18, 25, 26, and 27 were free and the rest of the blocks were allocated. In this situation, we would keep a pointer to block 2 as the first free block. Block 2 would contain a pointer to block 3, which would point to block 4, which would point to block 5, which would point to block 8, and so on.
  • This scheme is not efficient; to traverse the list, we must read each block, which requires substantial I/O time.


Linked Free Space List on Disk


Free-Space Management (Cont.)

Grouping

  • A modification of the free-list approach stores the addresses of n free blocks in the first free block. The first n−1 of these blocks are actually free. The last block contains the addresses of another n free blocks, and so on. The addresses of a large number of free blocks can now be found quickly, unlike the situation when the standard linked-list approach is used.

Counting

  • Another approach takes advantage of the fact that, generally, several contiguous blocks may be allocated or freed simultaneously, particularly when space is allocated with the contiguous-allocation algorithm or through clustering.


Free-Space Management (Cont.)

  • Thus, rather than keeping a list of n free disk addresses, we can keep the address of the first free block and the number (n) of free contiguous blocks that follow the first block. Each entry in the free-space list then consists of a disk address and a count.
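Using the earlier example, the fifteen free blocks 2-5, 8-13, 17-18, and 25-27 collapse into just four (address, count) entries:

```python
# Counting scheme: each free-list entry is (first_block, run_length).
free_list = [(2, 4), (8, 6), (17, 2), (25, 3)]

def all_free_blocks(entries):
    # Expand the (address, count) pairs back into individual blocks,
    # e.g. (2, 4) -> blocks 2, 3, 4, 5.
    return [b for start, count in entries for b in range(start, start + count)]

assert len(all_free_blocks(free_list)) == 15
```

The list is shorter than a plain linked free list whenever free space occurs in runs, which is exactly the case contiguous allocation and clustering produce.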


Efficiency and Performance

  • Disks tend to represent a major bottleneck in system performance, since they are the slowest main computer component.

Efficiency

  • The efficient use of disk space depends heavily on the disk-allocation and directory algorithms in use. For instance, UNIX inodes are preallocated on a volume. Even an empty disk has a percentage of its space lost to inodes.
  • However, by preallocating the inodes and spreading them across the volume, we improve the file system’s performance.
  • This improved performance results from the UNIX allocation and free-space algorithms, which try to keep a file’s data blocks near that file’s inode block to reduce seek time.
  • The types of data normally kept in a file’s directory (or inode) entry also require consideration. Commonly, a last write date is recorded to supply information to the user and to determine whether the file needs to be backed up.


Efficiency and Performance

  • Some systems also keep a “last access date,” so that a user can determine when the file was last read.
  • The result of keeping this information is that, whenever the file is read, a field in the directory structure must be written to. That means the block must be read into memory, a section changed, and the block written back out to disk, because operations on disks occur only in block (or cluster) chunks.
  • So any time a file is opened for reading, its directory entry must be read and written as well. This requirement can be inefficient for frequently accessed files, so we must weigh its benefit against its performance cost when designing a file system.
  • Generally, every data item associated with a file needs to be considered for its effect on efficiency and performance.


Performance

  • Even after the basic file-system algorithms have been selected, we can still improve performance in several ways.
  • Once a seek is performed, the track is read into the disk cache starting at the sector under the disk head (reducing latency time). The disk controller then transfers any sector requests to the operating system.
  • Once blocks make it from the disk controller into main memory, the operating system may cache the blocks there.
  • Some systems maintain a separate section of main memory for a buffer cache, where blocks are kept under the assumption that they will be used again shortly. Other systems cache file data using a page cache.
  • The page cache uses virtual memory techniques to cache file data as pages rather than as file-system-oriented blocks.


Performance

  • Caching file data using virtual addresses is far more efficient than caching through physical disk blocks, as accesses interface with virtual memory rather than the file system.
  • Several systems, including Solaris, Linux, and Windows, use page caching to cache both process pages and file data. This is known as unified virtual memory.
  • Some versions of UNIX and Linux provide a unified buffer cache. To illustrate the benefits of the unified buffer cache, consider the two alternatives for opening and accessing a file. One approach is to use memory mapping; the second is to use the standard system calls read() and write().
  • Here, the read() and write() system calls go through the buffer cache. The memory-mapping call, however, requires using two caches—the page cache and the buffer cache.
  • A memory mapping proceeds by reading in disk blocks from the file system and storing them in the buffer cache.


Performance

  • Figure 12.11 I/O without a unified buffer cache.


Performance

  • Because the virtual memory system does not interface with the buffer cache, the contents of the file in the buffer cache must be copied into the page cache. This situation, known as double caching, requires caching file-system data twice.
  • Not only does it waste memory but it also wastes significant CPU and I/O cycles due to the extra data movement within system memory.
  • In addition, inconsistencies between the two caches can result in corrupt files.
  • Another issue that can affect the performance of I/O is whether writes to the file system occur synchronously or asynchronously.
  • Synchronous writes occur in the order in which the disk subsystem receives them, and the writes are not buffered. Thus, the calling routine must wait for the data to reach the disk drive before it can proceed.


Performance

  • In an asynchronous write, the data are stored in the cache, and control returns to the caller. Most writes are asynchronous.
  • However, metadata writes, among others, can be synchronous. Operating systems frequently include a flag in the open system call to allow a process to request that writes be performed synchronously.


Recovery

  • Files and directories are kept both in main memory and on disk, and care must be taken to ensure that a system failure does not result in loss of data or in data inconsistency. We also consider how a system can recover from such a failure.
  • A system crash can cause inconsistencies among on-disk file-system data structures, such as directory structures, free-block pointers, and free FCB pointers. Many file systems apply changes to these structures in place.
  • In addition to crashes, bugs in file-system implementation, disk controllers, and even user applications can corrupt a file system.

Consistency Checking

  • Whatever the cause of corruption, a file system must first detect the problems and then correct them.
  • For detection, a scan of all the metadata on each file system can confirm or deny the consistency of the system. Unfortunately, this scan can take minutes or hours and should occur every time the system boots.


Recovery

  • The allocation and free-space management algorithms dictate what types of problems the checker can find and how successful it will be in fixing them.

Log-Structured File Systems

  • Computer scientists often find that algorithms and technologies originally used in one area are equally useful in other areas. Such is the case with the database log-based recovery algorithms.
  • These logging algorithms have been applied successfully to the problem of consistency checking. The resulting implementations are known as log-based transaction-oriented (or journaling) file systems.


Recovery

Backup and Restore

  • Magnetic disks sometimes fail, and care must be taken to ensure that the data lost in such a failure are not lost forever. To this end, system programs can be used to back up data from disk to another storage device, such as a magnetic tape or other hard disk.
  • Recovery from the loss of an individual file, or of an entire disk, may then be a matter of restoring the data from backup.
  • A typical backup schedule may then be as follows:
  • Day 1. Copy to a backup medium all files from the disk. This is called a full backup.
  • Day 2. Copy to another medium all files changed since day 1. This is an incremental backup.
  • Day 3. Copy to another medium all files changed since day 2.

...

  • Day N. Copy to another medium all files changed since day N−1. Then go back to day 1.
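The cycle above can be sketched with a simple predicate over modification times; timestamps here are illustrative integers, not a real backup tool's format:

```python
# A full backup copies everything; an incremental backup copies only
# files modified after the previous backup's time.
files = {"a": 1, "b": 1, "c": 1}      # file name -> last-modified time

def backup(files, since=None):
    # since=None requests a full backup.
    return {name for name, mtime in files.items()
            if since is None or mtime > since}

day1 = backup(files)                  # Day 1: full backup of all files
files["b"] = 2                        # file b changes on day 2
day2 = backup(files, since=1)         # Day 2: incremental, only b
```

A restore replays these in order: first the full backup, then each incremental, so later copies of a changed file overwrite earlier ones.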


Recovery

  • Using this method, we can restore an entire disk by starting restores with the full backup and continuing through each of the incremental backups.
  • Of course, the larger the value of N, the greater the number of media that must be read for a complete restore. An added advantage of this backup cycle is that we can restore any file accidentally deleted during the cycle by retrieving the deleted file from the backup of the previous day.
