1 of 35

CSE 451

Operating Systems

L21 – Spinning disks & File system abstraction

Slides by: Tom Anderson

Rohan Kadekodi

2 of 35

Magnetic Disk

3 of 35

Magnetic Disk Anatomy

4 of 35

Disk Tracks

  • ~ 1 micron wide
    • Wavelength of light is ~ 0.5 micron
    • Resolution of human eye: 50 microns
    • 100K tracks on a typical 2.5” disk
  • Separated by unused guard regions
    • Reduces likelihood neighboring tracks are corrupted during writes
    • Bit level corruption still happens with non-zero chance
  • Track length varies across disk
    • Outside: More sectors per track, higher bandwidth
    • Disk is organized into regions of tracks with same # of sectors/track
    • Only outer half of radius is used
      • Most of the disk area in the outer regions of the disk

5 of 35

Disk Performance

Disk Latency =

Seek Time + Rotation Time + Transfer Time

Seek Time: time to move disk arm over track (1-20ms)

Fine-grained position adjustment necessary for head to “settle”

Head switch time ~ track switch time (on modern disks)

Rotation Time: time to wait for disk to rotate under disk head

Disk rotation: 4 – 15ms (depending on price of disk) On average, only need to wait half a rotation

Transfer Time: time to transfer data onto/off of disk

Disk head transfer rate: 100-250MB/s (5-10 usec/sector)

Host transfer rate dependent on I/O connector (USB, SATA, …)

6 of 35

Western Digital Ultrastar DC HC690

Capacity

32 TB, 7 platters

Spin Speed

7200 RPM

Sustained Transfer Rate

270 MB/s (read), 260 MB/s (write)

Interface Transfer Rate

1.2 GB/s

Seek time (avg)

8 ms (read), 8.6 ms (write)

Rotational latency (avg)

4.16 ms

Cache

512 MB DRAM

Idle/Operating Power

5.8W/9.7W

Bit Error Rate (read)

10^-15

7 of 35

File Systems

8 of 35

File System Abstraction

  • File system
    • Persistent, named data
    • Hierarchical organization (directories, subdirectories)
    • Access control on data
  • File: named collection of data
    • Linear sequence of bytes (or a set of sequences)
    • Read/write or memory mapped
  • Crash and storage error tolerance
    • Operating system crashes (and disk errors) leave file system in a valid state
  • On top of block-oriented storage devices (flash, magnetic disk)
    • Achieve close to the hardware performance limit in the average case despite complex performance/reliability device characteristics

9 of 35

File System Abstraction

  • Directory
    • Group of named files or subdirectories
    • Mapping from file name to file metadata location
  • Path
    • String that uniquely identifies file or directory
    • Ex: /cse/www/education/courses/cse451/26sp/index.html
  • Links
    • Hard link: one file name points to the same file metadata as another file
    • Soft link: one file name points to another file name
  • Mount
    • Mapping from name in one file system to root of another

10 of 35

UNIX File System API

  • create, link, unlink, createdir, rmdir
    • Create file, link to file, remove link
    • Create directory, remove directory
  • open, close, read, write, seek
    • Open/close a file for reading/writing
    • Seek resets current position
  • fsync
    • File modifications can be cached, written back to disk in any order
    • fsync forces modifications to disk (like a memory barrier)
    • Warning: programmers who use fsync very often use it incorrectly

11 of 35

Named Data in a File System

File number = location of inode

inode contains table to find storage for any offset

12 of 35

inode -> disk block

  • Index structure
    • How do we locate the blocks of a file? (why not use multilevel page tables?)
  • Index granularity
    • What block size do we use? (4KB vs. larger?)
  • Free space
    • How do we find unused blocks on disk?
  • Locality
    • Do we preserve spatial locality?
  • Reliability
    • What if machine crashes in middle of a file system operation?

13 of 35

File System Workload

  • File sizes
    • Are most files small or large?
    • Which accounts for more total storage: small or large files?

14 of 35

File System Workload

  • File sizes
    • Are most files small or large?
      • SMALL
    • Which accounts for more total storage: small or large files?
      • LARGE

15 of 35

File System Workload

  • File access
    • Are most accesses to small or large files?
    • Which accounts for more total I/O bytes: small or large files?

16 of 35

File System Workload

  • File access
    • Are most accesses to small or large files?
      • SMALL
    • Which accounts for more total I/O bytes: small or large files?
      • LARGE

17 of 35

File System Workload

  • Most files are read/written sequentially
  • Some files are read/written randomly
    • Ex: database files, swap files
  • Some files have a pre-defined size at creation
  • Most files start empty and grow over time
    • Ex: compiler output, system logs

18 of 35

File System Design Constraints

  • For small files:
    • Small blocks for storage efficiency
    • Concurrent ops more efficient than sequential
    • Files used together should be stored together
  • For large files:
    • Storage efficient (large blocks)
    • Contiguous allocation for sequential access
    • Efficient lookup for random access
  • May not know at file creation
    • Whether file will become small or large
    • Whether file is persistent or temporary
    • Whether file will be used sequentially or randomly

19 of 35

File System Design Options

FAT

FFS

NTFS

Index structure

Linked list

Tree (fixed, assym)

Tree (dynamic)

granularity

block

block

extent

free space allocation

FAT array

Bitmap (fixed location)

Bitmap (file)

Locality

defragmentation

Block groups

+ reserve space

Extents Best fit defrag

20 of 35

Microsoft File Allocation Table (FAT)

  • Linked list index structure
    • Simple, easy to implement
    • Still widely used (e.g., embedded devices, SD cards)
  • File table:
    • Linear map of all blocks on disk
    • Each file a linked list of blocks

21 of 35

FAT

22 of 35

FAT

  • Pros:
    • Easy to find free block
    • Easy to append to a file
    • Easy to delete a file
  • Cons:
    • Small file access is slow
    • Random access is very slow
    • Fragmentation
      • File blocks for a given file may be scattered
      • Files in the same directory may be scattered
      • Problem becomes worse as disk fills

23 of 35

Berkeley UNIX FFS (Fast File System)

  • inode table
    • Analogous to FAT table
  • inode
    • Metadata
      • File owner, access permissions, access times, …
    • Set of 12 data pointers
    • With 4KB blocks => max size of 48KB files

24 of 35

FFS inode

  • Metadata
    • File owner, access permissions, access times, …
  • Set of 12 data pointers
    • With 4KB blocks => max size of 48KB files
  • Indirect block pointer
    • pointer to disk block of data pointers
  • Indirect block: 1K data blocks => 4MB (+48KB)

25 of 35

FFS inode

  • Metadata
    • File owner, access permissions, access times, …
  • Set of 12 data pointers
    • With 4KB blocks => max size of 48KB
  • Indirect block pointer
    • pointer to disk block of data pointers
    • 4KB block size => 1K data blocks => 4MB
  • Doubly indirect block pointer
    • Doubly indirect block => 1K indirect blocks
    • 4GB (+ 4MB + 48KB)

26 of 35

FFS inode

  • Metadata
    • File owner, access permissions, access times, …
  • Set of 12 data pointers
    • With 4KB blocks => max size of 48KB
  • Indirect block pointer
    • pointer to disk block of data pointers
    • 4KB block size => 1K data blocks => 4MB
  • Doubly indirect block pointer
    • Doubly indirect block => 1K indirect blocks
    • 4GB (+ 4MB + 48KB)
  • Triply indirect block pointer
    • Triply indirect block => 1K doubly indirect blocks
    • 4TB (+ 4GB + 4MB + 48KB)

27 of 35

28 of 35

FFS Asymmetric Tree

  • Small files: shallow tree
    • Efficient storage for small files
  • Large files: deep tree
    • Efficient lookup for random access in large files
  • Sparse files: only fill pointers if needed

29 of 35

FFS Locality

  • Block group allocation
    • Block group is a set of nearby cylinders
    • Files in same directory located in same group
    • Subdirectories located in different block groups
  • inode table spread throughout disk
    • inodes, bitmap near file blocks
  • First fit allocation
    • Small files fragmented, large files contiguous

30 of 35

31 of 35

FFS First Fit Block Allocation

32 of 35

FFS First Fit Block Allocation

33 of 35

FFS First Fit Block Allocation

34 of 35

FFS

  • Pros
    • Efficient storage for both small and large files
    • Locality for both small and large files
    • Locality for metadata and data
  • Cons
    • Inefficient for tiny files (a 1 byte file requires both an inode and a data block)
    • Inefficient encoding when file is mostly contiguous on disk
    • Need to reserve 10-20% of free space to prevent fragmentation

35 of 35

NTFS

  • Master File Table
    • Flexible 1KB storage for metadata and data
  • Extents
    • Block pointers cover runs of blocks
    • Similar approach in linux ext4
    • File create can provide hint as to size of file
  • Journalling for reliability
    • Next lecture