1 of 46

CSE 451

Operating Systems

L3 - The Kernel Abstraction - 2

Slides by: Tom Anderson

Baris Kasikci

2 of 46

Question

  • What (hardware, software) do we need to be able to run arbitrary untrustworthy application code at the full speed of the CPU?

Memory management – limit memory access

Mode switch in CPU – privilege mode

In user mode, code cannot change the memory limits

3 of 46

Dual-Mode Operation

  • Kernel mode
    • Execution with the full privileges of the hardware
    • Read/write to any memory location, access any I/O device, read/write any disk sector, send/read any packet

  • User mode
    • Limited privileges
    • Only those granted by the operating system kernel
  • On the x86, the current privilege level is held in the low two bits of the CS register; EFLAGS holds the I/O privilege level (different on ARM, RISC-V)

4 of 46

Privilege levels in x86

  • x86 actually has four modes not two
    • ring 0 (kernel mode)
    • rings 1 and 2 (less than full privilege if OS wants to use it)
    • ring 3 (user mode)
  • Actually eight modes
    • For virtual machine support
    • Will cover later

5 of 46

A Model of a CPU

6 of 46

Dual-Mode Operation

7 of 46

Hardware Support for Dual-Mode Operation

  • Privileged instructions
    • Available to kernel
    • Not available to user code
  • Limits on memory accesses
    • To prevent user code from overwriting the kernel
  • Periodic timer interrupt
    • To regain control from a user program in a loop
  • Safe way to switch from user mode to kernel mode, and vice versa

8 of 46

Privileged instructions

  • Examples?
  • What should happen if a user program attempts to execute a privileged instruction?

Change the page table for an application

Change the mode (a switch into kernel mode is restricted to kernel-chosen entry points that handle the switch)

Fault/exception
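
The check described above can be sketched as a toy C model. This is only an illustration: the names `cpu_t`, `instr_t`, and `execute` are hypothetical, and a real CPU performs this check in hardware. A privileged instruction issued in user mode has no effect and raises a fault.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { MODE_USER, MODE_KERNEL } cpu_mode_t;   // hypothetical names
typedef enum { INSTR_ADD, INSTR_SET_PAGE_TABLE, INSTR_DISABLE_INTERRUPTS } instr_t;
typedef enum { RESULT_OK, RESULT_PRIVILEGE_FAULT } result_t;

typedef struct {
    cpu_mode_t    mode;
    bool          interrupts_enabled;
    unsigned long page_table_base;
} cpu_t;

static bool is_privileged(instr_t instr) {
    return instr == INSTR_SET_PAGE_TABLE || instr == INSTR_DISABLE_INTERRUPTS;
}

// Execute one instruction. In user mode a privileged instruction changes
// nothing and reports a fault, which the kernel would then handle.
result_t execute(cpu_t *cpu, instr_t instr, unsigned long operand) {
    assert(cpu);
    if (is_privileged(instr) && cpu->mode != MODE_KERNEL)
        return RESULT_PRIVILEGE_FAULT;

    switch (instr) {
    case INSTR_SET_PAGE_TABLE:     cpu->page_table_base = operand;  break;
    case INSTR_DISABLE_INTERRUPTS: cpu->interrupts_enabled = false; break;
    case INSTR_ADD:                /* ordinary instruction: always allowed */ break;
    }
    return RESULT_OK;
}
```

Calling `execute` with `INSTR_SET_PAGE_TABLE` on a `cpu_t` in `MODE_USER` returns `RESULT_PRIVILEGE_FAULT` and leaves `page_table_base` untouched; the same call in `MODE_KERNEL` succeeds.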

9 of 46

Simple Memory Protection

10 of 46

What Happens When Isolation Fails?

11 of 46

Physical Memory

12 of 46

Virtual Addresses

  • Translation done in hardware, using a table
    • On every instruction!
  • Table set up by operating system kernel
    • E.g., a page table points to fixed-size page frames

    • Or a multi-level page table to handle very large or sparse virtual address space
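
A minimal sketch of the table lookup, assuming a single-level page table, 4 KiB pages, and a toy 16-page address space. The type `pte_t` and the function `translate` are hypothetical names; real hardware does this walk itself and caches results in a TLB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12u                   // assumed 4 KiB pages
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define NUM_PAGES  16u                   // toy virtual address space: 16 pages

typedef struct {
    bool     valid;      // is this virtual page mapped at all?
    bool     writable;   // may user code store to it?
    uint32_t frame;      // physical page frame number
} pte_t;

// Translate vaddr using the table the kernel set up. Returns true and stores
// the physical address on success; returns false where real hardware would
// raise a page fault exception.
bool translate(const pte_t table[NUM_PAGES], uint32_t vaddr, bool is_write,
               uint32_t *paddr) {
    assert(table && paddr);
    uint32_t vpn    = vaddr >> PAGE_SHIFT;        // virtual page number
    uint32_t offset = vaddr & (PAGE_SIZE - 1);    // unchanged by translation

    if (vpn >= NUM_PAGES) return false;           // outside the address space
    const pte_t *pte = &table[vpn];
    if (!pte->valid) return false;                // unmapped page
    if (is_write && !pte->writable) return false; // protection violation

    *paddr = (pte->frame << PAGE_SHIFT) | offset;
    return true;
}
```

With virtual page 2 mapped read-only to frame 7, a read of `0x2abc` translates to `0x7abc`, while a write to the same address fails.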

13 of 46

Virtual Address Example

#include <stdio.h>
#include <unistd.h>

int staticVar = 0;   // a static variable

int main(void) {
    int localVar = 0;   // a procedure local variable

    staticVar += 1;
    localVar += 1;
    sleep(10);   // sleep for 10 seconds
    printf("static address: %p, value: %d\n", (void *)&staticVar, staticVar);
    printf("local address: %p, value: %d\n", (void *)&localVar, localVar);
    return 0;
}

What if we run two instances of this program at the same time?
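
One way to approximate "two instances" inside a single test is `fork()`, which creates a second process with a copy of the address space. The sketch below is an assumption-laden illustration (the function name is hypothetical, and it relies on POSIX `fork`, `pipe`, and `waitpid`): the child reports the address of a static variable and changes its value, and the parent checks that the address matches while its own value is unchanged.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter = 0;   // lives at one virtual address in both processes

// Returns 1 if parent and child used the same virtual address but kept
// independent values, 0 if not, and -1 on error.
int same_address_different_memory(void) {
    int fds[2];
    if (pipe(fds) != 0) return -1;

    pid_t pid = fork();
    if (pid < 0) { close(fds[0]); close(fds[1]); return -1; }

    if (pid == 0) {                          // child: the second "instance"
        close(fds[0]);
        counter = 100;                       // changes only the child's copy
        uintptr_t addr = (uintptr_t)&counter;
        ssize_t n = write(fds[1], &addr, sizeof addr);
        _exit(n == (ssize_t)sizeof addr ? 0 : 1);
    }

    close(fds[1]);
    uintptr_t child_addr = 0;
    ssize_t n = read(fds[0], &child_addr, sizeof child_addr);
    close(fds[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    if (n != (ssize_t)sizeof child_addr || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0)
        return -1;

    // Same virtual address, yet the child's store never reached the parent.
    return child_addr == (uintptr_t)&counter && counter == 0;
}
```

Two separately launched copies of the program behave the same way for the static variable, although address space layout randomization may change the printed addresses between runs on some systems.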

14 of 46

Virtual Addresses

  • Translation done in hardware, using a table
    • On every instruction!
  • Table set up by operating system kernel

15 of 46

Virtual Address != Physical Address

  • The same virtual address in two different processes can refer to different physical addresses. Why?

  • The same virtual address in two different processes can refer to the same physical address. Why?

  • Different virtual addresses can refer to the same physical address. Why?

16 of 46

Hardware Timer

  • Hardware device that periodically interrupts the processor
    • Same mechanism as I/O device interrupt
    • Returns control to kernel handler
    • Handler checks if need to stop user process
  • Interrupt frequency set by the kernel
    • Not by user code!
  • Interrupts can be temporarily deferred by kernel
    • Not by user code!
    • Crucial for implementing mutual exclusion

17 of 46

User->Kernel Mode Switch

  • From user mode to kernel mode
    • Interrupts
      • Triggered by timer and I/O devices
      • Interrupts can be delivered in user or kernel mode (when enabled)
    • Exceptions
      • Triggered by unexpected or malicious application behavior
    • System calls (protected procedure call)
      • Request for kernel to do some operation on behalf of app
  • Trap: any user->kernel mode switch (from “trap door”)
    • Limited # of very carefully coded entry points

18 of 46

Some Exceptions

19 of 46

Some More Exceptions

20 of 46

Trap and Emulate

  • OS technique for providing functionality beyond what hardware provides by itself
    • Hardware traps to the kernel
    • Kernel implements missing functionality
    • Returns to user code
  • Examples:
    • Handle a divide by zero error
    • Emulate floating point unit on hardware without one
    • Paged virtual memory
    • Copy on write virtual memory
    • Memory mapped files
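
The trap-and-emulate pattern can be imitated at user level. The sketch below is an analogy, not kernel code, and assumes Linux: a page is mapped with no access rights, the first touch traps (delivered to the process as `SIGSEGV`), the handler supplies the missing functionality by granting access with `mprotect`, and the faulting store is then retried transparently. The names `touch_lazy_page` and `on_segv` are hypothetical.

```c
#define _GNU_SOURCE              // MAP_ANONYMOUS
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static volatile char *lazy_page;              // starts with no access rights
static size_t page_size;
static volatile sig_atomic_t faults_handled;

// Plays the kernel's role: supply the missing page, then return so the
// faulting instruction is executed again.
static void on_segv(int sig, siginfo_t *info, void *context) {
    (void)sig; (void)context;
    char *addr = (char *)info->si_addr;
    char *base = (char *)lazy_page;
    if (addr < base || addr >= base + page_size)
        _exit(2);                             // a genuine bug, not our page
    faults_handled++;
    // mprotect is not formally async-signal-safe; on Linux it is a plain
    // system call, which this sketch relies on.
    if (mprotect(base, page_size, PROT_READ | PROT_WRITE) != 0)
        _exit(3);
}

// Touches the page twice; returns the number of faults taken, or -1 on error.
int touch_lazy_page(void) {
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, page_size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    lazy_page = p;
    faults_handled = 0;

    struct sigaction sa = {0}, old;
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0) return -1;

    lazy_page[0] = 42;     // traps; handler grants access; store is retried
    lazy_page[1] = 43;     // no trap: the page is now accessible
    int ok = lazy_page[0] == 42 && lazy_page[1] == 43;

    sigaction(SIGSEGV, &old, NULL);
    munmap(p, page_size);
    return ok ? (int)faults_handled : -1;
}
```

The code that stores to `lazy_page` contains no special handling: from its point of view the page was simply there, which is the property paged virtual memory and copy-on-write depend on.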

21 of 46

System Calls

  • Functions provided by the kernel to untrusted application code
    • Ex: file open, file read, file write, file close, pipe
  • Limited set of entry points into the kernel
    • User code can’t just call any kernel procedure
    • Protected procedure calls
  • API is system-specific (UNIX != Windows)
    • Windows has ~ 2000 system calls
    • Linux has ~ 800
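
To show that a system call is a numbered entry point rather than an ordinary function address, the sketch below invokes two Linux system calls by number through glibc's generic `syscall()` wrapper. It assumes Linux with glibc; `raw_getpid` and `raw_write` are hypothetical helper names.

```c
#define _GNU_SOURCE          // exposes syscall() in glibc headers
#include <assert.h>
#include <sys/syscall.h>     // SYS_getpid, SYS_write
#include <unistd.h>          // syscall(), getpid(), pipe(), read()

// Ask the kernel for our process ID by system call number.
long raw_getpid(void) {
    return syscall(SYS_getpid);
}

// Write len bytes to fd by system call number; returns bytes written or -1.
long raw_write(int fd, const void *buf, unsigned long len) {
    return syscall(SYS_write, fd, buf, len);
}
```

`raw_getpid()` returns the same value as the library function `getpid()`, since both end up at the same kernel entry point.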

22 of 46

UNIX system calls (1972)

  • Mount, unmount filesystem
  • File open/create
  • Seek to a file offset
  • Read/write file
  • Close, delete file
  • Change file owner/mode
  • Create/delete directory
  • Add link to file
  • Ioctl for I/O device operations
  • Process fork
  • Process exec
  • Exit process; wait for exit
  • Create pipe between processes
  • Send signal to another process
  • Set/mask signal handler
  • Extend heap memory region

23 of 46

24 of 46

Kernel System Call Handler

  • Locate arguments
    • In registers or on user stack
    • Translate user addresses into kernel addresses (if necessary)
  • Copy arguments
    • From user memory into kernel memory
    • Protect kernel from malicious code!
  • Validate arguments
    • Protect kernel from errors in user code
  • Copy results back into user memory
    • Translate kernel addresses into user addresses (if necessary)
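
The copy-then-validate order can be sketched with a toy model in which user memory is a plain array. Everything here is hypothetical (`user_mem`, `copy_string_from_user`, `sys_open_toy`, the flag bits); the point is that the kernel checks the user pointer, copies the argument into its own memory, and only then validates the private copy, so user code cannot change the argument after it has been checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define USER_MEM_SIZE 4096u   // toy user address space: addresses 0..4095
#define MAX_PATH_LEN  64u

static char user_mem[USER_MEM_SIZE];   // stands in for the process's memory

// True if [uaddr, uaddr + len) lies entirely inside user memory.
// Written to avoid overflow in uaddr + len.
static bool user_range_ok(size_t uaddr, size_t len) {
    return uaddr <= USER_MEM_SIZE && len <= USER_MEM_SIZE - uaddr;
}

// Copy a NUL-terminated string argument into kernel memory. Fails if the
// string leaves user memory or is not terminated within MAX_PATH_LEN bytes.
int copy_string_from_user(char kdst[MAX_PATH_LEN], size_t uaddr) {
    assert(kdst);
    for (size_t i = 0; i < MAX_PATH_LEN; i++) {
        if (!user_range_ok(uaddr, i + 1)) return -1;
        kdst[i] = user_mem[uaddr + i];
        if (kdst[i] == '\0') return 0;
    }
    return -1;
}

// Toy handler for open(path, flags).
int sys_open_toy(size_t upath, int flags) {
    char path[MAX_PATH_LEN];                                 // kernel's copy
    if (copy_string_from_user(path, upath) != 0) return -1;  // bad pointer
    if (path[0] == '\0') return -1;                          // empty path
    if (flags & ~0x3) return -1;       // only two hypothetical flag bits exist
    return 3;                          // pretend file descriptor
}
```

A valid path at user address 100 yields the pretend descriptor; a pointer beyond user memory, an unterminated string at the end of memory, or an unknown flag bit all fail with -1 before the kernel acts on them.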

25 of 46

How Many User<->Kernel Transitions?

  • For a static web server to receive a request and reply with the file data?

26 of 46

User/Kernel Virtual Addresses

27 of 46

Why dup?

A shell is a user application that runs other programs

% grep "To be or not" Shakespeare.txt

shell creates process “grep”, with arguments; outputs to stdout

% grep "To be or not" Shakespeare.txt > logfile

shell creates process “grep”, with arguments; outputs to logfile

fd = open("logfile")

close(stdout)

dup(fd) -> grep uses logfile as its stdout

% grep "To be or not" Shakespeare.txt | wc

Shell creates two processes, “grep” and “wc”

A pipe provides one-way communication between two processes

Shell sets stdout of grep to be one end of pipe; stdin of wc to be the other
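
The open/close/dup sequence for output redirection can be sketched as follows, assuming a POSIX system. The function name `run_redirected` is hypothetical, and a real shell does considerably more (error reporting, job control, restoring state).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Run argv[0] with its stdout redirected to path, as a shell does for
// "cmd > path". Returns the child's exit status, or -1 on error.
int run_redirected(const char *path, char *const argv[]) {
    assert(path && argv && argv[0]);
    fflush(stdout);                  // avoid duplicating buffered output

    pid_t pid = fork();
    if (pid < 0) return -1;

    if (pid == 0) {                  // child: will become the command
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) _exit(126);
        close(STDOUT_FILENO);                      // free descriptor 1
        if (dup(fd) != STDOUT_FILENO) _exit(126);  // lowest free fd is now 1
        close(fd);                                 // original no longer needed
        execvp(argv[0], argv);       // the command inherits the new stdout
        _exit(127);                  // reached only if exec failed
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The command itself is unchanged: it writes to descriptor 1 as always. The sketch relies on `dup` returning the lowest free descriptor; `dup2(fd, STDOUT_FILENO)` does the close and duplicate in one call and does not depend on which descriptors happen to be free.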

28 of 46

Kernel->User Mode Switch

  • From kernel mode to user mode
    • New process/new thread start
      • Jump to first instruction in program/thread
    • Return from interrupt, exception, system call
      • Resume suspended execution
    • Process/thread context switch
      • Resume some other process
    • User-level upcall (UNIX signal)
      • Asynchronous notification to user program

29 of 46

Restoring User State

  • We need to be able to interrupt and transparently resume the execution of a user program for several reasons:
    • I/O device signals I/O completion
    • Periodic hardware timer to check if app is hung
    • Multiplexing multiple apps on a single CPU (timer interrupts)
    • Recoverable exceptions (ex: to extend stack transparently to application)
    • App unaware it has been interrupted or took an exception!

30 of 46

How do we take interrupts/exceptions safely?

  • Interrupt vector
    • Limited number of entry points into kernel
    • User code can’t jump to an arbitrary location (e.g., past privilege checks)
  • Atomic transfer of control
    • Hardware saves/restores:
      • Program counter
      • Stack pointer
      • Memory protection
      • Kernel/user mode
    • Interrupt handler saves/restores additional registers
  • Transparent restartable execution
    • User program does not know interrupt/exception occurred
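
A toy model of the interrupt vector, as a sketch only: the table is an array of handler pointers that only the kernel fills in, and a trap selects an entry by number instead of jumping to an address chosen by user code. The x86 architecture defines 256 vectors; the specific vector numbers and handler names below are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_VECTORS 256   // x86 defines 256 vectors; handlers below are toys

typedef int (*handler_fn)(int vector);

static handler_fn vector_table[NUM_VECTORS];   // filled in only by the kernel

static int timer_handler(int vector)   { (void)vector; return 100; }
static int syscall_handler(int vector) { (void)vector; return 200; }
static int default_handler(int vector) { return -vector; }   // unexpected trap

// Kernel boot code: every vector gets a handler, so no entry is left dangling.
void init_vectors(void) {
    for (int i = 0; i < NUM_VECTORS; i++)
        vector_table[i] = default_handler;
    vector_table[32]  = timer_handler;     // assumed vector number
    vector_table[128] = syscall_handler;   // assumed vector number
}

// What the hardware does on a trap: index the table with the vector number.
// The only possible targets are the entry points the kernel installed.
int dispatch(int vector) {
    if (vector < 0 || vector >= NUM_VECTORS) return -1;
    assert(vector_table[vector] != NULL);  // init_vectors() must have run
    return vector_table[vector](vector);
}
```

After `init_vectors()`, `dispatch(32)` runs the timer handler and `dispatch(13)` falls through to the default handler; no vector number reaches code outside the table.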

31 of 46

Interrupt Vector on x86

32 of 46

Kernel (Interrupt) Stack

  • Per-processor, located in kernel (not user) memory
    • Runs the trap/interrupt/exception/syscall handler
    • Every process needs both a kernel interrupt stack and user stack
    • Multiple threads: multiple kernel interrupt stacks
  • Why can’t the interrupt handler run on the stack of the interrupted user process?

33 of 46

Interrupt Stack

34 of 46

Interrupt Masking

  • Interrupt handler runs with interrupts off
    • Re-enabled when interrupt completes
  • OS kernel can also turn interrupts off
    • E.g., when determining the next process/thread to run
    • On x86
      • CLI: disable interrupts
      • STI: enable interrupts
      • Only applies to the current CPU (on a multicore)
  • We’ll need this to implement synchronization in chapter 5
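
CLI and STI are privileged, so they cannot be demonstrated from user code. A user-level analogy, assuming POSIX signals in a single-threaded process, is `sigprocmask`: a signal raised while blocked is held pending and delivered once the mask is lifted, much as an interrupt raised while interrupts are off is deferred rather than lost. The function name below is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

static volatile sig_atomic_t delivered;

static void on_usr1(int sig) { (void)sig; delivered = 1; }

// Returns 1 if the signal was deferred while masked and delivered on unmask,
// 0 if not, and -1 on error.
int masked_signal_is_deferred(void) {
    struct sigaction sa = {0};
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0) return -1;

    sigset_t block, old, pending;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    delivered = 0;

    if (sigprocmask(SIG_BLOCK, &block, &old) != 0) return -1;  // like CLI
    raise(SIGUSR1);                      // the "interrupt" arrives while masked
    int seen_while_masked = delivered;   // handler has not run yet
    sigpending(&pending);
    int was_pending = sigismember(&pending, SIGUSR1);

    // Critical section would go here: no handler can run in the middle of it.

    if (sigprocmask(SIG_SETMASK, &old, NULL) != 0) return -1;  // like STI
    return seen_while_masked == 0 && was_pending == 1 && delivered == 1;
}
```

As with CLI on a multicore, the mask is local: it defers delivery to this thread only and does nothing about code running elsewhere.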

35 of 46

Interrupt Handlers

  • Non-blocking, run to completion
    • Minimum necessary to allow device to take next interrupt
    • Any waiting must be limited duration
    • Wake up other threads to do any real work
      • Linux: semaphore
  • Rest of device driver runs as a kernel thread

36 of 46

Case Study: x86 Interrupt (Hardware Support)

  • Hardware saves current stack pointer
  • Saves current program counter
  • Saves current processor status word (condition codes)
  • Switches to kernel stack
  • Puts SP, PC, PSW on stack
  • Switches to kernel mode
  • Vectors through interrupt table
  • Interrupt handler saves registers it might clobber
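
The steps listed above can be sketched as a toy simulation. This is a simplification under stated assumptions: the structures `cpu_t` and `kstack_t` are hypothetical, only SP, PC, and PSW are modeled, and real x86 hardware also saves segment selectors and, for some exceptions, an error code.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { USER_MODE, KERNEL_MODE } cpu_mode_t;

typedef struct {
    uint32_t   sp, pc, psw;   // stack pointer, program counter, status word
    cpu_mode_t mode;
} cpu_t;

#define KSTACK_WORDS 64
typedef struct {
    uint32_t words[KSTACK_WORDS];
    uint32_t top;             // index one past the last pushed word
} kstack_t;

static void push(kstack_t *ks, uint32_t v) {
    assert(ks->top < KSTACK_WORDS);
    ks->words[ks->top++] = v;
}

static uint32_t pop(kstack_t *ks) {
    assert(ks->top > 0);
    return ks->words[--ks->top];
}

// Hardware side of taking an interrupt.
void interrupt_entry(cpu_t *cpu, kstack_t *ks, uint32_t handler_pc) {
    // Latch the interrupted state before anything is modified.
    uint32_t old_sp = cpu->sp, old_pc = cpu->pc, old_psw = cpu->psw;
    push(ks, old_sp);         // saved on the kernel stack, not the user stack
    push(ks, old_pc);
    push(ks, old_psw);
    cpu->sp   = ks->top;      // now running on the kernel stack
    cpu->mode = KERNEL_MODE;
    cpu->pc   = handler_pc;   // address taken from the interrupt table
}

// Return from interrupt: restore in reverse order, drop back to user mode.
void interrupt_return(cpu_t *cpu, kstack_t *ks) {
    cpu->psw  = pop(ks);
    cpu->pc   = pop(ks);
    cpu->sp   = pop(ks);
    cpu->mode = USER_MODE;
}
```

After an `interrupt_entry` followed by `interrupt_return`, the simulated user registers hold exactly their original values, which is what makes the interruption invisible to the user program.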

37 of 46

During Interrupt

38 of 46

After Interrupt

39 of 46

xk

  • vectors.S – why do some vectors push 2 arguments and some push 1?
  • trapasm.S – looks like callee save, but who saves the PC and SP?
  • trap.h – on stack so items at bottom of struct pushed first
  • trap.c – called once trapframe is set up
  • trapasm.S:trapret -- where call to trap returns

40 of 46

At end of handler

  • Handler restores saved registers
  • Atomically return to interrupted process/thread
    • Restore program counter
    • Restore program stack
    • Restore processor status word/condition codes
    • Switch to user mode

41 of 46

Question

  • Suppose the OS overwrites a value in the trapframe. What happens when the handler returns?

  • Why might the OS want to do this?
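
One possible line of answer, sketched as a toy model with a hypothetical, heavily reduced trapframe: whatever the frame holds when the handler returns becomes the user-visible register state. A kernel can use this to deliver a system call's return value, or to resume the process at a different address such as a signal handler.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical reduced trapframe; a real one holds every saved register.
typedef struct {
    uint64_t pc;        // where user code resumes
    uint64_t sp;        // user stack pointer
    uint64_t ret_reg;   // register that carries a system call's return value
} trapframe_t;

typedef struct { uint64_t pc, sp, ret_reg; } user_regs_t;

// Kernel side of a system call: the result is delivered by editing the frame.
void syscall_set_result(trapframe_t *tf, uint64_t result) {
    assert(tf);
    tf->ret_reg = result;
}

// Kernel side of an upcall: make the process resume in its handler instead
// of at the interrupted instruction.
void redirect_to_handler(trapframe_t *tf, uint64_t handler_pc) {
    assert(tf);
    tf->pc = handler_pc;
}

// Return path: the frame's contents are loaded back into the registers.
user_regs_t return_to_user(const trapframe_t *tf) {
    assert(tf);
    user_regs_t regs = { tf->pc, tf->sp, tf->ret_reg };
    return regs;
}
```

A frame saved with `ret_reg` equal to 0 and then passed through `syscall_set_result(tf, 42)` resumes the user program with 42 in that register, even though user code never wrote it.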

42 of 46

Question

  • The trapframe is stored on the kernel stack; where is it stored after a context switch to a different process?

43 of 46

Upcall: User-level event delivery

  • Notify user process of some event that needs to be handled right away
    • Time expiration
      • Real-time user interface
      • Time-slice for user-level thread manager
    • Interrupt delivery for VM player
    • Asynchronous I/O completion (async/await)
  • AKA UNIX signal

44 of 46

Upcalls vs Interrupts

  • Signal handlers = interrupt vector
  • Signal stack = interrupt stack
  • Automatic save/restore registers = transparent resume
  • Signal masking: signals disabled while in signal handler
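
The correspondence above can be seen in a short POSIX sketch, given here as an illustration rather than a recommended design. `sigaction` installs the handler (the counterpart of an interrupt vector entry), `setitimer` requests a time-expiration event, and the kernel interrupts the running loop to deliver it. The function name `wait_for_timer_upcall` and the 10 ms delay are assumptions.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/time.h>

static volatile sig_atomic_t ticks;

// Upcall entry point. While it runs, SIGALRM is masked by default, just as
// an interrupt handler runs with interrupts off.
static void on_alarm(int sig) { (void)sig; ticks++; }

// Arm a one-shot timer and keep computing until the kernel delivers the
// upcall. Returns the number of upcalls observed, or -1 on error.
int wait_for_timer_upcall(void) {
    struct sigaction sa = {0};
    sa.sa_handler = on_alarm;          // like installing a vector table entry
    sigemptyset(&sa.sa_mask);          // no extra signals masked in the handler
    if (sigaction(SIGALRM, &sa, NULL) != 0) return -1;

    ticks = 0;
    struct itimerval timer = {0};      // it_interval of zero: fire only once
    timer.it_value.tv_usec = 10000;    // assumed delay: 10 ms
    if (setitimer(ITIMER_REAL, &timer, NULL) != 0) return -1;

    // Ordinary user code: it never asks the kernel whether time is up. The
    // kernel interrupts this loop, runs on_alarm, and resumes the loop.
    while (ticks == 0) { }
    return (int)ticks;
}
```

The loop's registers are saved before `on_alarm` runs and restored afterwards, so the loop continues as if nothing had happened apart from the changed value of `ticks`.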

45 of 46

Upcall: Before

46 of 46

Upcall: During