
Threads

CS-446/646

C. Papachristos

Robotic Workers (RoboWork) Lab

University of Nevada, Reno


Remember: Processes

A Process is a Heavyweight Abstraction

Includes many things

    • A Virtual Address Space (defining all the Code and Data Memory Pages)
    • OS Resources (e.g. open Files) and Accounting information
    • Process State (Program Counter (PC), Stack Pointer (SP), Registers, etc.)

Creating a new Process is costly

    • A lot of data structures that must be allocated and initialized
      • Remember: struct proc_t (in Solaris)

Communicating between Processes is also costly

    • Methods of communication have to go through the OS
      • Overhead of making System Calls
        • Copying Message data
        • Setting up Shared Memory


Remember: Processes

Concurrent Programs

Recall simple Web server example

    • Using fork() to make copies of itself, each handling a single request
    • Multiple requests can be handled “simultaneously”

To execute these Programs we need to:

  • Create several Processes that execute “concurrently”
  • Have each of them map to the same Address Space to share data – note: fork() copies the Address Space by default; clone() can share it
    • They can all be thought of as a group, part of the same “computational unit”
      • Note: Also, what if one in this group of Processes changes its Virtual Address Space mappings?
  • Have the OS Schedule these Processes to run simultaneously (logically or physically)

This is inefficient:

    • Space: PCB, Page Tables (more in Virtual Memory Lecture …), etc.
    • Time: Create data structures, fork() and copy Virtual Address Space, etc.


Remember: Processes

Rethinking Processes

  • What would be similar in such “cooperating” Processes
    • All should in principle share the same Code and Data (Address Space)
    • All should in principle share the same Privileges
    • All should in principle share the same OS Resources (Files, Sockets, etc.)

  • What would be different in “cooperating” Processes
    • Each should have its own State of Execution: PC, SP, and Registers

Core Idea: Think separately about the concept of a Process, and a State of Execution

  • Process : Address Space, Privileges, Resources, etc.

  • State of Execution (a.k.a “Thread of Execution”) : PC, SP, Registers


Threads

Threads & Processes

Modern OSs have implementations of the concepts of Processes and Threads

Thread

  • Defines a sequential execution stream within a Process (PC, SP, Registers)
    • Threads are separate streams of execution that share a (Virtual) Address Space
    • This allows one Process to have multiple points of execution (multiple associated States of Execution will exist)
      • can potentially use multiple CPUs

Process

  • Defines a (Virtual) Address Space and general Process Attributes

A Thread is bound to a single Process

    • A Process can however have multiple Threads

A Thread now becomes the unit of Scheduling

    • Processes are now “containers” under which Threads execute
    • Threads become the actual “dynamic” entities


Threads

Thread Control Block (TCB)

The required data structure (implementation specific)

    • Lightweight & Fast

  • Thread Identifier: Unique id (tid) assigned to every new Thread
  • Execution State of Thread (e.g. Running, Ready, Waiting, Started, Terminated)
  • Stack Pointer (ESP & EBP in x86): Points to Thread’s Stack (in Process’ Address Space)
  • Program Counter (EIP on x86): Points to Thread’s next Instruction
  • Other Register values of the Thread
  • Pointer to Process Control Block of the Process the Thread lives in

    • e.g. Pintos Thread struct:


struct thread {
    tid_t tid;                   /* Thread identifier. */
    enum thread_status status;   /* Thread state. */
    char name[16];               /* Name (for debugging purposes). */
    uint8_t *stack;              /* Saved stack pointer. */
    int priority;                /* Priority. */
    struct list_elem allelem;    /* List element for all threads list. */
    struct list_elem elem;       /* List element. */
    unsigned magic;              /* Detects stack overflow. */
};


Threads

Threads in a Process

  • Threads in the same Process share the same Global Memory
    • Code & Data and Heap Segments
      • i.e. share Code, Data, but also Kernel-managed OS Resources (Files, etc.)


Threads

Threads & Processes

Why?

  • Concurrency does not always require creation of an entirely new Process
  • Easier to support Multithreaded applications

Multithreading can be very useful

  • Can unlock “Parallelism”, i.e. the potential to allocate different Threads to multiple cores/CPUs
    • Note: Not an easy task, adds in numerous optimization considerations, e.g. Memory Pollution
  • Improving a Program’s structure
  • Handling simultaneous Events (e.g. web requests)
  • Allowing a Program to overlap I/O and computation

So, Multithreading is even useful on a single-core Processor

    • Although today we can have multicore power-efficient microprocessors in almost everything

  • Required software engineering skillset: Synchronization (next Lecture)


Threads

Multithreaded Concurrent Server

Using fork() to create a new Process to handle requests is overkill:

  • Remember:


for (;;) {
    int client_sock = accept(server_sock, addr, addrlen);
    if ((child_pid = fork()) == 0) {   // Child Process
        close(server_sock);
        handle_request(client_sock);
        exit(0);                       // Child exits once the request is served
    } else {                           // Parent Process
        close(client_sock);            // Parent no longer needs this socket
        // back to for(;;)
    }
}

void handle_request(int sock) {
    // Process client request
    close(sock);
}


Threads

Multithreaded Concurrent Server

Instead, create a new Thread for each request:

  • Example:


void run_web_server() {
    for (;;) {
        int client_sock = accept(server_sock, addr, addrlen);
        thread_create_interface(handle_request, client_sock);
    }
}

void handle_request(int sock) {
    // Process request
    close(sock);
}

Note: handle_request() will now execute in a new Thread, i.e. it will have its own State of execution and can be separately Scheduled


Threads

Thread API (Unix)

int pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                   void *(*start_routine) (void *), void *arg);

  • Create a new Thread in the calling Process, run start_routine with arguments arg

  • Can be customized via attr

int pthread_join(pthread_t thread, void **retval);

  • The calling Thread waits for the target Thread specified by thread to terminate

  • If that Thread has already terminated, then pthread_join() returns immediately.
  • The target Thread specified by thread must be Joinable (i.e. not created with PTHREAD_CREATE_DETACHED attribute, or pthread_detach()-ed later).
  • If the supplied retval is not NULL, then it copies the exit status of the target Thread (i.e. the value that it supplied to pthread_exit())


Threads

Thread API (Unix)

void pthread_exit(void *retval);

  • Terminates the calling Thread and returns a value via retval that (if the calling Thread is Joinable) is made available to another Thread in the same Process that calls pthread_join().

  • Performing a return from the start function of any Thread other than the Main Thread results in an implicit call to pthread_exit(), using the function's return value as the Thread’s exit status.

  • (!) To allow other Threads to continue execution, the Main Thread should terminate by calling pthread_exit() rather than the exit() System Call.

  • (!) The value pointed to by retval should not be located on the calling Thread’s Stack, since the contents of that Stack are Undefined after the Thread terminates.


Threads

Thread API (Unix) – Example 1


#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

void* thread_fn(void *arg);

int main() {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, thread_fn, (void*)1);
    pthread_create(&t2, NULL, thread_fn, (void*)2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}

void* thread_fn(void *arg) {
    int tid = (int)(intptr_t)arg;   /* cast via intptr_t to avoid truncation warnings */
    printf("thread %d\n", tid);
    printf("is now\n");
    usleep(1);
    printf("running!\n");
    pthread_exit(NULL);
}

$ gcc -o threads threads.c -pthread
$ ./threads
thread 1
is now
thread 2
is now
running!
running!

Note: stdout is by default Line-Buffered


Threads

Thread API (Unix) – Example 1 (Better)


#include <pthread.h>
#include <stdio.h>

typedef struct { int id; } thread_data;   /* definition assumed by the example */

void* thread_fn(void *arg);

int main() {
    pthread_t t1, t2;
    thread_data t1_d, t2_d;
    t1_d.id = 1;
    t2_d.id = 2;
    pthread_create(&t1, NULL, thread_fn, (void*)&t1_d);
    pthread_create(&t2, NULL, thread_fn, (void*)&t2_d);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}

void* thread_fn(void *arg) {
    thread_data t_d = *(thread_data *)arg;
    printf("thread %d is now running!\n", t_d.id);
    pthread_exit(NULL);
}

$ gcc -o threads threads.c -pthread
$ ./threads
thread 1 is now running!
thread 2 is now running!

Note: stdout is by default Line-Buffered


Threads

Thread Implementation

A general prototype:

… thread_create( … , start_routine, args);

  • Allocate Thread Control Block (TCB)
  • Allocate Thread Stack
  • Build Stack Frame for Base of Thread Stack
  • Put start_routine’s args on Thread Stack
  • Put Thread on Ready Queue for Scheduling

  • But where should Threads “live”?


Threads

Thread Implementation

  • Multithreading Models

User-Level Threads:

  • Thread management done by User-Level Thread Library, Kernel knows nothing

Kernel-Level Threads:

  • Threads directly supported by the Kernel
    • Virtually all modern OSs support Kernel Threads


Threads

Kernel-Level Threads

  • All Thread operations are implemented in the Kernel by an OS-provided API

  • The OS Schedules all the Threads in the system

  • Threads initially called Lightweight Processes
    • Windows: Threads
    • Solaris: Lightweight Processes (LWP)
    • POSIX Threads (pthreads)
      • Extra Read: Thread Contention Scope attribute: PTHREAD_SCOPE_SYSTEM


The Thread competes for resources with all other Threads in all Processes on the system that are in the same Scheduling Allocation Domain (a group of one or more Processors).


Threads

Kernel-Level Thread Limitations

  • Every Thread operation must go through the Kernel (System Call Interface)
    • Kernel has to do Creation, Exiting, Joining, Synchronizing, Scheduling/Switching
      • On typical laptop: syscall might take 100 cycles, a function call only takes 5 cycles
      • Result: Threads 10x-30x slower when implemented in Kernel

  • One-size fits all Thread implementation
    • Kernel Threads must please every user of the OS
    • User may have to pay for certain features (priority, etc.) that aren’t necessary

  • General heavy-weight Memory requirements
    • E.g. requires its own fixed-size Thread Stack within the Kernel
      • Remember: In Linux, every Thread has its own User Stack and its own Kernel Stack
    • Other required data structures designed for heavier-weight Processes


Threads

User-Level Threads

  • Alternative: Implement as a User-Level Thread Library (a.k.a. Green Threads)

  • One Kernel Thread per-Process
    • Creating, Exiting, Joining, etc. are just functions of a User-Level Thread Library
    • Library does Thread Context Switching

  • User-Level Threads are small and fast
    • Java: Thread
    • POSIX Threads (pthreads)
      • Note: NOT User-Level Threads, but see: Thread Contention Scope attribute: PTHREAD_SCOPE_PROCESS


The Thread competes for resources with all other Threads in the same Process that were also created with the PTHREAD_SCOPE_PROCESS contention scope, and are scheduled relative to other Threads in the Process according to their Scheduling policy and priority.


Threads

User-Level Thread Limitations

  • Can’t take advantage of multiple CPUs or cores

  • User-Level Threads are “invisible” to the OS
    • Not directly integrated with the OS

  • As a result, the OS can make poor decisions
    • Deciding to Schedule a Process with Idle User-Level Threads
    • A “Blocking” System Call (e.g. Disk read) blocks all User-Level Threads of that Process, even if the Process has other User-Level Threads that could run
    • Unscheduling a Process with a User-Level Thread that is holding a Lock (more on Locks later…)

How to solve this?

  • Communication between the Kernel and the User-Level Thread Manager (e.g. Windows 8)
    • See: Scheduler Activation


Threads

Implementing User-Level Threads (upcoming Homework Assignment)

(All performed/accounted-for at User-Space :)

  • Allocate a new Thread Stack space for each thread_create

  • Keep a Queue of Runnable Threads (/Tasks)

  • Configure a periodic Timer-based Signal, e.g. via the Interval Timer System Call setitimer()
    • Switch to another Thread on Timer Signal (i.e. SIGALRM)

  • Leverage a User-Space Task Context Switching mechanism, e.g. in Linux getcontext(), setcontext(), makecontext(), swapcontext()

  • Account for Synchronization (?)

  • Replace “Blocking” System Calls (read/write/etc.) with “Nonblocking” versions
    • If operation WOULD_BLOCK, switch and run different Thread


Threads

Implementing User-Level Threads (upcoming Homework Assignment)

(All performed/accounted-for at User-Space :)

  • The Thread Scheduler determines when a Thread runs

  • It uses Queues to keep track of what Threads are doing
    • Just like the OS and Processes
    • But it is implemented at User-Level in a Library

  • Run Queue: Threads currently running (usually one)

  • Ready Queue: Threads ready to run

  • Pending Queue: Threads waiting on a Condition until they can become Ready
    • e.g. a sleep()ing Thread: placed there by the Scheduler until a certain (OS-managed) Interval Timer expiration occurs, at which point the Scheduler moves it back to the Ready Queue


Threads

Kernel-Level vs User-Level Threads

Kernel-Level Threads

  • Good: Integrated with OS (informed scheduling)
    • Because the Kernel knows, when one Thread “Blocks”, another one can be Scheduled
  • Bad: Slower to create, manipulate, synchronize

User-level Threads

  • Good: Faster to create, manipulate, synchronize
  • Bad: Not integrated with OS (uninformed scheduling)
    • Because the Kernel doesn’t know, when one User-Level Thread “Blocks”, all User-Level Threads in the Process will “Block”

Understanding their differences is important

  • Correct usage affects performance


Threads

Multiplexing User-Level Threads

Use both Kernel-Level and User-Level Threads

  • Associate (& Multiplex) User Threads with a Kernel Thread

  • A Thread Library is required to map User Threads to Kernel Threads

Big picture:

  • Kernel-Level Thread : Notionally represent Physical Concurrency – How many CPU tasks?
  • User-Level Thread : Notionally represent Application Concurrency – How many App tasks?

Different mappings exist, representing different tradeoffs

  • One-to-One: One User Thread maps to one Kernel Thread
  • Many-to-One: Many User Threads map to one Kernel Thread, i.e. Kernel sees a single Process
  • Many-to-Many: Many User Threads map to many Kernel Threads


Threads

Multiplexing User-Level Threads

One-to-One ( 1 : 1 Threading )

  • One User-Level Thread maps to one Kernel-Level Thread

  • Good: More Concurrency
      • When one ThreadBlocks”, others can run
      • Better multicore / multiprocessor performance

  • Bad: Expensive
      • Thread operations involve Kernel
      • Threads need Kernel Resources
        • e.g. Memory

Note 1: The PTHREAD_SCOPE_SYSTEM contention scope typically indicates that a User-Space Thread is bound directly to a single Kernel-Scheduling entity. This is the case on Linux for the obsolete LinuxThreads implementation and the modern NPTL implementation, which are both 1:1 Threading implementations. Linux supports PTHREAD_SCOPE_SYSTEM, but not PTHREAD_SCOPE_PROCESS.

Note 2: There is no difference between a Thread and a Process in Linux. There only exists the concept of the Thread Group Identifier (TGID), which determines what is and isn’t shared between created Threads.


Threads

Multiplexing User-Level Threads

Many-to-One ( n : 1 Threading )

  • Many User-Level Threads map to one Kernel-Level Thread

  • Good: Fast
      • No System Calls required

  • Good: Portable
      • Few system dependencies

  • Bad: No Parallel execution of Threads
      • All Threads “Block” when e.g. one waits for I/O

  • e.g. Java Virtual Machine (JVM) (also C#, others)
    • Java Threads are User-Level Threads
      • Can multiplex all Java Threads on one Kernel Thread


Threads

Multiplexing User-Level Threads

Two-Level

  • Similar to Many-to-Many

    • Except that a User-Level Thread can be bound to a Kernel-Level Thread


Threads

Thread Miscellanea

  • Semantics of fork() System Call
    • Should fork() duplicate only the calling Thread or all Threads of a Process?
      • Think about other active Threads, or other Threads Trapped in another System Call, etc.

    • POSIX fork() copies only the calling Thread

      • Effectively, entire Memory is duplicated (including all the Registers), but the Child Process will only have one active Thread, i.e. only the calling Thread will be in a non-suspended State
        • i.e. other Threads are “there” but cannot run anymore as the system never assigns any CPU time to them; they are in fact missing in the Kernel Thread Scheduler Table

Note: Potentially dangerous to call fork() from a Multi-Threaded Process as any Mutexes (Synchronization Mechanisms, more on these later on…) held in non-calling Threads will be forever locked in the Child Process.


fork(2): A Process shall be created with a single Thread. If a Multi-Threaded Process calls fork(), the new Process shall contain a replica of the calling Thread and its entire Address Space, possibly including the states of Mutexes (more on these later…) and other Resources. Consequently, to avoid errors, the Child Process may only execute Async-Signal-Safe operations until such time as one of the exec functions is called.


Threads

Thread Miscellanea

  • Signal handling (Which Thread should Signals be delivered to?)

    • POSIX pthreads requires all Threads in a Process to:
      • Share some Process-wide attributes, including:
        • Signal “Dispositions” (how the Process behaves when it is delivered the Signal: Term, Ign, …)
      • Have some distinct attributes, e.g: Signal Mask, Alternate Signal Stack, etc.

    • Linux’s implementation will by default try to deliver to the Main Thread first, unless another Thread has been nominated by the User


POSIX.1 distinguishes the notions of Signals that are directed to the Process as a whole and Signals that are directed to individual Threads. According to POSIX.1, a Process-directed Signal (sent using kill() for example) should be handled by a single, arbitrarily selected Thread within the Process.

signal(7): A Signal may be generated (and thus pending) for a Process as a whole (e.g. when sent using kill()) or for a specific Thread (e.g. certain Signals, such as SIGSEGV and SIGFPE, generated as a consequence of executing a specific machine-language instruction are Thread-directed, as are Signals targeted at a specific Thread using pthread_kill()). A Process-directed Signal may be delivered to any one of the Threads that does not currently have the Signal blocked. If more than one of the Threads has the Signal unblocked, then the Kernel chooses an arbitrary Thread to which to deliver the Signal.


Threads

Non-Preemptive Thread Scheduling

  • Threads voluntarily give up the CPU by yield()ing

int sched_yield (void);

Causes the calling Thread to relinquish the CPU. The Thread is moved to the end of the Queue for its Static Priority (more on that in later Scheduling Lecture) and a new Thread gets to run.

Note: Strategic calls to sched_yield() (! where applicable !) can improve performance by giving other Threads a chance to run … Avoid calling sched_yield() unnecessarily or inappropriately (e.g., when resources needed by other schedulable Threads are still held by the caller), since doing so will result in unnecessary Context Switches, which will degrade system performance.


Ping Thread:

while (1) {
    printf("ping\n");
    sched_yield();
}

Pong Thread:

while (1) {
    printf("pong\n");
    sched_yield();
}


  • In other words, it causes Context Switching to another Thread

  • So sched_yield() returns once we have Context-Switched back to the calling Thread
    • e.g. because another Thread called sched_yield()

Ping / Pong execution trace:

printf("ping\n");    // Ping Thread
sched_yield();       //   → switch to Pong
printf("pong\n");    // Pong Thread
sched_yield();       //   → switch back to Ping


Note: sched_yield() is intended for use with Real-Time Scheduling policies (SCHED_FIFO, SCHED_RR, more on these later…). Its use with non-deterministic Scheduling policies such as SCHED_OTHER is unspecified and very likely means your application design is broken.


Threads

Preemptive Thread Scheduling

  • Non-Preemptive Threads have to voluntarily give up CPU
    • A long-running Thread will take over the Machine
    • Only voluntary calls to sched_yield(), sleep(), or finishing, will lead to a Context Switch

Preemptive Scheduling

  • Causing involuntary Context Switching
    • Need to regain control of CPU Asynchronously
    • Use Timer Interrupt
    • Timer Interrupt Handler takes over control from current Thread, and can result in a new Scheduling decision that leads to a Context Switch


Threads

Thread Context Switching

Context Switching in the same Process, such that Virtual Memory Address Space is not switched

    • Which would otherwise involve Memory Address Mappings, Page Tables, and Kernel Resources

Note: Without additional considerations (e.g. Weak Affinity) Thread Context Switching can lead to Memory Pollution

The Context Switch Routine does all of the magic

  • Saves Context of the outgoing (current) Thread (PC, CPU Registers)
    • Push all machine State onto the top of the Kernel Stack of the outgoing Thread
  • Restores Context of the incoming (new) Thread
    • Pop all machine State from the Kernel Stack of the incoming Thread and loads it into CPU Registers
  • The incoming Thread becomes the current Thread
  • Return control to caller of the incoming Thread

All this has to be done in Assembly language

    • Works at the level of the Procedure Calling Conventions, so it cannot itself be implemented using Procedure calls


Threads

Background: Calling Conventions

A standard on how functions should be implemented and called by the Machine

  • How a Function Call in C or C++ gets converted into Assembly language
    • How arguments are passed to a function, how return values are passed back out of a function, how the function is called, and how the function manages the Stack and its Stack Frame, etc.
  • Compilers need to obey this standard when compiling code into Assembly
    • Set up the Stack and CPU Registers properly

Why ?

  • A Program calls functions across many Object Files and Libraries
    • To be able to interface all of these, we need a standardization for how function calls take place


Threads

Background: Calling Conventions

x86 Calling Convention – Stack setup


SP points to the “bottom” of the Stack (lower Virtual Addresses). As parameters are pushed on the Stack, the SP advances (downwards).

During a method call, variables are pushed on the Stack and the SP advances more. Parameters passed to the subroutine constantly change their relative offset w.r.t. the SP.

We also need a FP that points to the start (Base) of the Stack Frame and does not move during the subroutine call. Parameters passed to the subroutine remain at a constant offset w.r.t. the FP.


Threads

Background: Calling Conventions

Registers divided into 2 groups:

  • Caller-saved / Volatile Regs: Hold temporary quantities that need not be preserved across Calls, i.e. their values aren’t assumed to be required after the next function call returns

    • Caller's responsibility to push these Registers onto the Stack or copy them somewhere else if it wants to restore these values after a procedure call
    • Considered normal for a call to “clobber” (“Call-Clobbered Regs”) temporary values in these Regs, i.e. the Callee function is free to modify these
      • on x86, %eax [return val], %edx, %ecx

  • Callee-saved / Non-Volatile Regs: Caller expects these to hold the same value after call (Callee returns)

    • Callee’s responsibility to restore these to their original values before returning to the Caller, or to ensure it doesn’t touch them (“Call-Preserved Regs”)
      • on x86, %ebx, %esi, %edi, %ebp, %esp


Threads

Background: Calling Conventions

Registers divided into 2 groups:

  • Caller-saved / Volatile Regs:

    • Caller's responsibility to push these Registers onto the Stack or copy them somewhere else if it wants to restore these values after a procedure call

  • Callee-saved / Non-Volatile Regs:

    • Callee’s responsibility to restore these to their original values before returning to the Caller (or to ensure it doesn’t touch them)


Caller, before CALL:     Save active Caller-Saved Registers
                         CALL foo_label (pushes PC)
Callee foo() prologue:   Save used Callee-Saved Registers
                         … do stuff …
Callee foo() epilogue:   Restore Callee-Saved Registers
                         RET (returns to Caller)
Caller, after return:    Restore Caller-Saved Registers


Threads

Remember: Pintos Thread Implementation

Pintos Thread Control Block (TCB) Structure:


struct thread {
    tid_t tid;
    enum thread_status status;
    char name[16];
    uint8_t *stack;              /* Saved stack pointer. */
    int priority;
    struct list_elem allelem;
    struct list_elem elem;
    unsigned magic;              /* Detects stack overflow. */
};

stack field is where the value of %esp (Stack Pointer (SP)) will be saved when the Thread is Preempted

  • Note: In x86 Assembly we can’t write t->stack; we can instead take the Address of the struct thread in memory and the offset (in bytes) of the field:

uint32_t thread_stack_ofs = offsetof(struct thread, stack);

Remember: Threads have no owned Heap


Threads

Example: Pintos Thread Context Switching

Pintos C declaration:

struct thread *switch_threads (struct thread *cur, struct thread *next);

  • Intuitively: To switch to another Thread, we just need to “switch the Stacks” using its saved stack field (because every other Thread ran switch_threads at the point it was Preempted itself)

    • Do so by changing the value of %esp

  • switch_threads is implemented in Assembly

  • “Bottom” of Stack after calling switch_threads (Remember: the Stack grows towards lower Virtual Addresses)


(Remember: Calling Convention specifies pushing args (i.e. next, cur) for called function (i.e. switch_threads), then return address before making a call)


Threads

Example: Pintos Thread Context Switching

  • switch_threads first saves Callee-Saved Registers on Stack of current Thread (the one being Preempted)


pushl %ebx    # base address (in Data Segment - DS)
pushl %ebp    # current frame base pointer
pushl %esi    # sequential access instr. source index (in DS)
pushl %edi    # sequential access instr. destination index (in ES)

Save Callee-Saved Regs of the current Thread


Threads

Example: Pintos Thread Context Switching

  • switch_threads loads thread_stack_ofs (the offset of the stack field within struct thread) into the %edx Register, as we will need it to locate each Thread’s saved Stack Pointer when performing the switch


pushl %ebx
pushl %ebp
pushl %esi
pushl %edi
mov  thread_stack_ofs, %edx

thread_stack_ofs is the offset of the stack field in struct thread


Threads

Example: Pintos Thread Context Switching

  • switch_threads saves the Stack Pointer of the current Thread, and sets %esp to point to the (previously saved) Stack Pointer of the next Thread to run


pushl %ebx
pushl %ebp
pushl %esi
pushl %edi
mov  thread_stack_ofs, %edx

movl SWITCH_CUR(%esp), %eax    # %eax = cur
movl %esp, (%eax,%edx,1)       # cur->stack = %esp   (Save Stack Pointer of the current Thread)
movl SWITCH_NEXT(%esp), %ecx   # %ecx = next
movl (%ecx,%edx,1), %esp       # %esp = next->stack  (Load Stack Pointer of the next Thread)

SWITCH_CUR(%esp) is equivalent to 20(%esp), i.e. address esp+20, which holds the call arg cur

SWITCH_NEXT(%esp) is equivalent to 24(%esp), i.e. address esp+24, which holds the call arg next


Threads

Example: Pintos Thread Context Switching

  • switch_threads has effectively now switched Threads (%esp). It finally restores the Callee-Saved Regs that had been pushed onto the next Thread’s Stack at the time that Thread was Preempted, and returns from the call to switch_threads into the next Thread’s frame (from which it had previously called switch_threads)


pushl %ebx
pushl %ebp
pushl %esi
pushl %edi
mov  thread_stack_ofs, %edx
movl SWITCH_CUR(%esp), %eax
movl %esp, (%eax,%edx,1)
movl SWITCH_NEXT(%esp), %ecx
movl (%ecx,%edx,1), %esp
popl %edi
popl %esi
popl %ebp
popl %ebx
ret

Restore Callee-Saved Regs of the next Thread

return from switch_threads() in the next Thread (call happened when it was Context-Switched)

At this point %esp has changed to the next Thread’s Stack



Next Lecture Reading Preparation

Operating Systems Three Easy Pieces (https://pages.cs.wisc.edu/~remzi/OSTEP/)

Concurrency

  • 26. Concurrency and Threads
    • 26.3 Why It Gets Worse: Shared Data
    • 26.4 The Heart Of The Problem: Uncontrolled Scheduling
    • 26.5 The Wish For Atomicity
  • 27. Thread API
    • 27.3 Locks
  • 28. Locks
    • Beginning of Chapter
    • 28.1 Locks: The Basic Idea
    • 28.2 Pthread Locks
    • 28.3 Building A Lock
    • 28.4 Evaluating Locks


  • 28.5 Controlling Interrupts
  • 28.6 A Failed Attempt: Just Using Loads/Stores
  • 28.7 Building Working Spin Locks with Test-And-Set
  • 28.8 Evaluating Spin Locks
  • 28.9 Compare-And-Swap


Time for Questions !

CS-446/646
