1 of 22

Concurrency in Chromium: How I learned to stop worrying and love Sequences

cfredric@

2 of 22

Goals

Give intuition on Chromium's concurrency model
Show some useful tips/tricks for working with concurrency in Chromium

Non-goals:

Explain how things are implemented, in depth
Show all APIs related to concurrency

3 of 22

Agenda

What's the problem?
Chromium's solution

Vocabulary
Guts
Usage patterns

4 of 22

Chromium's Architecture

Chromium consists of multiple processes:

Browser process, renderer processes, utility processes (network process, data decoder process, etc.)

Each process consists of multiple threads:

Main thread (also called UI thread in browser process)
IO thread (for IPC, not file/network IO)
Other special purpose threads
A pool of general-purpose threads

As most of you probably know, Chromium is a multi-process application. It consists of the browser process, many renderer processes, some utility processes, and probably more. This is for a few reasons:

Stability: one renderer crashing won't bring down every other tab.
Security: the browser process is privileged, but renderer processes are very locked-down and can't do much without talking to the browser process. Attackers would have to exploit the renderer and the browser process in order to be able to do things that require privileges.
Speed: different processes can run on different CPU cores simultaneously; no need to have one web page get slow just because a different one is doing heavy computation.

(Astute listeners will note that the only S missing from the four S's of https://www.chromium.org/developers/core-principles/ is Simplicity.)

On top of that, each process consists of multiple threads. There's the main thread (which is also called the UI thread if you're talking about the browser process), the IO thread (which is for inter-process communication, not file or network IO), some other special purpose threads, and a threadpool of general purpose threads.

All this is to say that Chromium is heavily parallelized. Many different processes and threads are all running concurrently and in parallel, and have to do so safely and correctly.

###

Other special purpose threads:

Worker threads (web workers)
Media thread
Script-streaming threads

5 of 22

Intra-process Parallelism (in general)

All threads of a given process share the same address space (modulo thread-local storage [TLS]).

How can threads avoid data races, in general?

Multiple approaches:

Access memory from multiple threads simultaneously

"Communicate by sharing memory"
Must use mutexes, condvars, etc. to ensure safety

Send data between threads, without sharing memory

"Share memory by communicating"
Must use message-passing between threads

Hybrid

What do we mean by "safely and correctly"? If we weren't careful, we'd end up with data races between threads. Different threads (of the same process) can access the same memory, so we need to make sure that they don't trample all over each other, and instead play nicely together.

In general, there are a few solutions to this problem. The first is to use mutexes, condition variables, semaphores, barriers, and other primitives so that multiple threads can access (and write) the same data, but at different times. This is the "classic" approach, which most of us are probably familiar with. The tagline for this is to have the threads "communicate by sharing memory".

Another approach is to use communication primitives to send data between threads, so that only one thread has the data at a time. The tagline here is to have the threads "share memory by communicating".

Of course, these approaches aren't mutually exclusive (so to speak), so they can be combined into a hybrid.
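
To make the two taglines concrete, here is a minimal sketch in plain standard C++ (no Chromium code). The names SumWithSharedMemory, SumWithMessagePassing, and Channel are made up for this illustration. Both functions compute the same total; the first lets every thread write one shared counter under a mutex, the second has each thread work on its own data and send the result over a small channel. Note that the channel still uses a lock internally, but that lock is confined to one small class instead of being spread across the callers.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Approach 1: "communicate by sharing memory".
// Every thread writes the same counter; a mutex serializes the writes.
int SumWithSharedMemory(int num_threads, int increments_per_thread) {
  int total = 0;
  std::mutex lock;
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; ++t) {
    threads.emplace_back([&] {
      for (int i = 0; i < increments_per_thread; ++i) {
        std::lock_guard<std::mutex> guard(lock);
        ++total;
      }
    });
  }
  for (auto& thread : threads) thread.join();
  return total;
}

// A tiny blocking channel (hypothetical helper, not a Chromium class).
class Channel {
 public:
  void Send(int value) {
    {
      std::lock_guard<std::mutex> guard(lock_);
      messages_.push(value);
    }
    ready_.notify_one();
  }

  // Blocks until a message is available, then returns it.
  int Receive() {
    std::unique_lock<std::mutex> guard(lock_);
    ready_.wait(guard, [this] { return !messages_.empty(); });
    int value = messages_.front();
    messages_.pop();
    return value;
  }

 private:
  std::mutex lock_;
  std::condition_variable ready_;
  std::queue<int> messages_;
};

// Approach 2: "share memory by communicating".
// Each worker owns its partial sum and hands the finished value to the
// consumer; no worker ever touches another worker's data.
int SumWithMessagePassing(int num_threads, int increments_per_thread) {
  Channel channel;
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; ++t) {
    threads.emplace_back([&channel, increments_per_thread] {
      int partial = 0;
      for (int i = 0; i < increments_per_thread; ++i) ++partial;
      channel.Send(partial);
    });
  }
  int total = 0;
  for (int t = 0; t < num_threads; ++t) total += channel.Receive();
  for (auto& thread : threads) thread.join();
  return total;
}
```

Calling either function with 4 threads and 1000 increments per thread yields 4000; the difference is only in who is allowed to touch which memory.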

###

Data Race:

Two or more threads concurrently accessing a location of memory
One or more of them is a write
One or more of them is unsynchronized

(definition from https://doc.rust-lang.org/nomicon/races.html)

6 of 22

Intra-process Parallelism in Chromium

Chromium uses the hybrid approach, with a strong preference for message-passing:

Send data and tasks between threads, instead of using locks to synchronize.
Locks/condition variables exist, but are rarely needed.

7 of 22

The End

9 of 22

Why not stop here?

Threads are too coarse-grained & heavy-weight
Chromium has many independent streams of work to do at a given time

Need a way to take independent streams of work and load-balance them between threads

10 of 22

Chromium's concurrency vocabulary

Task: a basic unit of work.

Think OnceCallback and RepeatingCallback.

Physical thread: an OS thread.

Think pthreads on POSIX.

base::Thread: Chromium's abstraction over physical threads.

Platform-agnostic.

Sequence: a "virtual thread"; a "stream of work".

An environment that executes a series of tasks in order.
Not associated with any particular physical thread.

Now that we know what problem we're solving, let's introduce some vocabulary so that we can talk about Chromium's solution.

First: a task. A task is a basic unit of work that can be executed by a thread. In concrete terms, think of a OnceCallback or RepeatingCallback.

Next, physical thread. This is just what it sounds like, a thread that's managed by the operating system, e.g. pthreads on a POSIX system. You'll likely never have to interact with this directly.

Next, base::Thread. This is Chromium's platform-agnostic abstraction over physical threads. You'll also almost never have to deal with these directly.

Finally, a sequence. This is the word Chromium uses for the "streams of work" from the previous slide. You can think of a sequence as a "virtual thread" that may be executed on any underlying physical thread, and has some queue of tasks that it executes in order.

###

Note: as a Chromium dev, you will almost never have to use a base::Thread directly. Most of the time, you can (and should) use base::ThreadPool instead.

11 of 22

How does Chromium execute Sequences?

❌ One thread : one Sequence

Idea: make a new thread to handle each Sequence
Too much overhead

❌ One thread : many Sequences

Idea: each thread owns a set of Sequences that it executes
Hard to load-balance

✅ Many threads : many Sequences

Idea: threads share Sequences, pick one to execute when scheduling the next task
Can "move" a Sequence from a busy thread to an idle one => easy to load-balance
Doesn't require large number of threads (good for low-end devices)

Using better vocabulary this time, let's look at how Chromium does that load-balancing.

There are a few approaches here, but it turns out that only one of them is viable.

The simplest idea is to put Sequences and threads in one-to-one correspondence. That would work in theory, but in practice it would create a huge number of threads (which is expensive in itself) - and we could well hit per-process thread limits imposed by the OS.

Another idea is to have each thread own a set of Sequences that it's responsible for executing, and just switch between those. That approach would be feasible, but it doesn't let us do load-balancing, since the Sequences are firmly locked to their thread.

The remaining approach is the "many to many" relationship, between sequences and threads. In other words, a given thread may run many different sequences over time, and a given sequence may run on many different threads over the course of its lifetime.

This solves both of the problems with the other approaches: we can move Sequences from one thread to another in order to do load balancing, and we don't have to spin up a huge number of threads either.
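
Here is a toy version of the many-to-many idea, in plain standard C++. It is a sketch under heavy simplifying assumptions, not how Chromium's scheduler is written: ToySequence, ToyScheduler, RunDemo, and IsCountingUp are all hypothetical names, idle workers spin with yield() instead of sleeping, and the workers exit once every queue is empty. What it does show is the key rule: a worker may pick any Sequence that has pending work and is not already running, so tasks of one Sequence hop between threads but never overlap or reorder.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical toy type: a queue of tasks plus an "in use" flag.
struct ToySequence {
  std::deque<std::function<void()>> tasks;  // pending tasks, in posting order
  bool running = false;  // true while a worker is running one of its tasks
};

class ToyScheduler {
 public:
  explicit ToyScheduler(std::size_t num_sequences)
      : sequences_(num_sequences) {}

  void PostTask(std::size_t sequence_index, std::function<void()> task) {
    std::lock_guard<std::mutex> guard(lock_);
    sequences_[sequence_index].tasks.push_back(std::move(task));
  }

  // Runs every posted task on `num_threads` worker threads, then returns.
  void RunUntilIdle(std::size_t num_threads) {
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < num_threads; ++i)
      workers.emplace_back([this] { WorkerLoop(); });
    for (auto& worker : workers) worker.join();
  }

 private:
  void WorkerLoop() {
    for (;;) {
      ToySequence* picked = nullptr;
      std::function<void()> task;
      bool work_remaining = false;
      {
        std::lock_guard<std::mutex> guard(lock_);
        for (auto& sequence : sequences_) {
          if (sequence.tasks.empty()) continue;
          work_remaining = true;
          if (sequence.running) continue;  // another thread has this one
          picked = &sequence;
          sequence.running = true;
          task = std::move(sequence.tasks.front());
          sequence.tasks.pop_front();
          break;
        }
      }
      if (!picked) {
        if (!work_remaining) return;  // all queues are empty: done
        std::this_thread::yield();    // toy back-off; pending work is busy
        continue;
      }
      task();  // runs without the lock held
      task = nullptr;
      {
        std::lock_guard<std::mutex> guard(lock_);
        picked->running = false;  // the Sequence may now move to any thread
      }
    }
  }

  std::mutex lock_;
  std::vector<ToySequence> sequences_;
};

// Posts tasks to two Sequences. Each task appends to a vector that only its
// own Sequence touches, with no lock around the vector itself.
std::vector<std::vector<int>> RunDemo(int tasks_per_sequence,
                                      std::size_t num_threads) {
  ToyScheduler scheduler(2);
  std::vector<std::vector<int>> logs(2);
  for (int i = 0; i < tasks_per_sequence; ++i) {
    for (std::size_t s = 0; s < 2; ++s)
      scheduler.PostTask(s, [&logs, s, i] { logs[s].push_back(i); });
  }
  scheduler.RunUntilIdle(num_threads);
  return logs;
}

// True if `log` is exactly 0, 1, ..., n-1.
bool IsCountingUp(const std::vector<int>& log, int n) {
  if (log.size() != static_cast<std::size_t>(n)) return false;
  for (int i = 0; i < n; ++i) {
    if (log[i] != i) return false;
  }
  return true;
}
```

With four workers and two Sequences, each log still comes out in posting order, even though consecutive tasks of the same Sequence may have run on different threads.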

12 of 22

Why are we here, again?

Started by discussing safe concurrent programming
Got sidetracked about efficient concurrent programming

ignored how to make it safe, oops

13 of 22

How to use Sequences safely?

Goal: use the properties of Sequences to protect against data races

Know: data races can occur only if data is accessed by more than one thread at a time
Know: tasks from a given Sequence can execute on only one thread at a time

=> If all the code that accesses an object is on the same Sequence, it's impossible to have a data race involving that object
=> Want something to ensure that whenever we access an object, we do so from a consistent Sequence

SEQUENCE_CHECKER is built for this!
GUARDED_BY_CONTEXT makes it impossible to forget to do this check (fails at compile-time).
More flexible than ThreadChecker, since it doesn't care what physical thread it's on.

We know that data races only happen if the data is accessed by more than one thread at a time.

And we know that tasks from a given Sequence can execute on only one thread at a time.

So, if we ensure that all the tasks that access a given piece of data are from the same sequence, then we've guaranteed that only one will execute at a time, so there are no data races (for that piece of data).

And it turns out that this is exactly what SEQUENCE_CHECKER is for - it just ensures that whenever we access the data protected by the GUARDED_BY_CONTEXT macro, we do so always from the same Sequence.

As a general rule, most Chromium code is fine using a single sequence, rather than a single thread. Writing your code to use a single sequence is more flexible than a single thread, since sequences can be run by arbitrary threads - so it can be faster and lead to better scheduling overall.
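
To give a feel for what a sequence check does, here is a toy in plain standard C++. It mimics the idea only; it is not the real SEQUENCE_CHECKER. ToySequenceChecker, Counter, g_current_sequence_token, and RunAsSequenceOnNewThread are hypothetical names, the "current Sequence" is faked with a thread-local int, and a failed check returns false, whereas a real checker would be expected to assert instead.

```cpp
#include <cassert>
#include <thread>

// Stand-in for "the Sequence whose task this thread is currently running".
// 0 means "no Sequence". (Assumption for this sketch only.)
thread_local int g_current_sequence_token = 0;

class ToySequenceChecker {
 public:
  // Binds to the Sequence of the first caller, then reports whether later
  // callers are on that same Sequence. The physical thread is irrelevant.
  bool CalledOnValidSequence() {
    if (bound_token_ == 0) bound_token_ = g_current_sequence_token;
    return bound_token_ == g_current_sequence_token;
  }

 private:
  int bound_token_ = 0;
};

class Counter {
 public:
  // Returns false (and changes nothing) when called from the wrong Sequence.
  bool Increment() {
    if (!checker_.CalledOnValidSequence()) return false;
    ++value_;
    return true;
  }
  int value() const { return value_; }

 private:
  ToySequenceChecker checker_;
  int value_ = 0;  // only modified after the check above has passed
};

// Runs `fn` on a brand-new physical thread that pretends to be executing a
// task of the Sequence identified by `token`, and returns fn's result.
template <typename Fn>
bool RunAsSequenceOnNewThread(int token, Fn fn) {
  bool result = false;
  std::thread worker([&] {
    g_current_sequence_token = token;
    result = fn();
  });
  worker.join();
  return result;
}
```

Two calls made as Sequence 1 succeed even though they run on two different physical threads, which is exactly the flexibility a thread check would not give; a call made as Sequence 2 is rejected.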

###

Use case for thread-affine code: need to access something in TLS, or to use some 3rd-party API that is thread-affine.

14 of 22

Sequences, visualized

[Figure: diagram with lanes labeled UI, Worker, and Worker, containing boxes labeled Task and History::GetHistory...]

Image credit: Life of a Process, Chrome U 2019

15 of 22

Sequence internals

A class that is:

A TaskSource

Provides stream of tasks to threading infrastructure.

And has:

A SequenceToken

Wrapper around an int.
Each instance gets a unique token.

A SequenceLocalStorageMap

Like thread-local storage, but for Sequences

Now that we have some intuition for how we want a Sequence to behave, let's dig in to some implementation details, to see how Chromium makes that happen.

Conceptually, you can think of a Sequence as being a queue of tasks that get run in order. Everything else is just the plumbing needed to make that happen, and the metadata to assert that it happened the way we wanted it to.

More concretely, a Sequence *is* a TaskSource for the scheduling infrastructure, and it has a unique identifier, called a SequenceToken.

Finally, there's one more piece. When working with threads, there's a concept of thread-local storage, so that each thread can have its own memory even though they all share an address space. Chromium supports the analogous idea for Sequences, i.e. sequence-local storage. So the final piece is a place to store that data, the SequenceLocalStorageMap.

16 of 22

How does the infra use Sequences?

Scheduler ensures that a Sequence only executes one task at a time.
Before a Sequence's next Task is executed, its SequenceToken and SequenceLocalStorage are put into TLS.

=> each thread has a unique "currently running Sequence"
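
A rough sketch of that second bullet, in plain standard C++. This is an assumption-laden toy, not the real infrastructure: ToySequenceState, RunNextTask, CountVisit, and the g_current_* variables are hypothetical, and a std::map stands in for the SequenceLocalStorageMap. The point is the order of operations: before a task runs, the thread's TLS is pointed at that Sequence's token and storage, so code inside the task can ask "which Sequence am I on?" without being told.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical toy type bundling what the slides say a Sequence has.
struct ToySequenceState {
  int token = 0;                              // stand-in for SequenceToken
  std::map<std::string, int> local_storage;   // stand-in for the storage map
  std::deque<std::function<void()>> tasks;    // pending tasks, in order
};

// Per-thread view of "the Sequence currently running on this thread".
thread_local int g_current_token = 0;
thread_local std::map<std::string, int>* g_current_storage = nullptr;

// Runs the next task of `sequence` on the calling thread.
void RunNextTask(ToySequenceState& sequence) {
  assert(!sequence.tasks.empty());
  std::function<void()> task = std::move(sequence.tasks.front());
  sequence.tasks.pop_front();

  // Publish the Sequence's identity and storage into TLS for the task.
  g_current_token = sequence.token;
  g_current_storage = &sequence.local_storage;
  task();
  // Clear them again so nothing leaks into the next task on this thread.
  g_current_token = 0;
  g_current_storage = nullptr;
}

// A task body: it never names its Sequence, it just uses "the current one".
void CountVisit() { ++(*g_current_storage)["visits"]; }

// Interleaves two Sequences on one thread; returns their visit counts.
std::pair<int, int> RunDemo() {
  ToySequenceState a;
  a.token = 1;
  ToySequenceState b;
  b.token = 2;
  a.tasks.push_back(CountVisit);
  a.tasks.push_back(CountVisit);
  b.tasks.push_back(CountVisit);

  RunNextTask(a);
  RunNextTask(b);
  RunNextTask(a);
  return {a.local_storage["visits"], b.local_storage["visits"]};
}
```

Even though all three tasks ran on the same physical thread, each Sequence only ever saw its own storage: RunDemo() returns 2 for the first Sequence and 1 for the second.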

17 of 22

Who creates Sequences?

Sequences are integrated in ThreadPool/TaskRunner infrastructure
Sequences get automatically created by:

base::ThreadPool::Post[Delayed]Task
base::ThreadPool::Create[Updateable]SequencedTaskRunner
base::ThreadPool::CreateSingleThreadTaskRunner

18 of 22

How do I send a task from my Sequence to another?

I don't care what Sequence I use:

ThreadPool::Post[Delayed]Task (creates a new Sequence)

To a specific sequence:

SequencedTaskRunner::Post[Delayed]Task
SingleThreadTaskRunner::Post[Delayed]Task

SequenceBound<T> can help call methods/ctor/dtor on a specific sequence.

Ok, so lots of Chromium code checks that it gets run on specific sequences. How do I get to those sequences from whatever sequence I'm currently on, in order to use that code safely?

To do that, you post a task to that sequence. You have a few options:

If it can be any arbitrary sequence, use ThreadPool::PostTask, and you'll get a new sequence.
If it must be a particular sequence that you already know, use the appropriate SequencedTaskRunner or SingleThreadTaskRunner. (You might need the single-threaded task runner if you're working with third-party libraries that don't work well with sequences, or that use TLS.)
If you have an object that owns and manages another object whose methods must all be called on a different sequence, you can use SequenceBound to make those hops easier.

19 of 22

How do I run tasks on "my" Sequence?

Run a task on some other sequence, then come back:

TaskRunner::PostTaskAndReply[WithResult]
ThreadPool::PostTaskAndReply[WithResult]

Run something on "my" sequence, asynchronously:

SequencedTaskRunnerHandle::Get()->Post[Delayed]Task
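
The "go elsewhere, then come back" shape can be sketched in plain standard C++. Everything here is a hypothetical stand-in rather than the real API: ToyTaskRunner is just a FIFO queue, ToyPostTaskAndReplyWithResult is a made-up helper that imitates the pattern, and the queues are drained by hand on one thread so the order is deterministic.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical single-queue stand-in for a task runner bound to a Sequence.
class ToyTaskRunner {
 public:
  void PostTask(std::function<void()> task) {
    tasks_.push_back(std::move(task));
  }

  // Runs queued tasks in FIFO order until the queue is empty.
  void RunUntilIdle() {
    while (!tasks_.empty()) {
      std::function<void()> task = std::move(tasks_.front());
      tasks_.pop_front();
      task();
    }
  }

 private:
  std::deque<std::function<void()>> tasks_;
};

// Runs `task` on `target`, then runs `reply(result)` back on `origin`.
template <typename Result>
void ToyPostTaskAndReplyWithResult(ToyTaskRunner& target,
                                   ToyTaskRunner& origin,
                                   std::function<Result()> task,
                                   std::function<void(Result)> reply) {
  target.PostTask(
      [&origin, task = std::move(task), reply = std::move(reply)] {
        Result result = task();
        // Hop back: the reply is queued on the Sequence that asked.
        origin.PostTask([reply, result] { reply(result); });
      });
}

// Returns a log of what ran where, in order.
std::vector<std::string> RunDemo() {
  ToyTaskRunner my_sequence;
  ToyTaskRunner other_sequence;
  std::vector<std::string> log;

  my_sequence.PostTask([&] {
    log.push_back("my: post work");
    ToyPostTaskAndReplyWithResult<int>(
        other_sequence, my_sequence,
        [&log] {
          log.push_back("other: compute");
          return 6 * 7;
        },
        [&log](int answer) {
          log.push_back("my: got " + std::to_string(answer));
        });
  });

  my_sequence.RunUntilIdle();     // runs the task that posts the work
  other_sequence.RunUntilIdle();  // runs the computation, queues the reply
  my_sequence.RunUntilIdle();     // runs the reply on the original Sequence
  return log;
}
```

The caller never blocks waiting for the answer: it posts the work, returns, and later receives the result as an ordinary task on its own queue.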

20 of 22

I don't know what Sequence I need to run on!

You might not have to do anything!

Often APIs implicitly use sequences properly.
E.g. mojo::Receiver::Bind by default schedules message events on the sequence that called Bind.

21 of 22

References

22 of 22

Appendix

Jobs (post_job.h)

Power-user API, for bulk-processing with minimal scheduling overhead