Linux Primitives
Nati Cohen (@nocoot)
Avishai Ish-Shalom (@nukemberg)
What we know as Linux is actually GNU user space
Our apps interact with GNU libc and other userspace libraries
Linux containers
What’s on the menu tonight
Processes - Data Structures
Under the hood both threads and processes are tasks
Processes - fork & exec
Traditionally, *nix systems create new processes using fork() followed by exec()
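A minimal sketch of the fork-then-exec pattern. The helper name spawn_and_wait is made up for illustration; fork(), execvp() and waitpid() are the standard POSIX calls.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] in a child process and return its exit status,
 * or -1 if the child could not be created or did not exit normally.
 * spawn_and_wait is a hypothetical helper name. */
int spawn_and_wait(char *const argv[])
{
    pid_t child = fork();            /* duplicate the calling process */
    if (child < 0)
        return -1;                   /* fork failed */

    if (child == 0) {
        execvp(argv[0], argv);       /* replace the child's program image */
        _exit(127);                  /* only reached if exec failed */
    }

    int status;
    if (waitpid(child, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling spawn_and_wait with {"true", NULL} should return 0. The exit code 127 for a failed exec follows the shell convention.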
Users
From the kernel’s PoV, a user is an int parameter in various structs
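To illustrate: the kernel hands back plain numbers, and turning them into names is a userspace job. In this sketch describe_user is a hypothetical helper; getuid(), geteuid() and getpwuid() are standard libc calls.

```c
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Format the caller's identity into buf, e.g. "uid=1000 euid=1000 name=alice".
 * Returns what snprintf returns. describe_user is a hypothetical helper. */
int describe_user(char *buf, size_t len)
{
    uid_t uid = getuid();                 /* real user id: just a number */
    uid_t euid = geteuid();               /* effective user id: also a number */
    struct passwd *pw = getpwuid(uid);    /* name lookup happens in userspace */

    return snprintf(buf, len, "uid=%u euid=%u name=%s",
                    (unsigned)uid, (unsigned)euid,
                    pw ? pw->pw_name : "?");
}
```

If the uid has no entry in the passwd database, the name is printed as "?" while the numeric ids still work.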
Capabilities
Traditionally, UNIX is a monotheistic O/S: you are either god or mortal
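Capabilities split root's power into individual bits. A sketch of checking one bit in the effective set using the raw capget system call, so libcap is not needed. has_effective_cap is a hypothetical helper; the structs and constants come from <linux/capability.h>.

```c
#include <linux/capability.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Returns 1 if the caller has `cap` in its effective set, 0 if not,
 * and -1 on error. has_effective_cap is a hypothetical helper. */
int has_effective_cap(int cap)
{
    if (cap < 0 || cap > CAP_LAST_CAP)
        return -1;

    struct __user_cap_header_struct hdr = {
        .version = _LINUX_CAPABILITY_VERSION_3,
        .pid = 0,                         /* 0 means the calling thread */
    };
    struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];

    if (syscall(SYS_capget, &hdr, data) != 0)
        return -1;

    /* The capability sets are bitmasks spread over 32-bit words. */
    return (int)((data[cap / 32].effective >> (cap % 32)) & 1u);
}
```

For example, has_effective_cap(CAP_NET_BIND_SERVICE) tells whether the process may bind to ports below 1024 without being full root.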
Mounts
Map a file system to a directory
chroot(new_root)
Change the root directory for a process
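A minimal sketch of entering a chroot. enter_chroot is a hypothetical helper; the call needs CAP_SYS_CHROOT, so an unprivileged caller gets -EPERM.

```c
#include <errno.h>
#include <unistd.h>

/* Make new_root the root directory of the calling process.
 * Returns 0 on success or a negative errno value.
 * enter_chroot is a hypothetical helper. */
int enter_chroot(const char *new_root)
{
    if (chroot(new_root) != 0)
        return -errno;
    /* chroot() does not change the working directory, so move it
     * inside the new root explicitly. */
    if (chdir("/") != 0)
        return -errno;
    return 0;
}
```

The chdir("/") matters: without it the working directory still points outside the new root.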
pivot_root(new_root, put_old)
Change the root directory for all processes in the current mount namespace
CoW storage
General idea: start with a common read-only branch; writes go to a different branch
cgroups
CGroups control, account for and limit system resources
Namespaces
Why?
Namespaces
The namespaces API: clone(), unshare() and setns()
Namespaces
struct nsproxy {
    atomic_t count;
    struct uts_namespace *uts_ns;
    struct ipc_namespace *ipc_ns;
    struct mnt_namespace *mnt_ns;
    struct pid_namespace *pid_ns_for_children;
    struct net *net_ns;
};
nsproxy only holds pid_ns_for_children; a task's own PID namespace is found by task_active_pid_ns() through its pid and upid structs
PID namespace - How
struct upid {
    /* Try to keep pid_chain in the same cacheline as nr for find_vpid */
    int nr;
    struct pid_namespace *ns;
    struct hlist_node pid_chain;
};
struct pid
{
    atomic_t count;
    unsigned int level;
    /* lists of tasks that use this pid */
    struct hlist_head tasks[PIDTYPE_MAX];
    struct rcu_head rcu;
    struct upid numbers[1];
};
Before PID namespace
struct pid {
    atomic_t count;
    int nr;
    struct hlist_node pid_chain;
    struct hlist_head tasks[PIDTYPE_MAX];
    struct rcu_head rcu;
};
"We can solve any problem by introducing an extra level of indirection."
PID namespace behaviour
PID namespaces form a hierarchy.
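A simplified userspace model of how one process gets a different number at each level of the hierarchy. This is not kernel code: the model_ structs and MAX_LEVELS are made up for illustration, and the lookup only mirrors the idea of indexing the numbers array by namespace level.

```c
#include <stddef.h>

#define MAX_LEVELS 4                      /* arbitrary bound for this model */

struct model_ns {
    unsigned int level;                   /* 0 is the root namespace */
    struct model_ns *parent;
};

struct model_upid {
    int nr;                               /* pid number as seen in ns */
    struct model_ns *ns;
};

struct model_pid {
    unsigned int level;                   /* level of the namespace it lives in */
    struct model_upid numbers[MAX_LEVELS];/* one entry per ancestor level */
};

/* Return the number of `pid` as seen from `ns`, or 0 if the process
 * is not visible from that namespace. */
int model_pid_nr_ns(const struct model_pid *pid, const struct model_ns *ns)
{
    if (pid && ns->level <= pid->level) {
        const struct model_upid *upid = &pid->numbers[ns->level];
        if (upid->ns == ns)
            return upid->nr;
    }
    return 0;
}
```

A process created in a child namespace is visible in the root namespace under one number and in its own namespace under another. A process in the root namespace is not visible from the child at all.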
UID/user namespace
Map uid and gid numbers
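A sketch of the mapping arithmetic, modelled on the three columns of /proc/PID/uid_map (first id inside the namespace, first id outside, range length). The struct and function names are hypothetical, and this is a userspace illustration rather than the kernel's implementation.

```c
#include <stddef.h>

/* One line of a uid_map: ids inside..inside+count-1 in the namespace
 * correspond to outside..outside+count-1 in the parent. */
struct uid_extent {
    unsigned int inside;
    unsigned int outside;
    unsigned int count;
};

/* Translate a uid seen inside the namespace to the parent's uid.
 * Returns -1 if the uid falls in no extent (unmapped). */
long long map_uid_to_parent(const struct uid_extent *map, size_t n,
                            unsigned int uid)
{
    for (size_t i = 0; i < n; i++) {
        if (uid >= map[i].inside && uid - map[i].inside < map[i].count)
            return (long long)map[i].outside + (uid - map[i].inside);
    }
    return -1;
}
```

With a single extent {0, 100000, 65536}, root inside the namespace (uid 0) is uid 100000 outside, and uid 1000 inside is 101000 outside.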
Memory management
What happens when memory runs out?
The OOM killer's heuristics find the fattest, laziest SOB and kill it.
setrlimit(resource, rlim)
Traditional approach for per-task memory limiting (aka ulimit)
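A sketch of the rlimit approach, using RLIMIT_AS, which caps the virtual address space of a single process. alloc_under_limit is a hypothetical helper. It applies the limit in a forked child so the caller's own limits stay untouched.

```c
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* In a child process, cap the address space at limit_bytes and try to
 * allocate want_bytes. Returns 1 if the allocation succeeded, 0 if it
 * failed, -1 on any other error. Hypothetical helper. */
int alloc_under_limit(rlim_t limit_bytes, size_t want_bytes)
{
    pid_t child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {
        struct rlimit lim = { .rlim_cur = limit_bytes, .rlim_max = limit_bytes };
        if (setrlimit(RLIMIT_AS, &lim) != 0)
            _exit(2);
        volatile char *p = malloc(want_bytes);
        if (p != NULL)
            p[0] = 1;                     /* touch it so the call is not elided */
        _exit(p != NULL ? 0 : 1);
    }

    int status;
    if (waitpid(child, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    switch (WEXITSTATUS(status)) {
    case 0:  return 1;
    case 1:  return 0;
    default: return -1;
    }
}
```

With a 256 MiB limit, a 1 MiB allocation should succeed and a 1 GiB allocation should fail. The limit applies per process, and each child inherits its own copy, so it does not cap a group of processes as a whole.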
Memory Control Group
More info in the Linux kernel documentation for Memory Resource Controller
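A sketch of limiting a process through the memory controller, assuming the cgroup v2 file layout (memory.max and cgroup.procs) and a cgroup directory that already exists with the memory controller enabled. limit_memory and write_file are hypothetical helpers, and the directory path is supplied by the caller.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write value into dir/name. Returns 0 or a negative errno value. */
static int write_file(const char *dir, const char *name, const char *value)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof path)
        return -ENAMETOOLONG;

    int fd = open(path, O_WRONLY);        /* control files already exist */
    if (fd < 0)
        return -errno;
    ssize_t w = write(fd, value, strlen(value));
    int err = (w < 0) ? -errno : 0;
    close(fd);
    return err;
}

/* Set a memory limit on the cgroup at cgroup_dir, then move pid into it.
 * Assumes cgroup v2 file names. Returns 0 or a negative errno value. */
int limit_memory(const char *cgroup_dir, unsigned long long max_bytes, pid_t pid)
{
    char value[32];

    snprintf(value, sizeof value, "%llu", max_bytes);
    int err = write_file(cgroup_dir, "memory.max", value);
    if (err)
        return err;

    snprintf(value, sizeof value, "%d", (int)pid);
    return write_file(cgroup_dir, "cgroup.procs", value);
}
```

Unlike an rlimit, the limit belongs to the group: every process in the cgroup counts against the same budget.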
Memory Control Group - OOM
What happens when we allocate too much memory?
References
Glibc and the kernel user-space API [LWN]
Virtual File System [Kernel Docs]
Shared Subtrees [Kernel Docs]
Docker from Scratch workshop repo