Cloud Foundry requirements on libcontainer
Pivotal spent around six person-months in mid-2014 attempting to make libcontainer suitable as a basis for implementing the garden-linux backend. Some of the Cloud Foundry requirements on libcontainer are described in this blog and are discussed below.
Some progress was made towards meeting these requirements, but substantial gaps and uncertainties remained at the end of the process. Further development was therefore deferred, and garden-linux is currently being redesigned. garden-linux is, however, reusing individual libcontainer packages, such as netlink, as appropriate. Meanwhile, the libcontainer API, including some of the changes proposed by Pivotal, is still largely unimplemented.
It is important to note that both libcontainer and garden-linux are based on a common set of Linux kernel primitives. It appears that these primitives are the appropriate level for general reuse between the projects and provide suitably compatible behaviour.
It is also important to note that garden-linux supports the use of Docker images as root file systems, and that this is implemented by calling Docker’s graphdriver package rather than libcontainer. This document focuses on the requirements for building containers for Cloud Foundry using libcontainer.
libcontainer was originally coded for the Docker use case of running a single user process in a container. The user process was the root of the container’s process hierarchy. The lifetime of the container was defined by the lifetime of the user process.
Cloud Foundry runs one or more user processes in a container with no single user process being the root of the process hierarchy.
During the development of a new API for libcontainer, this issue was visited on a number of occasions. The closest accommodation to Cloud Foundry’s requirement that seemed likely was to allow the initial process of a container to be specified in the container’s configuration. A number of questions about this approach remained open, including what contract the initial process would need to satisfy for reaping the exit status of (other) user processes.
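The reaping question can be made concrete with a small Go sketch of what an initial process might do: start user processes and collect the exit status of every child. This is an illustration only, not libcontainer or garden-linux code. The helper names spawn and reapAll are hypothetical, /bin/sh is assumed to exist, and encoding death by signal as 128 plus the signal number is a shell convention adopted here for brevity.

```go
package main

import (
	"fmt"
	"syscall"
)

// spawn starts a shell script as a direct child of this process and returns
// its PID. (Hypothetical helper; assumes /bin/sh exists.)
func spawn(script string) (int, error) {
	return syscall.ForkExec("/bin/sh", []string{"sh", "-c", script}, &syscall.ProcAttr{
		Files: []uintptr{0, 1, 2}, // inherit stdin, stdout and stderr
	})
}

// reapAll blocks until every child of the calling process has terminated and
// returns the exit status of each child keyed by PID.
func reapAll() (map[int]int, error) {
	statuses := make(map[int]int)
	for {
		var ws syscall.WaitStatus
		pid, err := syscall.Wait4(-1, &ws, 0, nil)
		switch {
		case err == syscall.EINTR:
			continue // interrupted by a signal: retry
		case err == syscall.ECHILD:
			return statuses, nil // no children left to wait for
		case err != nil:
			return nil, err
		}
		if ws.Exited() {
			statuses[pid] = ws.ExitStatus()
		} else if ws.Signaled() {
			statuses[pid] = 128 + int(ws.Signal()) // shell-style encoding
		}
	}
}

func main() {
	for _, script := range []string{"exit 0", "exit 3"} {
		if _, err := spawn(script); err != nil {
			fmt.Println("spawn failed:", err)
			return
		}
	}
	statuses, err := reapAll()
	if err != nil {
		fmt.Println("reap failed:", err)
		return
	}
	for pid, status := range statuses {
		fmt.Printf("pid %d finished with status %d\n", pid, status)
	}
}
```

A real initial process would also have to report each status to whoever started the process, which is exactly the part of the contract that was left unresolved.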
Cloud Foundry uses a UNIX socket to communicate between the host and the initial process of a container. The file system containing the socket is unmounted in the container, which prevents access to it from other processes in the container. Also, file descriptors, including those for process standard output and standard error streams, are transmitted to the host over the socket as out-of-band data. libcontainer has no support for managing such a socket.
When Cloud Foundry’s requirements were being discussed, it appeared that libcontainer did not support proper nesting of containers, which Cloud Foundry needs for the micro BOSH development environment. In particular, union file systems seem to have functional restrictions when nested. This was summarised in issue 180, which is still open in spite of claims that nested Docker works properly.
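For readers unfamiliar with passing file descriptors over a UNIX socket, the following Go sketch shows the standard mechanism, SCM_RIGHTS ancillary data, using only the standard syscall package. It is not garden-linux code: the function names sendFile, recvFile and roundTrip are hypothetical, and a socket pair inside one process stands in for the host and container ends.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"syscall"
)

// sendFile sends f's descriptor over the UNIX socket sock as SCM_RIGHTS
// ancillary (out-of-band) data, alongside a one-byte payload.
func sendFile(sock int, f *os.File) error {
	rights := syscall.UnixRights(int(f.Fd()))
	return syscall.Sendmsg(sock, []byte{0}, rights, nil, 0)
}

// recvFile receives exactly one descriptor from sock and wraps it in a file.
func recvFile(sock int, name string) (*os.File, error) {
	buf := make([]byte, 1)
	oob := make([]byte, syscall.CmsgSpace(4)) // room for one 32-bit descriptor
	_, oobn, _, _, err := syscall.Recvmsg(sock, buf, oob, 0)
	if err != nil {
		return nil, err
	}
	msgs, err := syscall.ParseSocketControlMessage(oob[:oobn])
	if err != nil {
		return nil, err
	}
	if len(msgs) != 1 {
		return nil, fmt.Errorf("expected 1 control message, got %d", len(msgs))
	}
	fds, err := syscall.ParseUnixRights(&msgs[0])
	if err != nil {
		return nil, err
	}
	if len(fds) != 1 {
		return nil, fmt.Errorf("expected 1 descriptor, got %d", len(fds))
	}
	return os.NewFile(uintptr(fds[0]), name), nil
}

// roundTrip passes the read end of a pipe across a socket pair, then reads
// msg back through the received copy of the descriptor.
func roundTrip(msg string) (string, error) {
	pair, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM, 0)
	if err != nil {
		return "", err
	}
	defer syscall.Close(pair[0])
	defer syscall.Close(pair[1])

	r, w, err := os.Pipe()
	if err != nil {
		return "", err
	}
	defer r.Close()
	defer w.Close() // harmless second close on the success path

	if err := sendFile(pair[0], r); err != nil {
		return "", err
	}
	received, err := recvFile(pair[1], "received-stdout")
	if err != nil {
		return "", err
	}
	defer received.Close()

	if _, err := w.WriteString(msg); err != nil {
		return "", err
	}
	if err := w.Close(); err != nil { // closing the write end yields EOF
		return "", err
	}
	data, err := io.ReadAll(received)
	return string(data), err
}

func main() {
	got, err := roundTrip("container stdout")
	fmt.Println(got, err)
}
```

The receiver ends up with its own descriptor for the same open pipe, which is what lets a host-side process keep reading a container process's output streams.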
Cloud Foundry needs to update a container’s configuration dynamically. The lack of this feature in libcontainer is covered by issue 182.
Cloud Foundry needs to be able to restart its components gracefully. For instance, it is sometimes necessary to shut down and restart the garden server, which manages a collection of containers, without reprovisioning the containers or losing data from a container’s standard output and standard error streams. garden-linux implements this in two ways: it snapshots container state at shutdown and recovers the snapshotted state during restart, and it uses separate processes to monitor a container’s standard output and standard error. libcontainer does not support this.
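The snapshot half of this approach can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not the garden-linux implementation: the ContainerSnapshot type, its fields, the one-JSON-file-per-container layout and the function names are all hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// ContainerSnapshot is a hypothetical record of what a server would need in
// order to re-adopt a container that kept running while the server was down.
type ContainerSnapshot struct {
	Handle  string `json:"handle"`
	InitPID int    `json:"init_pid"`
	RootFS  string `json:"rootfs"`
}

// saveSnapshots writes one JSON file per container into dir. Each file is
// written under a temporary name and renamed, so a crash during shutdown
// never leaves a truncated snapshot behind.
func saveSnapshots(dir string, containers []ContainerSnapshot) error {
	for _, c := range containers {
		data, err := json.Marshal(c)
		if err != nil {
			return err
		}
		tmp := filepath.Join(dir, c.Handle+".json.tmp")
		if err := os.WriteFile(tmp, data, 0600); err != nil {
			return err
		}
		if err := os.Rename(tmp, filepath.Join(dir, c.Handle+".json")); err != nil {
			return err
		}
	}
	return nil
}

// restoreSnapshots reads back every completed snapshot found in dir.
func restoreSnapshots(dir string) ([]ContainerSnapshot, error) {
	paths, err := filepath.Glob(filepath.Join(dir, "*.json"))
	if err != nil {
		return nil, err
	}
	var restored []ContainerSnapshot
	for _, p := range paths {
		data, err := os.ReadFile(p)
		if err != nil {
			return nil, err
		}
		var c ContainerSnapshot
		if err := json.Unmarshal(data, &c); err != nil {
			return nil, fmt.Errorf("%s: %w", p, err)
		}
		restored = append(restored, c)
	}
	return restored, nil
}

// demo simulates a shutdown followed by a restart using a temporary directory.
func demo() ([]ContainerSnapshot, error) {
	dir, err := os.MkdirTemp("", "snapshots")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	running := []ContainerSnapshot{
		{Handle: "web-1", InitPID: 4021, RootFS: "/var/containers/web-1/rootfs"},
		{Handle: "worker-7", InitPID: 4188, RootFS: "/var/containers/worker-7/rootfs"},
	}
	if err := saveSnapshots(dir, running); err != nil {
		return nil, err
	}
	return restoreSnapshots(dir)
}

func main() {
	restored, err := demo()
	fmt.Println(len(restored), err)
}
```

The sketch covers only the bookkeeping. Preserving output across the restart depends on the separate monitoring processes, which keep the streams open while the server is down.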
Events appeared in early proposals for the libcontainer API and are still required by Cloud Foundry, particularly when a container hits OOM. There is no sign of events being added to the libcontainer API, so this is covered by issue 176.
Issue 178 covers the need to test libcontainer code in the presence of errors. This is necessary to make libcontainer robust and predictable when errors occur, which is essential in a PaaS. Pivotal made a start in this area, but no interest has so far been shown by other libcontainer contributors in taking this further.
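One common way to test behaviour in the presence of errors is to put an interface between the code under test and the operations that can fail, then inject failures through a fake. The Go sketch below illustrates that general technique only. It does not reproduce the work Pivotal started, and the Mounter interface, faultyMounter type and mountAll function are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// Mounter is a hypothetical seam around the operations that can fail while a
// container is being set up.
type Mounter interface {
	Mount(target string) error
	Unmount(target string) error
}

var errInjected = errors.New("injected mount failure")

// faultyMounter is a fake that fails the failOn-th Mount call and tracks
// which targets are currently mounted.
type faultyMounter struct {
	failOn  int
	calls   int
	mounted []string
}

func (m *faultyMounter) Mount(target string) error {
	m.calls++
	if m.calls == m.failOn {
		return errInjected
	}
	m.mounted = append(m.mounted, target)
	return nil
}

func (m *faultyMounter) Unmount(target string) error {
	for i, t := range m.mounted {
		if t == target {
			m.mounted = append(m.mounted[:i], m.mounted[i+1:]...)
			return nil
		}
	}
	return fmt.Errorf("%s is not mounted", target)
}

// mountAll mounts every target in order. If one mount fails, it unmounts the
// targets already mounted, in reverse order, and returns the wrapped error.
func mountAll(m Mounter, targets []string) error {
	for i, t := range targets {
		if err := m.Mount(t); err != nil {
			for j := i - 1; j >= 0; j-- {
				_ = m.Unmount(targets[j]) // best-effort rollback
			}
			return fmt.Errorf("mounting %s: %w", t, err)
		}
	}
	return nil
}

func main() {
	m := &faultyMounter{failOn: 2}
	err := mountAll(m, []string{"/proc", "/sys", "/dev"})
	fmt.Println("error:", err)
	fmt.Println("left mounted:", len(m.mounted))
}
```

Varying failOn across every call position checks that each failure point leaves nothing half-configured, which is the kind of predictability a PaaS needs.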
Various issues were raised for serviceability improvements such as error identification and logging configuration. Concerns were also expressed about the serviceability of the libcontainer code once its core has been replaced by libct (which is written in C). Examples include the difficulty of obtaining sufficient error context, such as stack traces, for errors occurring in libct, and the difficulty of integrating logs from the Go portion of libcontainer with those from the libct core.
The PR is now quite hard to follow because of subsequent commits involving the same files. This is an unfortunate consequence of the use of PRs to propose design changes.