Linux Apps

What we are trying to do:

  1. Make it easier for app developers to package and ship apps to customers
  2. Increase the stability of the APIs apps use (by exposing only APIs that are considered stable)
  3. Minimize the test matrix, i.e. cope with the wildly different distributions
  4. “Fat Apps”, i.e. single apps that run on multiple architectures
  5. One-click app execution/removal
  6. Increased security by sandboxing all code
  7. Stand-alone apps as well as app and host OS “extensions”

We believe for a full Linux App solution we need six big changes:

  1. Sandboxing logic based on namespaces, cgroups, seccomp and capabilities
  2. An app image solution based on loopback images with GUID partition tables and squashfs file systems
  3. Kernel-based D-Bus with per-namespace policies, suitable for exchanging both payload and control messages
  4. A “portal” system to do proper privilege separation for choosing files, accessing devices and resources
  5. A sandbox-aware display manager (Wayland?)
  6. A solution for the “Search Path Issue”

We accept the following changes from the traditional application distribution model:

  1. Users get applications directly from vendors; the role of distributions is reduced to supplying only the core OS
  2. Libraries may be bundled. Security updates for these libraries are hence the app vendor's responsibility
  3. Different applications are updated at different speeds. Games get updated less frequently than web browsers, and hence should be able to use a reduced set of more stable APIs at the price of OS integration, while web browsers use less stable APIs but get closer OS integration in return.
  4. API stability is controlled much less along the lines of individual libraries and more around full multi-library profiles, i.e. an app no longer just declares compatibility with Gtk+ 3.16, but rather with GNOME 3.4, which covers substantially more ground and multiple libraries at once (gtk, glib, gst, …)

Issues and deficiencies:

  1. OS vendors cannot apply security fixes for apps
  2. Bundled libraries might exist many times in memory
  3. The added trust distro vendors provide is lost
  4. Stable host APIs cannot transparently hide their use of unstable APIs
  5. Applications interested in maximum compatibility will only get a minimum of stable APIs guaranteed (only kernel interfaces in the extreme case), which breaks things such as UI themes and plugins.

Outlook:

  1. Many of the issues relevant for user apps are also interesting for webapps, i.e. on the server. Sandboxing and one-click installation of webapps such as WordPress are highly desirable.

General Ideas

Apps run inside a classic Linux file system hierarchy based on the FHS. However, they generally see their own hierarchy, which shares only a few selected files and directories with the host hierarchy. In fact, app images may frequently include binaries and libraries built by the vendor of the OS the developer used when developing the app, rather than using their local counterparts.

Apps come with manifest files that declare:

  1. The security policy an app requires
  2. One or more API profiles the app is intended for, e.g. an app could declare that it requires “GNOME OS 1.0”, which makes available all interfaces and resources that the GNOME project considered stable enough for “GNOME OS 1.0”, but nothing else. Initially we’d expect three profiles: “BARE” (only kernel interfaces included, app needs to ship glibc), “SYSTEM” (libc, libm, and a few equally low-level libs included), “GNOME 1.0” (all stable GNOME libs included).
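To make the manifest idea concrete, here is a minimal sketch of loading and validating one. The JSON encoding, the field names `profiles` and `security_policy`, and the exact profile identifiers are assumptions for illustration; the text above does not fix a file format.

```python
import json
from dataclasses import dataclass, field
from typing import Tuple

# Hypothetical identifiers for the three initial profiles named above.
KNOWN_PROFILES = {"BARE", "SYSTEM", "GNOME 1.0"}


@dataclass(frozen=True)
class Manifest:
    profiles: Tuple[str, ...]                            # API profiles the app targets
    security_policy: dict = field(default_factory=dict)  # hypothetical shape


def load_manifest(text: str) -> Manifest:
    """Parse a hypothetical JSON manifest, rejecting missing or unknown profiles."""
    data = json.loads(text)
    profiles = tuple(data.get("profiles", ()))
    if not profiles:
        raise ValueError("manifest must declare at least one API profile")
    unknown = [p for p in profiles if p not in KNOWN_PROFILES]
    if unknown:
        raise ValueError("unknown API profile(s): " + ", ".join(unknown))
    return Manifest(profiles=profiles, security_policy=data.get("security_policy", {}))
```

Rejecting unknown profiles up front means the sandbox setup never has to guess which host resources an app is entitled to.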

Sandboxing Model

Our sandbox security is built on visibility of resources, not access control to resources. Instead of simply limiting access to OS objects, we want to not have these OS objects be visible to the apps in the first place, and ensure they cannot be made visible.

Current idea:

When a sandbox is initialized, a file system namespace is set up. A whitelist of files and directories is then used to bind mount all necessary resources into the namespace. It is essential that only those files and directories from the host OS are made available to the sandbox that have been vetted and are considered stable enough for the API profile the app selected. More specifically, if an app selects “GNOME OS 1.0” as its profile, then no libraries that aren’t considered stable enough for “GNOME OS 1.0” show up in the namespace, and that includes all third-party libraries. For example, when you select “GNOME OS 1.0”, you will not see the Qt or WINE APIs and suchlike, even if those are installed in the host OS.
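The whitelist step can be sketched as a pure function: given the files present on the host and a per-profile whitelist, it returns only the paths to bind mount. The whitelist contents and library paths below are hypothetical and distribution-dependent; a real implementation would then create the mount namespace and call mount(2) with MS_BIND for each returned path.

```python
from pathlib import PurePosixPath

# Hypothetical per-profile whitelists: only paths vetted as stable for the
# profile are listed. Anything not covered stays invisible in the sandbox.
PROFILE_WHITELISTS = {
    "BARE": [],
    "SYSTEM": ["/usr/lib/libc.so.6", "/usr/lib/libm.so.6"],
    "GNOME 1.0": ["/usr/lib/libc.so.6", "/usr/lib/libm.so.6",
                  "/usr/lib/libglib-2.0.so.0", "/usr/lib/libgtk-3.so.0",
                  "/usr/share/icons"],
}


def covered(path: PurePosixPath, entry: PurePosixPath) -> bool:
    """True if path is the whitelisted entry itself or lies below a whitelisted directory."""
    return path == entry or entry in path.parents


def visible_paths(host_paths, profile):
    """Return the host paths that would be bind mounted for the given profile."""
    entries = [PurePosixPath(e) for e in PROFILE_WHITELISTS[profile]]
    return sorted(str(p) for p in map(PurePosixPath, host_paths)
                  if any(covered(p, e) for e in entries))
```

Because the filter is default-deny, a third-party library such as Qt never appears unless someone explicitly vets it for the profile.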

Only selected interface points to the outside are made available beyond mere sharing of libraries and other files:

  1. Access to the system and user bus (see “Kernel D-Bus” below), by making the respective device nodes available in the sandbox.
  2. Access to the display manager. This needs to be hashed out. X11 is too complicated a protocol to enforce access control on, so we’d assume that in the beginning apps will get either full access to X11 or none, and later on we close this gaping hole by adopting Wayland, which would allow a much saner, simpler access model. Access to X11 is granted simply by mounting the X11 socket into the sandbox.
  3. Access to nscd to enable NSS. We will not allow loading arbitrary NSS modules; instead we will just make the nscd socket available.
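These interface points can be modeled as a small bind-mount plan derived from the app's needs. The `wants_x11` switch is a hypothetical manifest permission; `/tmp/.X11-unix/X<n>` and `/var/run/nscd/socket` are common defaults that may differ between distributions; and mounting the sandbox's kdbus namespace directory onto `/dev/kdbus` inside the sandbox is an assumption consistent with the device layout described below.

```python
from typing import List, Tuple


def interface_mounts(ns_name: str, wants_x11: bool,
                     x11_display: int = 0) -> List[Tuple[str, str]]:
    """Return (host_source, sandbox_target) bind mounts for the sandbox's outside interfaces."""
    mounts = [
        # D-Bus: the sandbox's own kdbus namespace becomes its entire /dev/kdbus
        # (assumed target), so host buses and other sandboxes stay invisible.
        (f"/dev/kdbus/ns/{ns_name}", "/dev/kdbus"),
        # NSS: only the nscd socket, never arbitrary NSS modules.
        ("/var/run/nscd/socket", "/var/run/nscd/socket"),
    ]
    if wants_x11:
        # X11 is all-or-nothing: either the whole socket is visible or nothing is.
        sock = f"/tmp/.X11-unix/X{x11_display}"
        mounts.append((sock, sock))
    return mounts
```

The X11 entry is binary on purpose: with X11 there is no finer-grained control to express, which is exactly the gap Wayland is meant to close.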

Sandboxing should be independent of app images (see below). The sandboxing logic should probably be available as a shared library, so that not only executed apps can be locked into a sandbox, but software such as web browsers or PDF and image viewers can also lock their rendering/JavaScript components into sandboxes.

App Images

Applications shall be shipped in single-file app images.

Idea: an app image is a disk image with a GUID partition table containing a number of squashfs file systems. For each supported architecture one file system partition is included, plus one for the generic arch-independent bits.

Before an app is executed it needs to be mounted. Mounting consists of mounting the arch-independent partition from the app image, then bind mounting the files from the right arch-dependent partition, and finally mounting the files to pull in from the host. Applications in the special folder ~/Applications are mounted automatically, just by dropping them in there. Other applications can be mounted explicitly.
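The mount sequence can be sketched as computing an ordered plan from the image's partitions and the host architecture. Partitions are identified here by hypothetical labels (`generic` plus architecture names); a real image might instead use GPT partition type GUIDs, and the plan would be executed with mount(2) on a loop device. The mount point path is also hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Partition:
    label: str                   # hypothetical: "generic" or an arch name such as "x86_64"
    files: Tuple[str, ...] = ()  # files contained in the squashfs file system


def mount_plan(partitions: List[Partition], host_arch: str, mountpoint: str,
               host_files: List[str]) -> List[Tuple[str, str, str]]:
    """Return the ordered (action, source, target) steps for mounting one app image."""
    by_label = {p.label: p for p in partitions}
    if "generic" not in by_label:
        raise ValueError("image lacks the arch-independent partition")
    if host_arch not in by_label:
        raise ValueError(f"image has no partition for {host_arch}")
    # 1. The arch-independent partition forms the base of the tree.
    plan = [("mount", "generic", mountpoint)]
    # 2. Files from the matching arch-dependent partition are bind mounted into it.
    plan += [("bind", f"{host_arch}:{f}", mountpoint + f)
             for f in by_label[host_arch].files]
    # 3. Finally, the whitelisted files from the host are pulled in.
    plan += [("bind", f, mountpoint + f) for f in host_files]
    return plan
```

On a running system `platform.machine()` would supply `host_arch`; a “fat” image simply carries several arch partitions, of which only one is used per host.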

While usually executing a mounted app results in one sandbox being set up, there are app images for which this is not done, for example to allow extensions. Think: an MP3 codec image should be able to extend the host’s GStreamer, or a Flash image should be able to extend the Firefox image.

Sandboxes may hence exist independently of app images, multiple app images might live in the same sandbox, and sandboxes might exist without any app images backing them.

Kernel D-Bus

We believe D-Bus should be the primary way in and out of the sandbox. This requires sandbox-specific access control on bus services, as well as improvements in D-Bus to make it performant not only for exchanging control messages but also for exchanging payload data (e.g. we want D-Bus to be good enough to return a JPEG from the camera “portal” to the requester without jumping through hoops with external files/fds). To reach these goals securely and efficiently, we believe kernel-based D-Bus is essential. We want kernel-enforced policies and a zero-copy design.

Kernel D-Bus has been attempted twice and failed twice, with a lot of noise. To make it succeed the third time, we need to alter our approach. Hence: ensure kernel D-Bus does not touch the core kernel and does not require any socket families or functionality registered in the kernel proper, but can be a kmod that only consumes, and never provides or alters, existing kernel interfaces.

Suggested design: the new kernel-based D-Bus will be built on top of kernel character device nodes:

  1. /dev/kdbus/control → is a kernel character device which may be used to create and remove busses, as well as create additional (sub-)namespaces
  2. /dev/kdbus/system/bus → a kernel character device which is used as primary entry point to the system bus. A single bus may have more than one entry point, with different access policies implied.
  3. /dev/kdbus/1000-user/bus → a kernel char device which is used as primary entry point for the user bus of user with uid=1000
  4. /dev/kdbus/ns/mydebiancontainer/control → a master device for a sub-sandbox of the host called “mydebiancontainer”. The sandbox will bind mount the host’s /dev/kdbus/ns/mydebiancontainer/ to /dev/kdbus, so that the device nodes of the host and of other sandboxes are invisible, and the sandbox-private device directory appears as the only one accessible.
  5. /dev/kdbus/ns/mydebiancontainer/system/bus → the main system bus entry point of the “mydebiancontainer” sandbox.
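The naming scheme above can be captured in a few helpers that compute host device paths and show what a sandbox sees. This assumes, as the layout suggests, that the host’s /dev/kdbus/ns/<name>/ is bind mounted onto /dev/kdbus inside the sandbox; the helper names are hypothetical.

```python
from typing import Optional

KDBUS_ROOT = "/dev/kdbus"


def user_bus_name(uid: int) -> str:
    """Directory name of a user's bus, e.g. 1000 -> "1000-user"."""
    return f"{uid}-user"


def bus_node(bus: str, ns: Optional[str] = None) -> str:
    """Host path of a bus's primary entry point, in the host or in namespace `ns`."""
    base = KDBUS_ROOT if ns is None else f"{KDBUS_ROOT}/ns/{ns}"
    return f"{base}/{bus}/bus"


def sandbox_view(ns: str, host_path: str) -> Optional[str]:
    """Where a host device node appears inside sandbox `ns`, or None if it is invisible."""
    prefix = f"{KDBUS_ROOT}/ns/{ns}"
    if host_path == prefix or host_path.startswith(prefix + "/"):
        return KDBUS_ROOT + host_path[len(prefix):]
    return None
```

Inside the sandbox the paths look exactly like on the host (`/dev/kdbus/system/bus`), so software needs no sandbox-specific configuration to find its buses.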

Control devices shall understand the following ioctls:

DBUS_CMD_BUS_CREATE → creates a new bus and an initial entry point device node for it. This first entry point device node will be owned by the invoking uid/gid.

DBUS_CMD_BUS_REMOVE → removes a previously created bus and tears down all namespaces and endpoints associated with that bus. The “Master” bus cannot be removed.

DBUS_CMD_NS_CREATE → creates a new namespace for use in sandboxes. This results in a new master device node being created in a subdirectory of /dev/kdbus/ns/.

DBUS_CMD_NS_REMOVE → removes a previously created namespace. All endpoints created for that namespace will be removed. The “Master” namespace cannot be removed.
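The control-device rules can be captured in a small in-memory model, which helps when reasoning about ownership and teardown. This is not a kernel interface: the class and method names are hypothetical stand-ins for the ioctls, `"master"` is a placeholder name for the “Master” bus and namespace, and the association between buses and namespaces is not modeled.

```python
class KdbusControl:
    """In-memory model of one control device (host or namespace); not a kernel API."""

    MASTER = "master"  # placeholder for the "Master" bus / namespace

    def __init__(self):
        # bus name -> {"owner": (uid, gid), "endpoints": set of EP names}
        self.buses = {self.MASTER: {"owner": (0, 0), "endpoints": {"bus"}}}
        # namespace name -> set of device nodes created for it
        self.namespaces = {self.MASTER: set()}

    def bus_create(self, name, uid, gid):
        """DBUS_CMD_BUS_CREATE: new bus plus its first EP, owned by the caller."""
        if name in self.buses:
            raise FileExistsError(name)
        self.buses[name] = {"owner": (uid, gid), "endpoints": {"bus"}}
        return f"{name}/bus"

    def bus_remove(self, name):
        """DBUS_CMD_BUS_REMOVE: drops the bus together with all of its EPs."""
        if name == self.MASTER:
            raise PermissionError("the Master bus cannot be removed")
        del self.buses[name]

    def ns_create(self, name):
        """DBUS_CMD_NS_CREATE: new namespace with its own control node."""
        if name in self.namespaces:
            raise FileExistsError(name)
        self.namespaces[name] = {"control"}
        return f"ns/{name}/control"

    def ns_remove(self, name):
        """DBUS_CMD_NS_REMOVE: removes the namespace and every node created for it."""
        if name == self.MASTER:
            raise PermissionError("the Master namespace cannot be removed")
        del self.namespaces[name]
```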

Bus EP devices shall understand the following ioctls:

DBUS_CMD_EP_CREATE → creates a new EP for an existing bus

DBUS_CMD_EP_REMOVE → removes a bus entry point. If all EPs of a bus are gone the bus itself is removed too.

DBUS_CMD_EP_POLICY_SET → install a new access policy into this EP. Once an access policy is set it cannot be changed. Policies are simple per-service access lists.
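The entry-point rules (last EP removal takes the bus with it; a policy can be set exactly once) can be modeled the same way. Policies are represented as allow-lists of service names, which is one reading of “simple per-service access lists”; treating an EP without a policy as unrestricted is an assumption made only for this sketch.

```python
class Bus:
    """In-memory model of a bus and its entry points (EPs); not a kernel API."""

    def __init__(self, name):
        self.name = name
        self.endpoints = {"bus": None}  # EP name -> frozenset of allowed services, or None
        self.removed = False

    def ep_create(self, ep):
        """DBUS_CMD_EP_CREATE."""
        if ep in self.endpoints:
            raise FileExistsError(ep)
        self.endpoints[ep] = None

    def ep_remove(self, ep):
        """DBUS_CMD_EP_REMOVE: once the last EP is gone, the bus goes away too."""
        del self.endpoints[ep]
        if not self.endpoints:
            self.removed = True

    def ep_policy_set(self, ep, allowed_services):
        """DBUS_CMD_EP_POLICY_SET: write-once per EP."""
        if self.endpoints[ep] is not None:
            raise PermissionError("policy already set; it cannot be changed")
        self.endpoints[ep] = frozenset(allowed_services)

    def may_talk_to(self, ep, service):
        policy = self.endpoints[ep]
        # Assumption: an EP without a policy is unrestricted.
        return policy is None or service in policy
```

Making the policy write-once means a sandboxed client handed an EP cannot widen its own access afterwards, even if it gains control of the fd.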

DBUS_CMD_MSG_SEND → Sends a previously allocated message to a bus. This call takes a flag to optionally free the allocated message, i.e. imply a DBUS_FREE.

DBUS_CMD_MSG_RECV → Receives a message. This returns a valid pointer to user memory. Ideally, this refers to a COW copy of the message in the sender’s memory.

DBUS_CMD_NAME_ACQUIRE → Acquires a well-known service name for the open bus fd

DBUS_CMD_NAME_RELEASE → Releases a well-known service name for the open bus fd

DBUS_CMD_NAME_LIST → Returns a list of all currently registered unique and well-known names

DBUS_CMD_MATCH_ADD → Adds a filter for non-directed messages to the open bus fd. By default a service will not receive any non-directed (i.e. broadcast) messages, such as signals.

DBUS_CMD_MATCH_REMOVE → Inverse of DBUS_CMD_MATCH_ADD
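Name ownership and broadcast filtering can be sketched as a small registry. Unique names use the conventional D-Bus `:1.N` form, and matching on an (interface, member) pair is a deliberate simplification of full D-Bus match rules; both are assumptions for this illustration. Note that a fresh connection receives no broadcasts until it adds a match.

```python
import itertools


class NameRegistry:
    """In-memory model of name ownership and broadcast filtering on one bus."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.owners = {}   # well-known name -> unique name of the owning connection
        self.matches = {}  # unique name -> set of (interface, member) filters

    def connect(self):
        unique = f":1.{next(self._ids)}"
        self.matches[unique] = set()  # by default no broadcasts are received
        return unique

    def name_acquire(self, conn, name):
        """DBUS_CMD_NAME_ACQUIRE."""
        if self.owners.get(name, conn) != conn:
            raise PermissionError(f"{name} is owned by {self.owners[name]}")
        self.owners[name] = conn

    def name_release(self, conn, name):
        """DBUS_CMD_NAME_RELEASE."""
        if self.owners.get(name) == conn:
            del self.owners[name]

    def name_list(self):
        """DBUS_CMD_NAME_LIST: unique names followed by well-known names."""
        return sorted(self.matches) + sorted(self.owners)

    def match_add(self, conn, interface, member):
        """DBUS_CMD_MATCH_ADD."""
        self.matches[conn].add((interface, member))

    def match_remove(self, conn, interface, member):
        """DBUS_CMD_MATCH_REMOVE."""
        self.matches[conn].discard((interface, member))

    def signal_recipients(self, interface, member):
        """Connections that would receive a broadcast signal."""
        return sorted(c for c, m in self.matches.items() if (interface, member) in m)
```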

Bus activation is done in socket-activation style: systemd opens the bus device, acquires the service name and then passes this fd on to the activated service.
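On the service side this would reuse systemd’s existing fd-passing convention: `LISTEN_PID` must match the receiving process, `LISTEN_FDS` gives the count, and descriptors start at 3, as implemented by sd_listen_fds(3). Below is a minimal Python reading of that convention; that one of the passed fds is a kdbus bus fd with the service name already acquired is the proposal above, not existing behavior.

```python
import os

SD_LISTEN_FDS_START = 3  # first passed fd, as in sd_listen_fds(3)


def listen_fds(environ=None, pid=None):
    """Return the file descriptors passed by systemd via LISTEN_PID/LISTEN_FDS.

    Returns an empty list if the variables are missing, malformed, or meant
    for a different process.
    """
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    try:
        if int(environ.get("LISTEN_PID", "")) != pid:
            return []
        count = int(environ.get("LISTEN_FDS", "0"))
    except ValueError:
        return []
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))
```

Because the name is already held by the fd systemd passes on, no message sent to the service while it starts up is lost.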

All messages implicitly carry the sender’s UID, GID and PID and a timestamp (possibly more, such as audit info).

System access policy is always installed on the client side, merging configuration from /etc and per-sandbox configuration.
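A sketch of merging host and per-sandbox policy, and of checking a message against it using the sender metadata every message carries. The data shapes (per-service sets of permitted UIDs) and the merge rule (a sandbox entry can only narrow the host’s allow-list, by intersection) are assumptions; the text does not specify how conflicts are resolved.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class SenderMeta:
    """Metadata every message carries implicitly (audit info omitted)."""
    uid: int
    gid: int
    pid: int
    timestamp_ns: int


def merge_policy(etc_policy: Dict[str, Set[int]],
                 sandbox_policy: Dict[str, Set[int]]) -> Dict[str, Set[int]]:
    """Merge per-service UID allow-lists; assumption: a sandbox may only narrow them."""
    merged = {}
    for service, uids in etc_policy.items():
        extra = sandbox_policy.get(service)
        merged[service] = set(uids) if extra is None else set(uids) & set(extra)
    return merged


def allowed(policy: Dict[str, Set[int]], service: str, meta: SenderMeta) -> bool:
    """Services absent from the policy are denied."""
    return meta.uid in policy.get(service, set())
```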

Here’s an example of the /dev/kdbus/ layout:

/dev/kdbus/
|-- control
|-- system
|   |-- bus
|   |-- ep-epiphany
|   `-- ep-firefox
|-- 2702-user
|   `-- bus
|-- 1000-user
|   `-- bus
`-- ns
    |-- myfedoracontainer
    |   |-- control
    |   |-- system
    |   |   `-- bus
    |   `-- 1000-user
    |       `-- bus
    `-- mydebiancontainer
        |-- control
        `-- system
            `-- bus

Portals

Applications should always run in sandboxes of minimal privilege. Part of that is that even though a word processor should be capable of opening arbitrary files the user picks, it should not be able to see and access all files without user intervention. To handle this problem we’d like to see a system like Android’s “Intents” adopted in GNOME. The basic idea is that apps no longer implement operations such as “pick file” or “take photo” themselves, but leave them to a “portal” provider, which lives outside of the sandbox, runs with different privileges, and requires user interaction. For example, if the user clicks “Open” in the word processor, it should simply tell the system that the user wants to open a file. The system would then show a file selection UI, allow the user to pick a file and then return the file contents. At no time should the app be allowed to search for the file directly, or to obtain it without user interaction. Portals are primarily a security feature (since they are basically a security domain transition), but double as an integration point for the OS.

Our suggestion is to define a D-Bus service interface that allows registering “portal” handlers via bus names, and provides little more than a single method that executes the desired operation and returns the data as payload.
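The portal service can be sketched in-process as a registry that maps operations to handlers registered under bus names, with one `execute` entry point returning the payload. The operation and bus names and the `confirm` callback (standing in for the UI shown outside the sandbox) are hypothetical; a real implementation would be a D-Bus interface rather than a Python object.

```python
from typing import Callable, Dict, Optional, Tuple

# A handler gets the request arguments and a confirm() callback standing in for
# user interaction outside the sandbox; it returns payload bytes, or None if cancelled.
Handler = Callable[[dict, Callable[[str], Optional[str]]], Optional[bytes]]


class PortalRegistry:
    def __init__(self):
        self._handlers: Dict[str, Tuple[str, Handler]] = {}

    def register(self, operation: str, bus_name: str, handler: Handler) -> None:
        self._handlers[operation] = (bus_name, handler)

    def execute(self, operation: str, args: dict,
                confirm: Callable[[str], Optional[str]]) -> Optional[bytes]:
        """Single entry point: run the operation and return its payload."""
        if operation not in self._handlers:
            raise LookupError(f"no portal registered for {operation}")
        _bus_name, handler = self._handlers[operation]
        return handler(args, confirm)


def pick_file(files: Dict[str, bytes]) -> Handler:
    """Hypothetical 'pick file' portal: the user chooses; the app only gets the contents."""
    def handler(args, confirm):
        choice = confirm(args.get("title", "Open File"))
        return files.get(choice) if choice is not None else None
    return handler
```

The sandboxed app never sees a path or directory listing, only the bytes of the file the user chose, which is what makes the portal a security domain transition.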

The Search Path Problem

When an app image is mounted (see above), its contents need to be made available to the host. Examples: the .desktop file and the app icons need to be discovered by gnome-shell, bus activation files by D-Bus, dconf schemata by dconf, documentation by the help browser, and so on. Extension packages need to be discoverable by the apps they extend: e.g. Firefox needs to be able to find the newly installed Flash plugin, GStreamer the newly installed MP3 codec, gvfs the new network file system.

Currently, various implementations and specifications of file search logic are established. Some packages only watch a single fixed directory for drop-ins, others honour an env var (such as $PATH), and still others implement XDG basedir or something like it. But all these implementations suck in many ways: they follow no common scheme, frequently do not allow live changes, or require all resources to reside in the same directory, which dilutes the clean separation of OS and app data.

We believe that to make apps feel natural and at home, we need to clean up the search path logic. Our idea is to implement a library that extends XDG basedir:

  1. Deals with binary plugins, on top of the non-binary XDG basedir stuff already supported.
  2. Allows dynamic search path changes and notifications (i.e. if an app is mounted, gnome-shell needs to be notified instantly that new .desktop files are now available)
  3. Handles statically installed apps in /usr and /usr/local the same way as statically installed apps in /opt and app-image based ones.
  4. Should provide a way to exclude/include search paths. For example, the admin might want to exclude certain apps in /opt from the search paths for specific resource types.
  5. Abstracts the gory details a bit, so that we can extend this later on.
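A minimal sketch of such a library’s core: XDG-style directory lists (defaults `$HOME/.local/share` and `/usr/local/share:/usr/share`, per the XDG Base Directory specification), extended with roots added at runtime for mounted app images or /opt installs, per-resource-type exclusions, and change notifications. The class and method names are hypothetical, and a real implementation would also watch directories (e.g. via inotify) instead of relying on explicit calls.

```python
import os
from fnmatch import fnmatch
from typing import Callable, Dict, List


class SearchPath:
    def __init__(self, environ=None):
        env = os.environ if environ is None else environ
        home = env.get("HOME", "")
        data_home = env.get("XDG_DATA_HOME") or os.path.join(home, ".local/share")
        data_dirs = (env.get("XDG_DATA_DIRS") or "/usr/local/share:/usr/share").split(":")
        self._static: List[str] = [data_home] + [d for d in data_dirs if d]
        self._app_roots: Dict[str, str] = {}       # app id -> share dir of the app
        self._excludes: Dict[str, List[str]] = {}  # resource type -> glob patterns
        self._listeners: List[Callable[[str, str], None]] = []

    def subscribe(self, listener: Callable[[str, str], None]) -> None:
        """listener(event, app_id) is called whenever the search path changes."""
        self._listeners.append(listener)

    def _notify(self, event: str, app_id: str) -> None:
        for listener in self._listeners:
            listener(event, app_id)

    def add_app(self, app_id: str, share_dir: str) -> None:
        """Register a mounted app image (or an /opt install) as an extra root."""
        self._app_roots[app_id] = share_dir
        self._notify("added", app_id)

    def remove_app(self, app_id: str) -> None:
        if self._app_roots.pop(app_id, None) is not None:
            self._notify("removed", app_id)

    def exclude(self, resource_type: str, pattern: str) -> None:
        """Admin override: hide matching directories for one resource type only."""
        self._excludes.setdefault(resource_type, []).append(pattern)

    def dirs(self, resource_type: str) -> List[str]:
        """Directories to search, e.g. resource_type="applications" for .desktop files."""
        roots = self._static + list(self._app_roots.values())
        candidates = [os.path.join(r, resource_type) for r in roots]
        patterns = self._excludes.get(resource_type, [])
        return [c for c in candidates if not any(fnmatch(c, p) for p in patterns)]
```

Here gnome-shell would subscribe once and re-read `dirs("applications")` on each notification, instead of requiring .desktop files to be symlinked into /usr.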

Fixing the search path issue is important in the apps context, but it matters far beyond that, as a lot of server software suffers from the same issues. Think Apache or PHP modules and so on. At the moment, systems such as Red Hat “Stacks” try to work around these issues by patching env vars, symlinking stuff from /opt into /usr, and similar ugly things. We believe fixing the search path issue properly, so that env var patching or symlinking things into /usr becomes unnecessary, is highly desirable across the whole stack.

To Do

  1. systemd: container/image stuff
  2. gnome: portal logic
  3. xdg basedir needs update for binary plugins
  4. glib: slightly nicer xdg basedir with inotify
  5. systemd,dbus,gdbus: update for kdbus
  6. gnome-session: monitoring of app folder and expansion, updating indexes
  7. build tool based on rpm/deb+...
  8. gnome: define profile: get people to subscribe to compat
  9. gnome: downstream packaging policy