What we are trying to do:
We believe for a full Linux App solution we need six big changes:
We accept the following changes from the traditional application distribution model:
Issues and deficiencies:
Apps run inside a classic Linux file system hierarchy based on the FHS. However, they generally see their own hierarchy that only shares a few selected files and directories with the host hierarchy. In fact, app images might frequently include binaries/libraries that are compiled by the OS vendor of the OS the developer used when developing the app, rather than the local counterparts.
Apps come with manifest files that declare:
Our sandbox security is built on visibility of resources, not access control to resources. Instead of simply limiting access to OS objects, we want these OS objects not to be visible to the apps in the first place, and we want to ensure they cannot be made visible.
When a sandbox is initialized, a file system namespace is set up. A whitelist of files and directories is then used to bind mount all necessary resources into the namespace. It is essential that the only files/directories from the host OS made available to the sandbox are those that are vetted and considered stable enough for the API profile selected for the app. More specifically, if an app selects “GNOME OS 1.0” as its app profile, then no libraries that aren’t considered stable enough for “GNOME OS 1.0” show up in the namespace, and that includes all third-party libraries. That is, when you select “GNOME OS 1.0” as the app profile, you will not see the Qt or WINE APIs and suchlike, even if those are installed in the host OS.
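The whitelist idea above can be illustrated with a minimal sketch. The profile table, the library paths, and the function name `visible_paths` are all hypothetical and only serve to show the principle: anything not vetted for the selected profile simply never appears in the namespace.

```python
# Hypothetical whitelist: host paths considered stable for each API profile.
# Real profile contents are not defined in this document.
PROFILE_WHITELIST = {
    "GNOME OS 1.0": [
        "/usr/lib/libglib-2.0.so.0",
        "/usr/lib/libgtk-3.so.0",
    ],
}


def visible_paths(profile, host_paths):
    """Return the host paths that would be bind mounted into the sandbox."""
    allowed = set(PROFILE_WHITELIST.get(profile, ()))
    # Anything installed on the host but not vetted for the profile is dropped.
    return sorted(p for p in host_paths if p in allowed)


host = [
    "/usr/lib/libglib-2.0.so.0",
    "/usr/lib/libgtk-3.so.0",
    "/usr/lib/libQt5Core.so.5",  # installed on the host, but not in the profile
]
sandbox_view = visible_paths("GNOME OS 1.0", host)
```

Here `sandbox_view` contains only the two whitelisted libraries; the Qt library stays invisible even though it is installed on the host.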
Only selected interface points to the outside are made available beyond mere sharing of libraries and other files:
Applications shall be shipped in single-file app images.
Idea: an app image is a disk image containing a GUID partition table with a number of squashfs file systems inside. For each architecture that shall be supported a file system partition is included, plus one for the generic arch-independent bits.
Before an app is executed it needs to be mounted. Mounting consists of mounting the arch-independent partition from the app image, then bind mounting the files from the right arch-dependent partition, and finally mounting the files to pull in from the host. Applications in a special folder ~/Applications are automatically mounted, just by dropping them in there. Other applications can be mounted explicitly.
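The mounting order described above can be sketched as a function that only computes the list of steps, without performing any mounts. The `image#partition` notation, the target directory, and the function name `mount_plan` are assumptions made for illustration.

```python
def mount_plan(image, arch, host_paths, root):
    """Return the ordered mount steps for one app image as (kind, source, target)."""
    plan = [
        # 1. The arch-independent partition of the image comes first.
        ("mount", f"{image}#generic", root),
        # 2. Then the files of the partition matching the local architecture.
        ("bind", f"{image}#{arch}", root),
    ]
    # 3. Finally, the whitelisted files that are pulled in from the host.
    plan += [("bind", path, root + path) for path in host_paths]
    return plan


plan = mount_plan(
    "editor.app",                      # hypothetical app image
    "x86_64",
    ["/usr/lib/libglib-2.0.so.0"],     # hypothetical host whitelist
    "/run/apps/editor",                # hypothetical mount point
)
```

Keeping the plan as data makes the ordering easy to inspect before anything is actually mounted.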
In the usual case, executing a mounted app results in one sandbox being set up. There are app images where this is not done, however, for example to allow extensions. Think: an MP3 codec image should be able to extend the host gstreamer. Or a Flash image should be able to extend the Firefox image.
Sandboxes may hence exist independently of app images, multiple app images might live in the same sandbox, and sandboxes might exist without any app images backing them.
We believe D-Bus should be the primary way in and out of the sandbox. This requires sandbox-specific access control on bus services, as well as improvements in D-Bus to make it performant not only for exchanging control messages but also for exchanging payload data (i.e. we want D-Bus to be good enough to return a JPEG from the camera “portal” back to the requester without jumping through hoops with external files/fds). To reach these goals securely and efficiently we believe kernel-based D-Bus is essential. We want kernel-enforced policies and a zero-copy design.
Kernel D-Bus has been attempted twice and failed twice with a lot of noise. To make it succeed the third time, we need to alter our approach. Hence: ensure kernel D-Bus does not touch the core kernel and does not require any socket families/functionality registered in the kernel proper, but can be a kmod that only consumes but never provides/alters existing kernel interfaces.
Suggested design: the new kernel-based D-Bus will be built on top of kernel character device nodes:
Control devices shall understand the following ioctls:
DBUS_CMD_BUS_CREATE → creates a new bus together with its initial entry point device node. This first entry point device node is owned by the invoking uid/gid.
DBUS_CMD_BUS_REMOVE → removes a previously created bus, tears down all of the namespaces and endpoints associated with that bus. The “Master” bus can not be removed.
DBUS_CMD_NS_CREATE → creates a new namespace for usage in sandboxes. This will result in a new master device node being created in a subdirectory of /dev/dbus.
DBUS_CMD_NS_REMOVE → removes a previously created namespace. All endpoints created for that namespace will be removed. The “Master” namespace can not be removed.
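The semantics of the four control ioctls above can be modelled with a toy in-memory class. This is only a sketch of the described rules (ownership by the invoking uid/gid, the non-removable “Master” bus and namespace); it is not a kernel interface, and the class, method names, and the `"master"` identifier are hypothetical.

```python
class ControlDevice:
    """Toy in-memory model of a control device; not a real kernel interface."""

    MASTER = "master"  # hypothetical identifier for the "Master" bus/namespace

    def __init__(self):
        self.buses = {self.MASTER: (0, 0)}  # bus name -> owning (uid, gid)
        self.namespaces = {self.MASTER}

    def bus_create(self, name, uid, gid):
        if name in self.buses:
            raise FileExistsError(name)
        # The initial entry point node is owned by the invoking uid/gid.
        self.buses[name] = (uid, gid)

    def bus_remove(self, name):
        if name == self.MASTER:
            raise PermissionError("the master bus cannot be removed")
        del self.buses[name]

    def ns_create(self, name):
        if name in self.namespaces:
            raise FileExistsError(name)
        self.namespaces.add(name)

    def ns_remove(self, name):
        if name == self.MASTER:
            raise PermissionError("the master namespace cannot be removed")
        self.namespaces.remove(name)


ctl = ControlDevice()
ctl.bus_create("1000-user", uid=1000, gid=1000)
ctl.ns_create("sandbox-a")
```

Attempting `ctl.bus_remove("master")` or `ctl.ns_remove("master")` raises `PermissionError` in this model, mirroring the rule that the “Master” objects cannot be removed.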
Bus EP devices shall understand the following ioctls:
DBUS_CMD_EP_CREATE → creates a new EP for an existing bus
DBUS_CMD_EP_REMOVE → removes a bus entry point. If all EPs of a bus are gone the bus itself is removed too.
DBUS_CMD_EP_POLICY_SET → install a new access policy into this EP. Once an access policy is set it cannot be changed. Policies are simple per-service access lists.
DBUS_CMD_MSG_SEND → Sends a previously allocated message to a bus. This call takes a flag to optionally free the allocated message, i.e. imply a DBUS_FREE.
DBUS_CMD_MSG_RECV → Receives a message. This returns a valid pointer to user memory. Ideally, this refers to a COW copy of the message in the sender’s memory.
DBUS_CMD_NAME_ACQUIRE → Acquires a well-known service name for the open bus fd
DBUS_CMD_NAME_RELEASE → Releases a well-known service name for the open bus fd
DBUS_CMD_NAME_LIST → Returns a list of all currently registered unique and well-known names
DBUS_CMD_MATCH_ADD → Adds a filter for non-directed messages to the open bus fd. By default a service will not receive any non-directed (i.e. broadcast) messages, such as signals.
DBUS_CMD_MATCH_REMOVE → Inverse of DBUS_CMD_MATCH_ADD
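The entry point behaviour listed above (set-once policy, name ownership, and opt-in broadcast delivery) can be modelled in a few lines. This is a toy in-memory sketch, not the proposed kernel interface: the class, its methods, the connection ids, and the policy shape (service name mapped to allowed UIDs) are all assumptions.

```python
class Endpoint:
    """Toy model of one bus entry point; names and structure are illustrative."""

    def __init__(self):
        self.policy = None   # service name -> set of UIDs allowed to talk to it
        self.names = {}      # well-known name -> owning connection id
        self.matches = {}    # connection id -> set of subscribed signal names
        self.inbox = {}      # connection id -> list of received payloads

    def connect(self, conn):
        self.matches[conn] = set()
        self.inbox[conn] = []

    def policy_set(self, policy):
        # Mirrors "once an access policy is set it cannot be changed".
        if self.policy is not None:
            raise PermissionError("policy already installed")
        self.policy = {name: set(uids) for name, uids in policy.items()}

    def name_acquire(self, conn, name):
        if name in self.names:
            raise FileExistsError(name)
        self.names[name] = conn

    def name_release(self, conn, name):
        if self.names.get(name) != conn:
            raise PermissionError(name)
        del self.names[name]

    def match_add(self, conn, signal):
        self.matches[conn].add(signal)

    def match_remove(self, conn, signal):
        self.matches[conn].discard(signal)

    def send(self, sender_uid, payload, dest=None, signal=None):
        if dest is not None:
            # Directed message: check the per-service access list, if any.
            allowed = (self.policy or {}).get(dest)
            if allowed is not None and sender_uid not in allowed:
                raise PermissionError(dest)
            self.inbox[self.names[dest]].append(payload)
        else:
            # Broadcast: only connections with a matching filter receive it.
            for conn, signals in self.matches.items():
                if signal in signals:
                    self.inbox[conn].append(payload)


ep = Endpoint()
ep.connect("a")
ep.connect("b")
ep.policy_set({"org.example.Camera": {1000}})
ep.name_acquire("a", "org.example.Camera")
ep.send(1000, b"ping", dest="org.example.Camera")  # directed, allowed by policy
ep.match_add("b", "Changed")
ep.send(1000, b"sig", signal="Changed")            # broadcast, only "b" matches
```

In this model connection `a` receives only the directed message and `b` only the broadcast it subscribed to, which reflects the rule that a service gets no non-directed messages by default.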
Bus activation is done in socket-activation style: systemd opens the bus device, allocates the service name and then passes on this fd to activated services on activation.
All messages implicitly carry the sender’s UID/PID/GID and a timestamp (possibly more, such as audit info).
System access policy is always installed on the client side, merging configuration from /etc and per-sandbox configuration.
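For comparison, this is roughly how an activated service finds the file descriptors it inherits under systemd's existing socket activation protocol (`LISTEN_PID` and `LISTEN_FDS`, with descriptors starting at 3). Whether bus activation would reuse exactly these environment variables is an assumption; the function name is hypothetical.

```python
import os

SD_LISTEN_FDS_START = 3  # first inherited fd in systemd's socket activation protocol


def inherited_fds(environ=os.environ, pid=None):
    """Return the fds passed in by the service manager, or [] if none were passed."""
    pid = os.getpid() if pid is None else pid
    # The variables only apply if they were set for this very process.
    if environ.get("LISTEN_PID") != str(pid):
        return []
    count = int(environ.get("LISTEN_FDS", "0"))
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))
```

An activated service would treat each returned descriptor as an already-opened bus connection that owns its service name, instead of opening the bus device itself.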
Here’s an example of the /dev/kdbus/ layout:
| |-- bus
| |-- ep-epiphany
| `-- ep-firefox
| `-- bus
| `-- bus
| |-- control
| |-- system
| | `-- bus
| `-- 1000-user
| `-- bus
Applications should always run in sandboxes of minimal privilege. Part of that is that even though a word processor should be capable of opening arbitrary files the user picks, it should not be able to access and see all files without user intervention. To handle this problem we’d like to see a system like Android’s “Intents” adopted in GNOME. The idea is basically that apps no longer implement operations such as “pick file” or “take photo” directly, but rather leave this to a “portal” provider, which lives outside of the sandbox, runs with different privileges, and requires user interaction. For example, if the user clicks “Open” in the word processor, it should simply tell the system that the user wants to open a file. The system would then show a file selection UI, allow the user to pick a file, and then return the file contents. At no time should the app be allowed to search for the file directly or to open it without user interaction involved. Portals are primarily a security feature (since they are basically a security domain transition), but double as an integration point for the OS.
Our suggestion is to simply define a D-Bus service interface that allows registration of “portal” handlers via bus names, and provides not much more than a single method that executes the desired operation and returns the data as payload.
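A minimal sketch of such a registry, assuming handlers are keyed by operation name and associated with a bus name. The class, the operation string `"pick-file"`, the bus name `org.example.FilePortal`, and the handler are all hypothetical; a real portal would run outside the sandbox and involve the user.

```python
class PortalRegistry:
    """Toy registry mapping operations to portal handlers; illustrative only."""

    def __init__(self):
        self._handlers = {}  # operation -> (bus name, handler callable)

    def register(self, operation, bus_name, handler):
        self._handlers[operation] = (bus_name, handler)

    def request(self, operation, **params):
        # The single method: execute the operation, return the data as payload.
        if operation not in self._handlers:
            raise LookupError(operation)
        _bus_name, handler = self._handlers[operation]
        return handler(**params)


def pick_file(mime_type):
    # Stand-in for a privileged provider that would show a file chooser and
    # return the contents of whatever the user selected.
    return b"contents of the file the user picked"


portals = PortalRegistry()
portals.register("pick-file", "org.example.FilePortal", pick_file)
data = portals.request("pick-file", mime_type="text/plain")
```

The app only ever sees `data`; it never learns where the file lives or what else is next to it.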
The Search Path Problem
When an app image is mounted (see above) its contents need to be made available to the host. Examples: the .desktop file and the app icons need to be discovered by gnome-shell. Bus activation files need to be discovered by D-Bus. dconf schemata need to be found by dconf. Documentation should be discovered by the help browser. And so on. Extension packages need to be discoverable by the apps they extend: i.e. firefox needs to be able to find the newly installed flash plugin, gstreamer the newly installed MP3 codec, gvfs the new network file system.
Currently, various different implementations and specifications for file search logic are established. Some packages only watch a single fixed directory for drop-ins, others honour an env var (such as $PATH), yet others implement XDG basedir or something like it. But all these implementations suck in many ways: they follow no common scheme, frequently do not allow live changes, or require that all resources reside in the same dir, so that clean separation of OS and app data is diluted.
We believe that to make apps feel natural and at home we need to clean up the search path logic. Our idea is to implement a library that extends XDG basedir:
Fixing the search path issue is important in the apps context, but it matters well beyond that, as a lot of server software suffers from the same issues. Think Apache or PHP modules and so on. At the moment systems such as Red Hat “Stacks” try to work around these issues by patching env vars, symlinking stuff from /opt into /usr, and similar ugly things. We believe fixing the search path issue properly, so that env var patching or symlinking things into /usr is unnecessary, is highly desirable across the whole stack.
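As a baseline for the search path discussion, here is a minimal sketch of a plain XDG-basedir-style lookup: resources are collected from several data directories, and the first directory that provides a given file name wins. It does not implement the proposed library; the function names are hypothetical, and only the `XDG_DATA_HOME` and `XDG_DATA_DIRS` defaults come from the XDG Base Directory specification. POSIX paths are assumed.

```python
import os


def data_dirs(environ):
    """Return the XDG data directories in order of precedence."""
    home = environ.get("XDG_DATA_HOME") or os.path.join(
        environ.get("HOME", ""), ".local", "share"
    )
    system = environ.get("XDG_DATA_DIRS") or "/usr/local/share:/usr/share"
    return [home] + [d for d in system.split(":") if d]


def find_resources(subdir, suffix, dirs):
    """Map file name -> full path, letting earlier directories take precedence."""
    found = {}
    for base in dirs:
        path = os.path.join(base, subdir)
        if not os.path.isdir(path):
            continue  # missing directories are simply skipped
        for name in sorted(os.listdir(path)):
            if name.endswith(suffix):
                found.setdefault(name, os.path.join(path, name))
    return found
```

A consumer such as a shell could call `find_resources("applications", ".desktop", data_dirs(os.environ))`. With this scheme, a mounted app image would only need to contribute one more directory to the list rather than symlink its files into /usr.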