Interacting with dbus-daemon's "app container" feature

I’ve been working to upstream the snap support for xdg-desktop-portal and while they are receptive to accepting the feature, they asked how it would fit in with the new D-Bus “app container” feature:

https://bugs.freedesktop.org/show_bug.cgi?id=100344

This is essentially an attempt to provide a confinement system independent way of labelling connections as originating from a container, and to impose some restrictions on those connections. It also provides a new API for reading the connection labels, that can then be used by D-Bus services that act as “trusted helpers” when talking to confined applications.

Even if we continue to rely on our AppArmor patches for D-Bus mediation, I think it is worth investigating whether we can plug into this system for the benefit of using trusted helpers like xdg-desktop-portal with fewer snap-specific code paths. While some parts of the feature exist in dbus’s master branch, it is still in flux so now is probably the time to look at this.

Changes here will obviously have security implications (so @jdstrand may be interested), and will affect how the sandbox mount namespace is constructed (so @zyga-snapd may be interested). To handle the session/user bus, it probably also depends on user mounts. Below is my understanding of how this feature works in its current form.

Separate sockets for confined apps

Rather than having confined apps connect to the main dbus sockets, the idea is to create a new listening socket for each sandbox. Any connections accepted by this socket will be labelled as belonging to that “app container”.

For the system bus, this probably means mounting a tmpfs over /run/dbus in the sandbox and creating a new socket at /run/dbus/system_bus_socket. Alternatively, we could set $DBUS_SYSTEM_BUS_ADDRESS and use a different location.

For the session bus, it is generally found at $XDG_RUNTIME_DIR/bus, so it will depend on how we handle $XDG_RUNTIME_DIR in the future. This might be something that is easier if we stop altering the value of $XDG_RUNTIME_DIR inside the sandbox and bind mount over the normal contents.
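To make the mount setup concrete, here is a rough sketch of the kind of operations involved, using golang.org/x/sys/unix. The per-snap socket path is an assumption for illustration only, and in practice this work would happen inside the snap's mount namespace (e.g. via snap-confine/snap-update-ns) rather than in a standalone program:

    package main

    import (
        "log"
        "os"

        "golang.org/x/sys/unix"
    )

    func main() {
        // Hide the host's /run/dbus from the sandbox.
        if err := unix.Mount("tmpfs", "/run/dbus", "tmpfs", 0, "mode=0755"); err != nil {
            log.Fatal(err)
        }

        // Bind the per-snap container socket (hypothetical path) over the
        // standard system bus socket location. Bind mounts need an existing
        // mount point, so create an empty file first.
        target := "/run/dbus/system_bus_socket"
        if f, err := os.Create(target); err == nil {
            f.Close()
        }
        src := "/run/snapd/dbus/my-snap/system_bus_socket" // assumed location
        if err := unix.Mount(src, target, "", unix.MS_BIND, ""); err != nil {
            log.Fatal(err)
        }
    }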

Contained app metadata

When registering an extra listening socket, the confinement system provides some additional information:

  • container_type: a string representing the confinement system (e.g. io.snapcraft.snapd).
  • app_identifier: a string identifier representing how the confinement system identifies the app (e.g. the snap package name).
  • metadata: an a{sv} dictionary of additional metadata.

This information will be associated with any connections made on the new listening socket, and made available to services acting as trusted helpers.
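As a hedged sketch of what registration could look like from snapd's side (the Containers1 interface is still in flux, so the AddContainerServer method name is taken from the next section of this post, and the argument order, fd passing and return value below are assumptions rather than the final API; written with godbus):

    package snapdbus

    import "github.com/godbus/dbus"

    // addContainerServer registers a per-snap listening socket with the bus
    // driver, attaching the container metadata described above.
    func addContainerServer(conn *dbus.Conn, listenFD, closeNotifyFD int) (string, error) {
        driver := conn.Object("org.freedesktop.DBus", "/org/freedesktop/DBus")

        metadata := map[string]dbus.Variant{
            // Hypothetical keys: whatever snapd wants trusted helpers to see.
            "snap":     dbus.MakeVariant("my-snap"),
            "revision": dbus.MakeVariant("42"),
        }

        var containerID string
        err := driver.Call("org.freedesktop.DBus.Containers1.AddContainerServer", 0,
            "io.snapcraft.snapd",       // container_type
            "my-snap",                  // app_identifier
            metadata,                   // a{sv} metadata
            dbus.UnixFD(listenFD),      // listening socket created by the confinement system
            dbus.UnixFD(closeNotifyFD), // close-notification fd (see the next section)
        ).Store(&containerID)
        return containerID, err
    }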

Lifetime of app container sockets

New app container sockets are created by the confinement system and then passed to dbus-daemon through the AddContainerServer D-Bus method call. Of course, dbus-daemon then needs to know when it can close that socket and stop accepting new connections. In the current design, this is handled by passing an extra close_notification file descriptor: this would generally be the read end of a pipe, with dbus-daemon closing the listening socket when it detects the write end of the pipe has been closed.

This particular design means there needs to be some process to hold on to the file descriptor and ensure it is closed at the right time. For system bus sockets, snapd may be in the best position to handle this. For session bus sockets, this is going to tie into how we handle the lifecycle of the user mount namespaces. Perhaps a daemon within the user session is the right choice here: maybe even snap userd?
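A minimal sketch of that close-notification pattern, assuming the long-lived process (snapd or a session helper) keeps the write end of a pipe and hands the read end to dbus-daemon along with the listening socket:

    package snapdbus

    import "golang.org/x/sys/unix"

    // newCloseNotification creates the pipe used for the close_notification
    // pattern: the read end is passed to dbus-daemon (e.g. via the
    // registration call sketched above), and the returned release function
    // closes the write end when the sandbox should stop accepting new
    // connections.
    func newCloseNotification() (readFD int, release func(), err error) {
        var fds [2]int
        if err := unix.Pipe2(fds[:], unix.O_CLOEXEC); err != nil {
            return 0, nil, err
        }
        return fds[0], func() { unix.Close(fds[1]) }, nil
    }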

Access control

When registering a new container socket, the confinement system provides a set of access control rules dictating what connections made to this socket are allowed to do. These rules are checked in addition to the LSM checks, so we could install a broad access control list and continue relying on our existing AppArmor mediation.

Alternatively we could look at moving some of our D-Bus access control over to this system, which would benefit snaps running on host systems that haven’t adopted AppArmor or the dbus-daemon mediation support. If we are interested in using this system, then we should make sure it can handle everything we care about. From a read of the bug report, the main missing features I can see at this point are:

  1. no way to change the ACL of a container socket after creation. We would need this to support snap connect/snap disconnect on an interface that changes D-Bus rules.
  2. no way to specify rules that depend on the peer’s app_identifier. We currently generate AppArmor rules that can distinguish, for example, NetworkManager running on the host system from NetworkManager running as a snap.
  3. can only allow method calls at the (bus_name, object_path, interface) granularity. In some of our existing snap interfaces we grant access to specific methods in an interface, but it would be worth deciding whether that is actually necessary.

Please see the various bugs that are marked as blockers of #100344 for more detailed design around individual sub-features.

If you want to ask applications to connect to a specific socket, please set DBUS_SESSION_BUS_ADDRESS in their environment (and if you want to make sure they only connect to that socket even if they are malicious or compromised, use technical measures to stop them from connecting elsewhere). XDG_RUNTIME_DIR is only used if DBUS_SESSION_BUS_ADDRESS isn’t set.
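For illustration, a minimal sketch of a launcher pointing a confined app at a specific socket via the environment; both the app path and the socket path are hypothetical:

    package main

    import (
        "log"
        "os"
        "os/exec"
    )

    func main() {
        cmd := exec.Command("/snap/bin/my-app") // hypothetical confined app
        cmd.Env = append(os.Environ(),
            // Hypothetical per-snap container socket address.
            "DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/snap.my-snap/bus")
        cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
        if err := cmd.Run(); err != nil {
            log.Fatal(err)
        }
    }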

Rather than having confined apps connect to the main dbus sockets

If you are relying on the extra app container sockets for identification, then you must not allow sandboxed (confined) apps to connect to the main session and system bus sockets at all; any connection to those sockets is treated as un-sandboxed. (Obviously you could use AppArmor to confine such connections differently, if you have a sufficiently patched kernel; but that’s hard to rely on, because many distributions use an LSM that conflicts with AppArmor, or no LSM at all.)

In the current design, this is handled by passing an extra close_notification file descriptor

You can also use a D-Bus method call, if you happen to have a process that is still alive, connected to the appropriate bus (system or session), and not technically prevented from using D-Bus. So far, the D-Bus method call has been implemented, but the tricks with pipes haven’t.

At the moment, the extra socket also stops listening when whichever connection created the extra socket disconnects from the relevant bus, but in future there’s going to be a way to ask for that to not happen.

The tricks with pipes are because Flatpak’s “supervisor” process is bwrap, which doesn’t do D-Bus, and is sometimes setuid root (so should really avoid linking any non-critical libraries at all); by the time the application runs, flatpak run has already removed itself from memory by exec’ing bwrap, so it’s no longer able to do new D-Bus calls.

no way to change the ACL of a container socket after creation

Is https://bugs.freedesktop.org/show_bug.cgi?id=101902 enough? Or if the GrantAccess and RevokeAccess methods made the unique name explicit rather than implicit (allowing unconfined processes to grant access to services that are not themselves), would that be enough?

More generally, why do you need this, and what functionality do you need from it? It could be added if there’s a good enough reason.

no way to specify rules that depend on the peer’s app_identifier. We currently generate AppArmor rules that can distinguish, for example, NetworkManager running on the host system from NetworkManager running as a snap.

Why do you need this? If NetworkManager-as-platform-service and NetworkManager-as-snap own the same bus name, then as far as D-Bus is concerned, they are the same (and in particular any client of NetworkManager that is trusting it to behave correctly is equally exposed to NM’s ability to behave incorrectly or maliciously either way).

can only allow method calls at the (bus_name, object_path, interface) granularity. In some of our existing snap interfaces we grant access to specific methods in an interface, but it would be worth deciding whether that is actually necessary.

I think this is really a design flaw in those interfaces, but it wouldn’t be difficult to add a flag that gives a rule per-method granularity (https://bugs.freedesktop.org/show_bug.cgi?id=101902#c5), and it turns out we need that for feature parity with flatpak-dbus-proxy anyway.


In terms of the bind mounts for the sockets and XDG_RUNTIME_DIR, this all sounds fine and is where I hope we go, as mentioned elsewhere.

For my next comments, I am going to talk at a relatively high design level rather than in deep implementation details. AIUI, they are currently (with flatpak-dbus-proxy) implementing a proxy where applications must talk to the proxy and the proxy mediates access to the system and session buses. This is the classic ‘trusted helpers’ approach we use on Touch (eg, connectivity-api), the difference being they utilize bind mounts for sockets. Now, with the dbus-daemon “app container” feature, container registration happens through these sockets so the trusted helper (now dbus-daemon itself) can identify processes and connections.

One way to implement this would be simply to create/modify an interface for this proxy. The connected plug policy for apparmor allows the snap to connect to and use the entire API of the proxy. The slot policy allows the proxy to talk to connected snaps and whatever system and session bus APIs the proxy knows about (or everything, if we trust it unconditionally). Then the proxy mediates snap access in whatever manner it sees fit, separate from snapd. We can make this a classic implicit slot if desired. This method would work on systems with or without LSM support. We’d only need to snap the proxy or make it available on classic distros via a snapd Depends (or similar). This method is straightforward and might be a good first step to enable the feature. Whether it is the final step depends in part on how dynamic the proxy’s policy is meant to be (ie, should it really be tied to snapd interface connections?).

Your alternative idea of creating an interface backend for the proxy and encoding the proxy’s (non-AppArmor) mediation rules in snapd is possible. snapd could reimplement the proxy itself or depend on the proxy (the choice here would in part depend on whether this is for classic distros only or for Ubuntu Core too). I think we’d use the same coarse-grained LSM controls for talking to the proxy at all as described above, but then we create interfaces as needed for snap connections/disconnections as appropriate. Eg (prefixing proxy- for clarity of discussion), we create a proxy-printing interface that uses the apparmor backend to create a rule allowing the snap to talk to the proxy at all, and uses the desktop-portal backend to add proxy rules configuring the proxy to allow use of the printing portal.
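Purely as an illustration of the shape this could take (the type and method names below are hypothetical and do not correspond to snapd’s actual interface/backend APIs), such an interface might feed two backends like this:

    package sketch

    // Hypothetical spec types standing in for snapd's real backend
    // Specification objects.
    type AppArmorSpec struct{ Snippets []string }

    func (s *AppArmorSpec) AddSnippet(snippet string) {
        s.Snippets = append(s.Snippets, snippet)
    }

    type PortalSpec struct{ Rules [][3]string }

    func (s *PortalSpec) AllowCall(busName, objectPath, iface string) {
        s.Rules = append(s.Rules, [3]string{busName, objectPath, iface})
    }

    // proxyPrinting sketches a "proxy-printing" interface feeding two backends.
    type proxyPrinting struct{}

    // Coarse-grained LSM rule: the plugging snap may talk to the proxy at all.
    func (proxyPrinting) AppArmorConnectedPlug(spec *AppArmorSpec) {
        spec.AddSnippet("dbus (send, receive) bus=session peer=(label=unconfined),")
    }

    // Fine-grained proxy/ACL rule: only the printing portal is reachable.
    func (proxyPrinting) PortalConnectedPlug(spec *PortalSpec) {
        spec.AllowCall("org.freedesktop.portal.Desktop",
            "/org/freedesktop/portal/desktop",
            "org.freedesktop.portal.Print")
    }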

UPDATE: in both examples, the ‘proxy’ can be either the current ‘flatpak-dbus-proxy’ or the new ‘dbus-daemon with “app container” patches’, but see comments later for clarification.

Historically we’ve used member mediation for DBus services whose API has been written with mediation in mind, where it might have a few methods that are safe for anyone to use and some that are privileged (potentially otherwise allowed by the session due to polkit rules for this particular user). This works fine for something like locationd, which was written with APIs for determining location (the location-observe interface) and APIs for configuring locationd (the location-control interface). Many DBus services were not written with this in mind and therefore do not use member mediation in the interface (eg, network-manager, ofono or bluez).

Responding to myself:

Note that my previous response was phrased in terms of the current flatpak-dbus-proxy concept, which is how snapd could support xdg portals, etc. today.

In terms of a future where dbus-daemon has the “app container” patchset (sidestepping how to make that available to older classic distro releases), things aren’t very different. We use LSM mediation to enforce talking only to the bind-mounted socket locations for the system and session buses, and then dbus-daemon can apply its ACLs. With this dbus-daemon approach, I think it makes a lot of sense to have an interface backend that encodes the dbus ACLs for snap connect/disconnect.


Note that LSM stacking changes things a bit here.

Also, it is entirely within snapd’s control to bind mount over the main system and session buses and only allow access to the ACL-mediated dbus-daemon buses, which I think is what @jamesh was getting at. It then becomes an implementation detail within snapd where snapd could decide what to do based on whether or not an LSM is available. Whether or not we want to do it this way would need discussion of course.

Keep in mind, on systems with AppArmor enabled, every snap command gets its own security label (eg, snap.app.command). Currently, we use the peer label on both sides of the connection. Eg, the network-manager slot implementation has rules with peer=(label=snap.some.plugging-app) and the plugging snap has rules with peer=(label=snap.network-manager.network-manager). We do this to enforce that slots may only communicate with connected clients and plugs may only communicate with connected servers. It’s of course true that only one process could claim the name. We could drop the peer rule, but having it adds a bit of hardening and removes ambiguity, making it very clear in the security policy which snaps are allowed to communicate.

On classic distros where network-manager is not a snap but part of the classic system, the plugging snap will have rules with peer=(label=unconfined). This is of course not as specific as with a slotting snap, but it ensures the plugging snap is at least only allowed to communicate with something running trusted on the system.

Sure. The part that I think potentially indicates a design flaw is that you’re finding you need to distinguish at the level of individual methods, rather than having com.example.Location.Observe and com.example.Location.Control interfaces.


It isn’t clear to me how/whether dbus-daemon AppArmor and SELinux identification and mediation would work in a stacked-LSM world. At the moment, the result of the SO_PEERSEC getsockopt (which dbus-daemon exposes as the LinuxSecurityLabel) is an opaque LSM-specific token. In a kernel with LSM stacking, SO_PEERSEC has to return something, but it’s not clear what it would/could return: for compatibility with SELinux it has to return the SELinux context, for compatibility with AppArmor it has to return the AppArmor context, and they can’t both win! The specification for LinuxSecurityLabel says it is literally the result of SO_PEERSEC, so whatever the kernel gives me (whether that turns out to be something useful or not), that’s what you’ll get.
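For reference, this is essentially the getsockopt in question. The sketch below performs it from the client side against the system bus socket (so it returns dbus-daemon’s own label), whereas dbus-daemon performs it on the connections it accepts; either way, the result is whatever opaque string the kernel’s LSM chooses to return:

    package main

    import (
        "fmt"
        "log"
        "net"

        "golang.org/x/sys/unix"
    )

    // peerSecurityLabel reads SO_PEERSEC from a connected AF_UNIX socket,
    // which is the value dbus-daemon exposes as LinuxSecurityLabel.
    func peerSecurityLabel(conn *net.UnixConn) (string, error) {
        raw, err := conn.SyscallConn()
        if err != nil {
            return "", err
        }
        var label string
        var gerr error
        if err := raw.Control(func(fd uintptr) {
            label, gerr = unix.GetsockoptString(int(fd), unix.SOL_SOCKET, unix.SO_PEERSEC)
        }); err != nil {
            return "", err
        }
        return label, gerr
    }

    func main() {
        c, err := net.Dial("unix", "/run/dbus/system_bus_socket")
        if err != nil {
            log.Fatal(err)
        }
        defer c.Close()
        label, err := peerSecurityLabel(c.(*net.UnixConn))
        if err != nil {
            log.Fatal(err)
        }
        fmt.Printf("SO_PEERSEC: %q\n", label)
    }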

Sure. We absolutely advocate for writing clean APIs in new services that would do this, but existing services often don’t do that, so we do the best we can. Future services developed with application confinement in mind shouldn’t need member mediation.

I don’t have the specific details here, but my understanding is that there is the concept of a default LSM and APIs for diving deeper. A system is booted with the default LSM set, so an unmodified application will only see the default LSM’s label. An application that is aware of LSM stacking can, for example, check whether stacking is enabled and then use APIs to make decisions. AIUI, AppArmor plans to update its libapparmor APIs to make things as friendly as possible, so something like dbus-daemon on a system with SELinux as the default LSM could call into it to get the stacked label and/or the AppArmor label, and go from there. @tyhicks could probably comment further if needed.


OK, but is that more like a desired property that you rely on, or an undesired property that you are working around by adding different rules in each case? In an ideal world, would you be using label=something-that-uniquely-identifies-network-manager in both cases?

Given that an AppArmor system with D-Bus mediation is trusting the dbus-daemon to perform mediation for it anyway, it would seem equally robust to have

  • AppArmor:
    • unconfined processes may own org.freedesktop.NetworkManager
    • snap.network-manager.network-manager may own org.freedesktop.NetworkManager
    • other labels cannot
  • dbus-daemon system.d:
    • uid 0 may own org.freedesktop.NetworkManager
    • other uids cannot
  • Containers1:
    • snap.some.plugging-app may talk to whoever owns org.freedesktop.NetworkManager

(In AppArmor, dbus send peer=(name=org.freedesktop.NetworkManager) probably only allows messages that literally have org.freedesktop.NetworkManager in their DESTINATION field, because that’s what was easy to implement, and the structure of the communication between the dbus-daemon and AppArmor makes it hard for the dbus-daemon to make “what if?” queries; that often makes it hard to write good policies for existing code. But in Containers1 I plan to implement the semantics of <allow send_destination>, so it’s about connections, not header fields: sending messages whose DESTINATION is either org.freedesktop.NetworkManager, NM’s unique name, or even another name owned by NM is allowed by a rule that only refers to org.freedesktop.NetworkManager.)

The use case here is to update the ACL when snap connect or snap disconnect is invoked with a snap interface that includes D-Bus access rules. We’d want to change the ACL on an existing container socket. Reading that bug report, it sounds like the GrantAccess / RevokeAccess methods are intended to be used by a D-Bus service to install its own access rules, rather than manipulating a container’s ACL.

I think we’d want something like the following:

o.fd.DBus.Containers.SetContainerAccess(s: container_id,
                                        a(usos): rules)

This would replace the ACL associated with the given container ID. I’m not sure exactly who should be allowed to call a method like this. Any non-contained D-Bus connection?
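As a hypothetical sketch only (neither the method nor the rule layout exists yet; the a(usos) tuple shape is just read off the proposal above, and the interface name is assumed), a call from snapd could look something like this:

    package main

    import (
        "log"

        "github.com/godbus/dbus"
    )

    // accessRule mirrors one a(usos) element:
    // (flags, bus name, object path, interface).
    type accessRule struct {
        Flags      uint32
        BusName    string
        ObjectPath dbus.ObjectPath
        Interface  string
    }

    // setContainerAccess replaces the ACL of an existing container.
    func setContainerAccess(containerID string, rules []accessRule) error {
        conn, err := dbus.SessionBus()
        if err != nil {
            return err
        }
        driver := conn.Object("org.freedesktop.DBus", "/org/freedesktop/DBus")
        return driver.Call("org.freedesktop.DBus.Containers1.SetContainerAccess", 0,
            containerID, rules).Err
    }

    func main() {
        // e.g. after a snap connect that grants access to the printing portal:
        rules := []accessRule{
            {0, "org.freedesktop.portal.Desktop",
                "/org/freedesktop/portal/desktop",
                "org.freedesktop.portal.Print"},
        }
        if err := setContainerAccess("some-container-id", rules); err != nil {
            log.Fatal(err)
        }
    }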

I think there are two reasons we’ve got rules using this feature:

  1. If you’re establishing a connection between two snaps, generating rules in terms of the security labels of those two snaps is the simplest solution.
  2. We’re working within the limitations of the AppArmor/LSM interface. Method calls to NetworkManager could have two possible values in the destination field: either the o.fd.NetworkManager well known name, or the unique name of the current instance of the daemon. The LSM isn’t given a full list of all names owned by a given unique name, so rules written in terms of well known names are unreliable.

It sounds like (2) is not an issue for your new system, which changes the dynamics a bit.


@jamesh Your description covers the issue well and I agree on pretty much all points.

One thing to keep in mind is that it sounds like you are mainly thinking about this in terms of a replacement of the AppArmor support. In practice we’ll more likely complement it instead, adding rules that match the intention of each interface in the best way possible according to what is offered by the new backend.

In that sense, the main blocker seems to be this:

We can’t write a new backend if we cannot perform changes at all after the application is running, since we don’t really control the lifetime of the application. That is, once installed it can run at any time, and we need to be able to adapt its access as the user requests.

Once we can do that, we can then work through the other issues mentioned:

Indeed, not having this ability would greatly limit the interfaces we can use it for. In effect we’d only be able to make use of the new backend in interfaces that cover system-level resources, but not those exposed by other applications since we’d not be able to address the particular application.

Indeed, that’ll also greatly limit how useful the new backend will be. I understand good interfaces wouldn’t require that capability, but as a mediation platform we need to be able to address real applications as they exist today.

Then, about this aspect:

We don’t want to tie the lifetime of snapd or userd to the application lifetime. We need to find a way to bind the lifetime of the socket to the lifetime of the application itself, similar to how we’re working to handle the mount namespaces. It’s unfortunate that we’ll need an actual process for that, though. We’ll need to discuss this.

I’ve been looking at this from two perspectives:

  1. It is likely that we’ll see more D-Bus services making decisions based on the app container concept in the future (portals and dconf-service being the most interesting examples). What will we need to do to have snap confined apps interact with them?

  2. The app container feature is still in flux. What changes would make it more useful to snapd?

For (1), I think we can mostly ignore the ACL feature. If we continue to rely on AppArmor mediation, we can just install a permissive ACL and there will be no reason to change it.

I think (2) is interesting for two reasons. Firstly, it could improve our confinement story on distros that aren’t running AppArmor. Secondly, it can do some things that our AppArmor rules can’t (the main one being to specify rules in terms of well known bus names), so might be useful to us in future. So I think it is interesting to look at what would prevent us from using the feature in its current form.


This is one of the areas @smcv mentioned he was open to changing. He also mentioned the possibility of tying the lifetime of an app container socket to a connection to the bus, but that is really just switching one file descriptor for another. To completely separate the lifetime of the app container socket from whatever entity is responsible for the mount namespace, we’d probably want a RemoveContainerServer method that takes the container ID returned by AddContainerServer.

I can see the appeal of using a pipe file descriptor for this, since it guarantees clean up when the process on the other end exits, even if that exit is unclean. It fits into Flatpak’s model, where their sandbox uses a separate pid namespace with a supervisor process acting as pid 1 for that namespace.

Things are a little more complicated for us, since we’re sharing the mount namespace between multiple instances of an app (or multiple apps from a single snap).

If we don’t go with a long-running session service to help manage some of this, perhaps we can achieve some of it with a cleanup task in the session? Given the set of Ubuntu releases we care about, we’d need something that could work with both Upstart and systemd user sessions. I’m not sure what it’d mean for other distros.

You mean like https://cgit.freedesktop.org/dbus/dbus/commit/?id=69d164cbd38043359b9ee91255434939792ee4f6 which already exists? 🙂 This would be suitable for use by a long-running session service.

(Please take a look at the minimum-possible implementation that was already merged, and the bugs that block #100344 and are my plan for the rest of the feature - I’m happy to discuss design improvements for the feature, but this is something that I can only work on part-time, so I’d rather spend that time on implementation and design improvements than on recapping the design that has already happened.)

At the moment https://cgit.freedesktop.org/dbus/dbus/commit/?id=3048c90ccbeab6ed807ac80662e7196520ba1ab6 means that it isn’t possible to keep a container server alive longer than the D-Bus connection that initiated it, so you need a long-running process (like Flatpak’s flatpak-session-helper) or at least a process that talks D-Bus and lasts at least as long as the app (like Flatpak’s flatpak-dbus-proxy, which is what does the Containers1 D-Bus calls in my first prototype of gluing this onto Flatpak). I do plan to relax that, so that flatpak-dbus-proxy can eventually disappear; but we need something, either the initiator of the container server or some helper process, to take active responsibility for cleaning up stale container servers.

For ‘1’, yes.

For ‘2’, I’m personally not that excited about this as a confinement improvement on distros without AppArmor, because there are so many other holes that would still exist (please note my previous comment on LSM stacking, which will be the way to make full strict snap confinement available everywhere).

I do think being able to specify rules in terms of well-known names would be nice. We already can with AppArmor when the destination uses the well-known name; it’s just that, as you said, the unique names are problematic. Most of the time this isn’t an issue because we can use the security label on both sides, and AppArmor will implicitly allow replies to sent messages, but occasionally it would be nice to have the well-known name when today we only have the unique name.

Personally I think it will be most interesting for services and clients that are coded to use the “app container” concept, where the (fine-grained) security policy would be best expressed with this mechanism. Ultimately, as the feature matures, we can do as @niemeyer suggests and complement the AppArmor policy with “app container” policy as appropriate.

@smcv Knowing what is registered and cleaning it up would be much less of an issue for us than having to keep something alive (something that can’t stop/restart) that knows about all possible instances of that application. We already do quite a bit of bookkeeping and cleanup for all the resources we need the application to be using, so this would be just one more of those, and it’d be fine to be in charge of releasing resources inside DBus when necessary.