A "user session agent" for snapd


#1

One of the objections to adding support for user session systemd daemons and D-Bus services was the lack of control over package upgrades or removal. While system level daemons can be stopped and started by snapd, this isn’t the case for user session daemons: as snapd is running as root, it cannot talk to the user session instance of systemd. Even if snapd tried to connect to the user’s D-Bus session bus, the connection would be refused due to the mismatched user ID.

At the last engineering sprint earlier this year, we brainstormed some ideas for how we could solve this problem. I don’t think any work has been done on it since, and a concrete plan wasn’t written down. So here are my recollections, as a first step towards an implementation.

A “session agent” for snapd

Rather than have the root owned snapd try to poke around inside the user’s session, we would instead have an agent running as the user that could act on snapd’s behalf. Preferably this agent wouldn’t need to run constantly throughout the session, for the following reasons:

  1. If we can recover from the agent stopping, then crashes are far less critical.
  2. If snapd is upgraded mid-session, the agent can also be upgraded.

Systemd socket activation seems the best solution to this problem. It provides a reliable end-point for snapd to communicate with, and starts the agent on demand. If the service exits when idle (or when told to via an API), an upgraded version of the service will respond to the next API call.
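To make the agent side of socket activation concrete, here is a minimal Go sketch of how a process can pick up the listening socket that systemd hands over. It follows the documented LISTEN_PID / LISTEN_FDS convention (descriptors start at 3). The function name activationFDs and the injected getenv parameter are my own choices for illustration, not anything from snapd.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"strconv"
)

// firstActivationFD is the first descriptor systemd passes to an
// activated service (SD_LISTEN_FDS_START in the systemd documentation).
const firstActivationFD = 3

// activationFDs returns the descriptors passed by systemd, based on the
// LISTEN_PID and LISTEN_FDS variables. getenv is injected (hypothetical
// design choice) so the logic can be exercised without a real systemd.
func activationFDs(pid int, getenv func(string) string) ([]int, error) {
	listenPID, err := strconv.Atoi(getenv("LISTEN_PID"))
	if err != nil || listenPID != pid {
		return nil, errors.New("not socket activated")
	}
	n, err := strconv.Atoi(getenv("LISTEN_FDS"))
	if err != nil || n < 1 {
		return nil, errors.New("no sockets passed")
	}
	fds := make([]int, 0, n)
	for i := 0; i < n; i++ {
		fds = append(fds, firstActivationFD+i)
	}
	return fds, nil
}

func main() {
	fds, err := activationFDs(os.Getpid(), os.Getenv)
	if err != nil {
		fmt.Println("started manually:", err)
		return
	}
	// Wrap the inherited descriptor in a net.Listener. FileListener
	// duplicates the descriptor, so the os.File can be closed afterwards.
	f := os.NewFile(uintptr(fds[0]), "snap-session.socket")
	l, err := net.FileListener(f)
	f.Close()
	if err != nil {
		fmt.Println("cannot use inherited socket:", err)
		return
	}
	defer l.Close()
	fmt.Println("listening on", l.Addr())
}
```

Because the socket stays owned by systemd, the agent can exit whenever it likes and the next connection will simply start a fresh (possibly upgraded) process.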

The socket unit could look something like this:

[Unit]
Description=Socket activation for snap session agent

[Socket]
ListenStream=%t/snap-session.socket

[Install]
WantedBy=sockets.target

This will expand to /run/user/$uid/snap-session.socket, so snapd can easily enumerate the available session agents with a simple glob.

The agent could either be a new process, or implemented as new functionality of snap userd. If we make userd’s D-Bus service activation files use the SystemdService option, we should be able to have it activate by either D-Bus or the unix socket.

What protocol should the agent speak?

At the sprint, @pedronis suggested using an HTTP/REST API similar to the system level snapd socket. This seems like a reasonable option. We’ve already got code in place to do SO_PEERCRED checks to verify that the agent is talking to the root account, for instance.

One thing to keep in mind is that the session agent is untrusted code, from the perspective of snapd. While snapd should be talking to code we’ve written, there is nothing stopping me from killing the agent and running my own program that listens on that socket. With this in mind, HTTP is a reasonable choice since most attacks related to a misbehaving server also apply to using HTTP to speak to random servers on the Internet.

We’d need to make sure any API calls use reasonable timeouts and response size limits.

What API should be offered?

We should be able to verify that the agent is working correctly with a simple “status” or “version” API. But as for real world uses, I imagine we’d probably want:

  1. start and stop named user mode systemd units.
  2. perform a daemon-reload on the user instance of systemd.
  3. post a notification asking the user to close an application (e.g. the user has Skype running but minimised, and an update has come through).

I’d appreciate any feedback on this (e.g. from @zyga or @pedronis), so we can move on to implementation.


#2

@jamesh thanks for writing this up and looking into this. What’s written here corresponds to the recollection I have of the conversation in Malta. The general plan looks fine.

About 3. we might not need it very soon, but there will be work in that area over the cycle.

@chipaca should be able to help you/be a reference for you in this area, especially once he has finished some left over work. I will of course keep an eye on this and do reviews as needed.


#3

I’ve been working on an initial implementation of the session agent here:

While I started by integrating this code into snap userd, it looks like this causes problems on Xenial systems. As Ubuntu was part way through the Upstart to Systemd transition, the user session is a bit unusual.

While we do have a user instance of Systemd capable of supporting socket activation, the D-Bus session bus is instead managed by a user instance of Upstart. Furthermore, the session bus address has not been shared with the systemd environment, so systemd user units cannot connect to that bus. That means we can’t easily have userd’s D-Bus services and socket activatable REST service combined into one process.

This also means that we likely won’t be able to have the REST service post desktop notifications on Xenial. Given Xenial’s dropping desktop market share, perhaps it is acceptable for notifications to not be present there.


#4

And after splitting this into its own process, things are working on Xenial. It also seems to work on Ubuntu Core 16, which is a nice bonus.

It fails on Core 18, but I suspect that could be fixed with a few extra symlinks in the core18 snap. I’m not sure how much importance to place on this though, given that no core devices are running a user session at present.


#5

And now I’ve got the tests passing on Ubuntu Core 18 too. It didn’t require any changes to the core18 snap after all: I just followed the pattern used to install the main system units on these systems.


#6

With the basics working in my PR, I started thinking about the types of actions the session agent should perform. As mentioned in the original post, I suspect the primary ones will be:

  • start or stop user systemd units.
  • tell the user systemd instance to refresh its config
  • post desktop notifications

Controlling user systemd instance

We already have an interface for controlling systemd in the form of github.com/snapcore/snapd/systemd, which issues appropriate systemctl commands. It is fairly simple to extend this to issue systemctl --user commands.

Desktop Notifications

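There are two standards for Linux desktop notifications in use today:

  • FDO: the freedesktop.org Desktop Notifications Specification (org.freedesktop.Notifications)
  • GTK: the org.gtk.Notifications D-Bus interface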

These are supported on various desktops as:

  • GNOME: both GTK and FDO
  • KDE: FDO
  • MATE: FDO
  • XFCE: FDO
  • Unity 7: FDO, with no support for actions

Note: the fact that Unity 7 has no support for actions means all use of notifications should assume the user may ignore or not see the notification.

While the FDO standard covers everything, it may still be worth supporting GTK notifications. It is a better fit for a background service that exits on idle, and should give better integration on modern GNOME desktops.

Both standards rely on D-Bus, with the GTK standard also requiring that the app posting notifications hold an activatable D-Bus well-known name. On all modern systems this is not a problem. I believe we can have this work on Xenial systems: since Unity 7 only supports the FDO standard, it doesn’t matter that we can’t perform bus activation of the session agent on that distro.
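The support matrix above could drive backend selection along these lines. This is a sketch only: the type and function names are invented, how the desktop would be detected is not covered, and falling back to FDO without actions for an unknown desktop is my assumption of a conservative default.

```go
package main

import "fmt"

// support records which notification standards a desktop handles, and
// whether notification actions work there.
type support struct {
	FDO, GTK, Actions bool
}

// desktops mirrors the list above. Actions are assumed to work everywhere
// except Unity 7, which is the only exception the list calls out.
var desktops = map[string]support{
	"GNOME":   {FDO: true, GTK: true, Actions: true},
	"KDE":     {FDO: true, Actions: true},
	"MATE":    {FDO: true, Actions: true},
	"XFCE":    {FDO: true, Actions: true},
	"Unity 7": {FDO: true},
}

// pickBackend prefers GTK where available and otherwise uses FDO. Unknown
// desktops get FDO with no actions (an assumed conservative default).
func pickBackend(desktop string) (backend string, actions bool) {
	s, ok := desktops[desktop]
	if !ok {
		return "FDO", false
	}
	if s.GTK {
		return "GTK", s.Actions
	}
	return "FDO", s.Actions
}

func main() {
	for _, d := range []string{"GNOME", "KDE", "Unity 7"} {
		backend, actions := pickBackend(d)
		fmt.Printf("%s: backend=%s actions=%v\n", d, backend, actions)
	}
}
```

Whatever backend is picked, callers should still treat a notification as something the user may never see.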


#7

Can’t we think of a small fix to SRU or workaround specific to Xenial that doesn’t involve this process splitting? It’s kind of problematic to start this whole new area with a suboptimal design dictated by the n-1 LTS.


#8

So, I guess there are a few points of note here:

  • The session agent has a different audience to userd: one performs actions on behalf of snapd, while the other performs actions on behalf of confined applications. It’s not immediately obvious that there will be overlap here, and they each have different security concerns.
  • If these are exit on idle processes, it isn’t obvious that they would be running simultaneously very often.
  • Having two processes now does not preclude having one process in the future. Clients are either accessing a D-Bus service or a unix domain socket HTTP server. What’s on the other end of that connection can change in the future. When Xenial reaches EOL, we could change how things are wired up.

As far as modifying Xenial to handle dbus/systemd integration at the user session level, I tried the following on a VM:

  1. installed the dbus-user-session package, which adds the user level dbus.service and dbus.socket systemd services.
  2. Rebooted, and noticed there were two session buses running: one run by systemd and one by Upstart. In a shell, $DBUS_SESSION_BUS_ADDRESS pointed at the Upstart instance.
  3. systemctl --user show-environment was now being populated with the session environment (e.g. $DISPLAY). Its version of $DBUS_SESSION_BUS_ADDRESS pointed at the systemd version though.

We can’t get rid of the dbus upstart job, since it is referenced by other jobs (and potentially third party packages targeting Xenial that provide their own Upstart jobs). It may be possible to modify the job to essentially do systemctl --user start dbus.socket, then copy $DBUS_SESSION_BUS_ADDRESS into the Upstart environment. Combine that with dependency updates to ensure dbus-user-session is installed, and you might have something that works.

I don’t like the chances of getting that SRU’d though. It is a pretty invasive change to the critical path of starting the desktop, and it is hard to tell what other side effects there might be.