A "user session agent" for snapd

jamesh · May 20, 2019, 11:12am

One of the objections to adding support for user session systemd daemons and D-Bus services was the lack of control over package upgrades or removal. While system level daemons can be stopped and started by snapd, this isn’t the case for user session daemons: as snapd is running as root, it cannot talk to the user session instance of systemd. Even if snapd tried to connect to the user’s D-Bus session bus, the connection would be refused due to the mismatched user ID.

At the last engineering sprint earlier this year, we brainstormed some ideas for how we could solve this problem. I don’t think any work has been done on it since, and a concrete plan wasn’t written down. So here are my recollections, as a first step towards an implementation.

A “session agent” for snapd

Rather than have the root-owned snapd try to poke around inside the user’s session, we would instead have an agent running as the user that could act on snapd’s behalf. Preferably this agent wouldn’t need to run constantly throughout the session, for the following reasons:

If we can recover from the agent stopping, then crashes are far less critical.
If snapd is upgraded mid-session, the agent can also be upgraded.

Systemd socket activation seems the best solution to this problem. It provides a reliable end-point for snapd to communicate with, and starts the agent on demand. If the service exits when idle (or when told to via an API), an upgraded version of the service will respond to the next API call.
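On the agent side, socket activation means the process inherits an already-listening socket instead of creating its own. A minimal sketch of that hand-over, assuming the standard LISTEN_PID / LISTEN_FDS convention described in sd_listen_fds(3). It is written in Python purely for illustration (the real agent would live in snapd), and the function names are made up:

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # first inherited descriptor under systemd's protocol


def inherited_listen_fds(environ, pid):
    """Return the descriptors systemd passed to this process, or []."""
    try:
        # LISTEN_PID guards against the variables leaking into child processes.
        if int(environ.get("LISTEN_PID", "")) != pid:
            return []
        count = int(environ.get("LISTEN_FDS", ""))
    except ValueError:
        return []
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


def agent_listener():
    """Wrap the single socket declared in the unit; None if started by hand."""
    fds = inherited_listen_fds(os.environ, os.getpid())
    if len(fds) != 1:
        return None
    # Family and type are detected from the descriptor itself.
    return socket.socket(fileno=fds[0])
```

Because systemd keeps holding the socket, the agent can exit whenever it is idle and the next connection simply starts a fresh (possibly upgraded) copy.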

The socket unit could look something like this:

[Unit]
Description=Socket activation for snap session agent

[Socket]
ListenStream=%t/snap-session.socket

[Install]
WantedBy=sockets.target

This will expand to /run/user/$uid/snap-session.socket, so snapd can easily enumerate the available session agents with a simple glob.
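The glob could be as simple as the following sketch. The helper name is hypothetical and the run_dir parameter exists mainly so the function can be exercised against a scratch directory; the socket name is the one from the unit above.

```python
import glob
import os


def session_agent_sockets(run_dir="/run/user"):
    """Map uid -> socket path for every user runtime dir that has the socket."""
    found = {}
    for path in glob.glob(os.path.join(run_dir, "*", "snap-session.socket")):
        uid_part = os.path.basename(os.path.dirname(path))
        # Runtime directories are named after the numeric uid; skip anything else.
        if uid_part.isdigit():
            found[int(uid_part)] = path
    return found
```

Note that finding a socket only tells us a user session with the socket unit exists, not that an agent process is currently running.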

The agent could either be a new process, or implemented as new functionality of snap userd. If we make userd’s D-Bus service activation files use the SystemdService option, we should be able to have it activate by either D-Bus or the unix socket.

What protocol should the agent speak?

At the sprint, @pedronis suggested using an HTTP/REST API similar to the system level snapd socket. This seems like a reasonable option. We’ve already got code in place to do SO_PEERCRED checks to verify that the agent is talking to the root account, for instance.
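For readers who haven't seen the SO_PEERCRED check before, here is a rough Linux-only illustration in Python. It is not the existing snapd code; the helper names and the allowed_uids parameter are assumptions made for the example.

```python
import socket
import struct

# struct ucred on Linux: pid, uid, gid as three native ints.
_UCRED = struct.Struct("3i")


def peer_credentials(conn):
    """Return (pid, uid, gid) of the process on the other end of a unix socket."""
    raw = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, _UCRED.size)
    return _UCRED.unpack(raw)


def peer_is_trusted(conn, allowed_uids):
    """True if the connecting process runs as one of the allowed uids."""
    _pid, uid, _gid = peer_credentials(conn)
    return uid in allowed_uids
```

The agent would call something like peer_is_trusted(conn, {0}) on each accepted connection and drop anything that does not come from root.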

One thing to keep in mind is that the session agent is untrusted code, from the perspective of snapd. While snapd should be talking to code we’ve written, there is nothing stopping me from killing the agent and running my own program listening on that socket. With this in mind, HTTP is a reasonable choice, since most attacks related to a misbehaving server also apply to using HTTP to speak to random servers on the Internet.

We’d need to make sure any API calls use reasonable timeouts and response size limits.
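To make the timeout and size limit concrete, here is a sketch of a defensive client in Python. The class and function names, the 5 second timeout and the 64 KiB cap are all placeholder assumptions, not values anyone agreed on.

```python
import http.client
import socket

MAX_BODY = 64 * 1024  # arbitrary cap for this sketch


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP over a unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=5.0):
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        # The timeout applies to each socket operation, not the whole call.
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def agent_get(socket_path, path, timeout=5.0, max_body=MAX_BODY):
    """GET a path from an agent, refusing oversized responses."""
    conn = UnixHTTPConnection(socket_path, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        # Read one byte past the limit so an oversized body is detectable.
        body = resp.read(max_body + 1)
        if len(body) > max_body:
            raise ValueError("agent response exceeds size limit")
        return resp.status, body
    finally:
        conn.close()
```

Since the timeout here is per socket operation, a slow-dripping server could still hold a request open for a long time; a real client would also want an overall deadline.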

What API should be offered?

We should be able to verify that the agent is working correctly with a simple “status” or “version” API. But as for real world uses, I imagine we’d probably want:

start and stop named user mode systemd units.
perform a daemon-reload on the user instance of systemd.
post a notification asking the user to close an application (e.g. the user has Skype running but minimised, and an update has come through).

I’d appreciate any feedback on this (e.g. from @zyga-snapd or @pedronis), so we can move on to implementation.

pedronis · May 22, 2019, 9:52am

@jamesh thanks for writing this up and looking into this. What’s written here corresponds to the recollection I have of the conversation in Malta. The general plan looks fine.

About 3 (posting notifications): we might not need it very soon, but there will be work in that area over the cycle.

@chipaca should be able to help you/be a reference for you in this area, especially once he has finished some left over work. I will of course keep an eye on this and do reviews as needed.

jamesh · June 6, 2019, 7:13am

I’ve been working on an initial implementation of the session agent here:

While I started by integrating this code into snap userd, it looks like this causes problems on Xenial systems. As Ubuntu was part way through the Upstart to Systemd transition, the user session is a bit unusual.

While we do have a user instance of Systemd capable of supporting socket activation, the D-Bus session bus is instead managed by a user instance of Upstart. Furthermore, the session bus address has not been shared with the systemd environment, so systemd user units cannot connect to that bus. That means we can’t easily have userd’s D-Bus services and the socket-activatable REST service combined into one process.

This also means that we likely won’t be able to have the REST service post desktop notifications on Xenial. Given Xenial’s dropping desktop market share, perhaps it is acceptable for notifications to not be present there.

jamesh · June 7, 2019, 6:42am

And after splitting this into its own process, things are working on Xenial. It also seems to work on Ubuntu Core 16, which is a nice bonus.

It fails on Core 18, but I suspect that could be fixed with a few extra symlinks in the core18 snap. I’m not sure how much importance to place on this though, given that no core devices are running a user session at present.

jamesh · June 10, 2019, 4:16am

And now I’ve got the tests passing on Ubuntu Core 18 too. It didn’t require any changes to the core18 snap after all: I just followed the pattern used to install the main system units on these systems.

jamesh · June 14, 2019, 1:40am

With the basics working in my PR, I started thinking about the types of actions the session agent should perform. As mentioned in the original post, I suspect the primary ones will be:

start or stop user systemd units.
tell the user systemd instance to refresh its config
post desktop notifications

Controlling user systemd instance

We already have an interface for controlling systemd in the form of github.com/snapcore/snapd/systemd, which issues appropriate systemctl commands. It is fairly simple to extend this to issue systemctl --user commands.
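The difference between the system and user cases really is just one flag on the command line. A sketch of the idea in Python follows; the function is hypothetical and does not mirror the actual Go API of the systemd package.

```python
def systemctl_argv(action, units=(), user=False):
    """Build the systemctl command line for the system or the user instance."""
    for unit in units:
        # A unit name starting with "-" would be parsed as an option.
        if unit.startswith("-"):
            raise ValueError("suspicious unit name: %r" % unit)
    argv = ["systemctl"]
    if user:
        argv.append("--user")  # talk to the per-user systemd instance
    argv.append(action)
    argv.extend(units)
    return argv
```

For example, systemctl_argv("stop", ["snap.foo.bar.service"], user=True) yields the command the session agent would run, while the same call with user=False is what snapd already does for system units. The unit name in that example is made up.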

Desktop Notifications

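There are two standards for Linux desktop notifications in use today: the freedesktop.org (FDO) Desktop Notifications specification, and GTK notifications (org.gtk.Notifications).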

These are supported on various desktops as:

GNOME: both GTK and FDO
KDE: FDO
MATE: FDO
XFCE: FDO
Unity 7: FDO, with no support for actions

Note: the fact that Unity 7 has no support for actions means all use of notifications should assume the user may ignore or not see the notification.

While the FDO standard covers everything, it may still be worth supporting GTK notifications. It is a better fit for a background service that exits on idle, and should give better integration on modern GNOME desktops.

Both standards rely on D-Bus, with the GTK standard also requiring the app posting notifications hold an activatable D-Bus well known name. On all modern systems this is not a problem. I believe we can have this work on Xenial systems: since Unity 7 only supports the FDO standard, it doesn’t matter that we can’t perform bus activation of the session agent on that distro.
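To show how little the FDO side needs, here is a sketch that builds a gdbus command line for the org.freedesktop.Notifications.Notify method, following the shape of the example in the gdbus(1) manual. A real agent would use a D-Bus library rather than shelling out; the function name and the default app name, icon and timeout are assumptions, and actions and hints are left empty to keep it simple.

```python
def fdo_notify_argv(summary, body, app_name="snapd",
                    icon="dialog-information", timeout_ms=5000):
    """Command line for a plain FDO notification with no actions or hints."""
    return [
        "gdbus", "call", "--session",
        "--dest", "org.freedesktop.Notifications",
        "--object-path", "/org/freedesktop/Notifications",
        "--method", "org.freedesktop.Notifications.Notify",
        app_name,         # app_name (s)
        "0",              # replaces_id (u): 0 means a new notification
        icon,             # app_icon (s)
        summary,          # summary (s)
        body,             # body (s)
        "[]",             # actions (as): none in this sketch
        "{}",             # hints (a{sv}): none in this sketch
        str(timeout_ms),  # expire_timeout (i), in milliseconds
    ]
```

The caller only makes a method call on the notification daemon, so nothing here requires the posting process to own a bus name.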

pedronis · June 14, 2019, 8:25am

Can’t we think of a small fix to SRU or workaround specific to Xenial that doesn’t involve this process splitting? It’s kind of problematic to start this whole new area with a suboptimal design dictated by the n-1 LTS.

jamesh · June 14, 2019, 3:06pm

So, I guess there are a few points of note here:

The session agent has a different audience to userd: one performs actions on behalf of snapd, while the other performs actions on behalf of confined applications. It’s not immediately obvious that there will be overlap here, and they each have different security concerns.
If these are exit on idle processes, it isn’t obvious that they would be running simultaneously very often.
Having two processes now does not preclude having one process in the future. Clients are either accessing a D-Bus service or a unix domain socket HTTP server. What’s on the other end of that connection can change in the future. When Xenial reaches EOL, we could change how things are wired up.

As far as modifying Xenial to handle dbus/systemd integration at the user session level, I tried the following on a VM:

installed the dbus-user-session package, which adds the user level dbus.service and dbus.socket systemd services.
Rebooted, and noticed there were two session buses running: one run by systemd and one by Upstart. In a shell, $DBUS_SESSION_BUS_ADDRESS pointed at the Upstart instance.
systemctl --user show-environment was now being populated with the session environment (e.g. $DISPLAY). Its version of $DBUS_SESSION_BUS_ADDRESS pointed at the systemd version though.

We can’t get rid of the dbus upstart job, since it is referenced by other jobs (and potentially third party packages targeting Xenial that provide their own Upstart jobs). It may be possible to modify the job to essentially do systemctl --user start dbus.socket, then copy $DBUS_SESSION_BUS_ADDRESS into the Upstart environment. Combine that with dependency updates to ensure dbus-user-session is installed, and you might have something that works.

I don’t like the chances of getting that SRU’d though. It is a pretty invasive change to the critical path of starting the desktop, and it is hard to tell what other side effects there might be.

jamesh · June 17, 2019, 11:22am

I had a chat with some other members on the desktop team, and changing how the session bus is launched on Xenial is something we would like to avoid. For example, here is one of the types of bugs that show up when installing dbus-user-session on 16.04:

Bug #1689825 “gnome-keyring not unlocked on xenial when dbus-user-session is installed”

It’s worth remembering that we were in the process of replacing Upstart at the time 16.04 was released. If this change were easy, we would have made it at the time, since it would have removed the need to support Upstart for 2 extra years.

pedronis · June 17, 2019, 11:53am

Sorry, I was thinking more the reverse, glueing from the upstart/dbus world into the systemd one for Xenial. What happens if we do something like “systemctl --user start snapd.userd” (we’ll need to pass/set some env var/properties as well I suspect with the session info), from the dbus service files on Xenial?

(this is what I was badly referring to here: https://github.com/snapcore/snapd/pull/6954#issuecomment-502087259 )

jamesh · June 19, 2019, 8:06am

So I’ve been doing a few experiments. On a clean Xenial install, I created a unit file /usr/lib/systemd/user/snapd.userd.service with the following content:

[Unit]
Description=snap userd
[Service]
Type=simple
ExecStart=/usr/bin/snap userd

And modified the D-Bus service activation file to:

[D-BUS Service]
Name=io.snapcraft.Launcher
Exec=/home/james/snap-userd.sh
SystemdService=snapd.userd.service
AssumedAppArmorLabel=unconfined

That is: ask systemd to start the service on systems where the bus is managed by systemd, and run a shell script otherwise. The referenced shell script contained:

#!/bin/sh
set -e
systemctl --user import-environment
exec systemctl --user start snapd.userd.service

I was able to successfully D-Bus activate userd with this setup, and have userd present zenity dialogs and launch graphical applications (i.e. it was able to connect to the X server and integrate with the desktop). I also ended up with a few extra environment variables in the systemd environment that related to the D-Bus launch process, namely:

DBUS_STARTER_ADDRESS
DBUS_STARTER_BUS_TYPE
DBUS_DEBUG_OUTPUT

These are probably benign, but it is hard to tell. It also means we’re doing the environment import every time the service is activated rather than at session startup, which could potentially overwrite things set by the user. It’s probably

We do run into a problem when adding socket activation to the service though: if we’re activated due to a REST call rather than D-Bus, then the environment import won’t happen. We won’t know the D-Bus session bus address, the X display, or any other environment variables that applications started via xdg-open might need. And if userd started in a degraded mode (not connecting to D-Bus) rather than erroring out, the snap-userd.sh script is going to be ineffective because the referenced systemd service is already running.

This would also break compatibility for any configurations where a systemd user instance is not available. This definitely includes Ubuntu 14.04, which might be an acceptable loss at this point (while it is covered by ESM, this doesn’t count desktop packages or snapd). We’ve also got a number of spread tests that spin up a session bus without systemd. I know CentOS 7 has a particularly old systemd, so I’m downloading a live image to check whether there is anything weird there.

jamesh · June 19, 2019, 9:22am

@pedronis asked me to add a followup about how we can do notifications on Xenial systems if we don’t have the full set of desktop environment variables imported into the systemd user instance.

As mentioned previously, we only need to be able to connect to the D-Bus session bus to perform notifications. After starting this topic, I noticed that the Upstart job starting the session bus writes the bus address to the file $XDG_RUNTIME_DIR/dbus-session. So we could add the equivalent of the following shell code to the session agent startup:

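# Only fall back to the Upstart-written file when the address is not already set.
if [ -z "$DBUS_SESSION_BUS_ADDRESS" ] && [ -f "$XDG_RUNTIME_DIR/dbus-session" ]; then
    DBUS_SESSION_BUS_ADDRESS="$(cat "$XDG_RUNTIME_DIR/dbus-session")"
    export DBUS_SESSION_BUS_ADDRESS
fi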

As the FDO Notifications spec does not involve method calls to the application posting notifications, there is no requirement to be bus activatable.

I had also mentioned previously that Unity 7’s notification UI was designed to avoid user interaction. It turns out that if you post notifications with actions attached, it will present them as a dialog box. So I think we can rely on being able to post notifications with actions.

As for CentOS 7, from what I can see of the live CD image there is no systemd user instance at all. This places it in the same boat as Ubuntu 14.04: it would not support the session agent REST interface. Given that the distro probably sees more use as a server operating system and CentOS 8 is around the corner, perhaps this is acceptable.

pedronis · June 21, 2019, 11:31am

If I see it correctly, on Bionic the systemd --user session shared environment already has DBUS_SESSION_BUS_ADDRESS set, correct?

As I discussed with @jamesh I think we can go forward with the two processes approach as long as:

we can keep a clear conceptual separation of what their responsibilities are
we don’t have uncovered use cases, for example notifying/possibly querying the user in-session

pedronis · June 21, 2019, 11:33am

@jamesh thank you for bearing with me and for the investigations

jamesh · June 21, 2019, 1:43pm

Yes. On modern systems, the /usr/lib/systemd/user/dbus.socket unit sets it in the environment during session startup, with the bus daemon started via socket activation.

jamesh · August 5, 2019, 2:31pm

Now that the skeleton of the user session agent has been merged, I guess it is time to start looking at what API it should expose.

Taking my old user services branch (PR #5822) as a starting point, we’d want the following:

stop a list of user units (in order), roughly in line with what wrappers.stopService does (i.e. attempt to kill service units that don’t stop cleanly).
start a list of user units (in order), roughly in line with wrappers.StartServices.
call daemon-reload on the user instance of systemd, as with wrappers.AddSnapServices and RemoveSnapServices.

We shouldn’t need an API for enabling/disabling services, since that is done at a global level. There is a need for a daemon-reload after those global actions though.

These could either be their own endpoints, or perhaps expose a single /services endpoint with some parameters:

action being one of start, stop, or daemon-reload.
services as a comma separated list of unit files. Only expected for start and stop.
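As a strawman for the single-endpoint variant, the agent-side validation might look like the sketch below. It assumes the parameters arrive form-encoded, which is only one possible encoding, and the function name and the unit names in the usage note are hypothetical.

```python
from urllib.parse import parse_qs

UNIT_ACTIONS = {"start", "stop"}


def parse_services_request(form_body):
    """Validate a /services request and return (action, [units])."""
    params = parse_qs(form_body)
    actions = params.get("action", [])
    if len(actions) != 1:
        raise ValueError("exactly one action is required")
    action = actions[0]
    # Units keep the order given by the caller, since start/stop order matters.
    units = [u for u in params.get("services", [""])[0].split(",") if u]
    if action == "daemon-reload":
        if units:
            raise ValueError("daemon-reload does not take services")
        return action, []
    if action in UNIT_ACTIONS:
        if not units:
            raise ValueError("%s needs at least one service" % action)
        return action, units
    raise ValueError("unknown action: %r" % action)
```

A body of action=stop&services=snap.foo.a.service,snap.foo.b.service would come back as the stop action with the two units in the order given.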

jamesh · August 16, 2019, 10:55am

I have a basic version of this API here:

This handles the basic start/stop/daemon-reload actions. With an appropriate client library in place, it might be time to revisit user session daemons.