Can interfaces define per-user bind mounts?

Okay, so I think I know what is going on now: bubblewrap (flatpak’s equivalent to snap-confine) switches back to the user’s uid while preserving some root capabilities:

https://github.com/projectatomic/bubblewrap/blob/master/bubblewrap.c#L580-L600

So this allows them to call mount() as a normal user, which in turn allows that mount call to reference paths within the FUSE file system. For added security, it implements privilege separation by forking and dropping the capabilities in one of the processes.

This is looking a lot more complicated than a simple afternoon hack :frowning:

I’ve not thought it all through, but allow_root may be acceptable here. For snaps, we could have ‘owner’ match for these paths if we wanted (we already have owner /run/user/[0-9]*/snap.@{SNAP_NAME}/...). For non-snaps, not having ‘allow_root’ as a security mechanism on the surface seems specious since if you are root you can replace libfuse and make it not care about allow_root. From the link you gave:

libfuse-specific mount options:
       These following options are not actually passed to the kernel but
       interpreted by libfuse. They can be specified for all filesystems
       that use libfuse:

       allow_root
              This option is similar to allow_other but file access is
              limited to the filesystem owner and root.  This option and
              allow_other are mutually exclusive.

(note, ‘allow_root’ is not passed to the kernel and is only enforced by libfuse, therefore it can easily be subverted).

Just to be clear, the FUSE file system in question here is outside of the sandbox (it is part of a trusted helper, to use the Ubuntu Touch terminology). The problem is that I want to bind mount from a subdirectory inside that file system, which fails because root is prevented access.

Now we’ll need to modify the xdg-document-portal source code to allow it to detect when it is talking to a snap over D-Bus, so we could certainly turn on allow_root. But there are legitimate reasons why this isn’t on by default, so I think it is reasonable to ask whether we should handle this case if/when snap-confine gains the ability to set up per-user mounts.

Yes.

That is what I was speaking to. I was questioning whether there were legitimate reasons why it wasn’t on by default. Like I said, I didn’t think it all through, so I’m only questioning it. :slight_smile: Do you have details on what this is trying to protect?

There’s this comment in the kernel side code:

So I doubt the xdg-document-portal developers consciously decided to block root access: they just used the defaults, which allowed all the behaviours they needed.

@jdstrand: do you have any more thoughts about this? I have put together a demo PPA that uses the allow_root FUSE flag here:

… but it requires that I edit the /etc/fuse.conf file to enable this feature. If we want to provide good integration for confined desktop apps, then we’ll need a real fix for this.

I started digging into this again last week, and put together the following PR (which @zyga-snapd has kindly given some feedback on already):

This doesn’t yet perform the mounts as a regular user, so doesn’t yet handle xdg-document-portal’s FUSE file system. But I thought it would be good to get some early feedback on the general design.

There is also the question of whether it makes sense to move the responsibility for creating user mounts from snap-confine to snap-update-ns. On the plus side, it centralises all the mount code. On the minus side, it’d be an extra fork+exec each time you run an app that has user mounts. Maybe that isn’t too bad though.

2 Likes

Since this would be only the cost paid when we run a snap for the first time (as a given user) I think it is well worth the price. I have some upcoming patches that will greatly complicate the mount namespace setup (the profiles will grow lots of features) so I think this is the right way to proceed.

As I mentioned on the PR, I’m not currently preserving this second mount namespace so the cost is incurred every time. I could change this, but I’d need some way to guarantee that the saved mount namespace wouldn’t outlive the user session.

Keeping the mount namespaces ephemeral gives me this property, but there might be other designs that also achieve it.

I understand this better now. If we agree to make user namespaces ephemeral then I believe this is a correct simplification. I there is agreement to go that way I will support it.

It might be interesting to instrument this. I suspect that the fork/exec overhead is going to be dwarfed by the snap run/snap-confine/snap-exec overhead that currently exists. Of course, we don’t want to be slow for no reason and it is of course good to think about this critically. In this particular case, I suspect the cost would be worth it with the understanding we could optimize later if needed.

1 Like

Yep. I agree that this is something worth checking. My initial reaction was based on a gut feel that may not actually measure up to reality.

@zyga-snapd I don’t think we want to make user namespaces ephemeral. As quickly discussed yesterday with @jamesh, this is a general problem which we’ve already talked about in the past and even had a candidate solution for. We want to renew the namespace once there are no more processes from the snap on that specific namespace, user-based or otherwise.

I have working code for that. I’ll propose it tomorrow so that @jdstrand and @tyhicks can examine it and the apparmor bug it was blocked on back then.

1 Like

TO @jamesh CC @jdstrand

I had a look at https://github.com/snapcore/snapd/pull/3963 and discussed this with @niemeyer

There’s an argument that what we did in the paste (delivering persistent mount namespaces before the update tool) was a mistake and we should attempt not to repeat that mistake here. We think there is a way forward that avoids that:

Let’s start by examining the issue in case we start to persist the per-user mount namespace. The only problem is that the FUSE mount uses a userspace daemon and that daemon is killed when the session is ending and when the user is logging out. We can treat this as a specific case of the “mount namespace is stale” problem.

We already have (partial) logic for handling that (we handle the detection, we don’t act on this yet). We can extend that logic to include a condition “there are no processes belonging to the given user that are alive anymore”. The base namespace will be updated when the base snap changes and it is safe to do so (no processes from that snap are alive). The per-user mount namespace is stale when it is no longer inhabited by any processes belonging to that user. We can use the same process enumeration technique we already have for doing that.

This will work but will result in a small inefficiency. In a case where we run a single process application, close it and run it again we will re-construct the per-user namespace even though it was not really necessary. As an optimization we can also store the identifier of the session and keep the using the persisted per-user mount namespace, even if it has no processes anymore, for as long as the session identifier is unchanged.

Let’s do this:

  • capture the per-user mount namespace in a new file /run/snapd/snap.$SNAP_NAME.user.$UID.mnt, this file will live alongside the older /run/snapd/ns/$SNAP_NAME.mnt (over time that other will migrate to snap.$SNAP_NAME.system.mnt)
  • before running the update tool consider reusing or discarding the preserved per-user mount namespace. The conditions are described above (no processes belonging to that user and snap are alive anymore).
  • run snap-update-ns for each user whenever interfaces change requires this (user mount profiles change). We can enumerate the existing per-user mount namespaces by looking at /run/snapd/ns/snap.*.user.*.mnt and running the tool for each one.

This has the benefit of actually handling updates (we can disconnect and the fuse mount will go away) and opening the path for more general user mounts (we could start using $SNAP_USER_DATA)

What do you think?

Might it make sense to store the user mount namespace files under $XDG_RUNTIME_DIR? Looking at the base dir spec, this seems like it would help with the lifecycle issues:

The lifetime of the directory MUST be bound to the user being logged in. It MUST be created when the user first logs in and if the user fully logs out the directory MUST be removed. If the user logs in more than once he should get pointed to the same directory, and it is mandatory that the directory continues to exist from his first login to his last logout on the system, and not removed in between. Files in the directory MUST not survive reboot or a full logout/login cycle.

You’d need to be careful though, given that this directory is controlled by the user.

I don’t mind that much though I worry that if anything tries to remove that directory it will choke on the .mnt files as those are active mount points.

Okay. I guess that suggestion doesn’t really help then. Having a live mount under $XDG_RUNTIME_DIR that outlives the user’s session(s) is likely to cause problems with the code that tries to delete the runtime dir.

One other possibility for cleaning up stale mounts would be to rely on systemd’s logind, which has a UserRemoved signal that fires when the user logs out of their last session. Presumably this would let us clean up stale mount namespaces while still preserving them over sequential invocations of an app within the same session.

I believe this would even work on Upstart based Trusty, since we were shipping systemd-shim to allow logind to function without systemd.

I don’t think we need any support from the session. We can clean them up at the time new processes are launched.

@zyga-snapd: so what is the way forward for the pull request now? It sounds like there are essentially two feature requests:

  1. preserve and reuse the mount namespace so runtime changes to the .user-fstab file can be propagated to running processes.
  2. on a more general level, automatically discard stale mount namespaces.

As I mentioned to you previously on IRC, my concern is that implementing (1) without (2) effectively means the feature is no longer fit for purpose. So is there any harm in accepting the feature in its current state and implement (1) once (2) is functional?