Using snap-update-ns from snap-confine to initialize mount namespaces

zyga-snapd · July 12, 2017, 3:05pm

Thank you for the comment Jamie!

Let me start by saying that my intent is to implement the layout feature, which will be central to base snaps and app snaps alike. You can get a glimpse of it in the post “Development sprint June 26th, 2017”.

Part of this used to be called the “overmount” interface, if you recall.

Following the agreements from the sprint, any snap will now be able to specify a layout section (separate from interfaces and applications). Layouts will use a combination of existing features and new, often natural, extensions.

Let me start by copying the example from the whiteboard at the sprint.

/usr:
    bind: $SNAP/usr
/mytmp:
    type:  tmpfs
    user:  nobody
    group: nobody
    mode:  1777
/mylink:
    symlink: /link/target

This section would appear alongside apps/interfaces, at the top level of snapcraft.yaml and snap.yaml files.

The idea is that snapd would then use layouts (including those from the desired base snap, but I’ll skip that for now) to create the /var/lib/snapd/mount/SNAP_NAME.fstab file. Interestingly, changes to layouts would behave much like existing changes to mount namespaces, so layouts could evolve from revision to revision.

For the example above, the following fstab profile might be synthesized.

$SNAP/usr /usr none bind,ro 0 0
none /mytmp tmpfs x-snap-user:nobody,x-snap-group:nobody,x-snap-mode:1777 0 0
none /mylink x-snap-symlink x-snap-symlink:/link/target 0 0

I used $SNAP variables, but those might be expanded by snapd. As you can see, I invented some helpers: the x-snap-xxx options, designed to convey additional parameters, and the x-snap-symlink filesystem type.
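To make the mapping from a layout to the fstab profile concrete, here is a minimal sketch in Python. It is not snapd code: the function name `layout_to_fstab`, the `snap_dir` parameter, and the choice to always add `ro` to bind mounts are assumptions made to reproduce the example above, and the x-snap-* names are the invented helpers just mentioned.

```python
def layout_to_fstab(layout, snap_dir):
    """Translate a parsed layout mapping into fstab-style lines.

    Hypothetical helper: `layout` maps a target path to its settings,
    `snap_dir` is the value substituted for $SNAP.
    """
    lines = []
    for target, spec in layout.items():
        if "bind" in spec:
            # Expand $SNAP here; the post leaves open who does the expansion.
            source = spec["bind"].replace("$SNAP", snap_dir)
            lines.append(f"{source} {target} none bind,ro 0 0")
        elif "symlink" in spec:
            # The symlink target travels in an x-snap-symlink option.
            lines.append(
                f"none {target} x-snap-symlink "
                f"x-snap-symlink:{spec['symlink']} 0 0"
            )
        elif "type" in spec:
            # Ownership and mode are carried as x-snap-* options.
            opts = [
                f"x-snap-{key}:{spec[key]}"
                for key in ("user", "group", "mode")
                if key in spec
            ]
            options = ",".join(opts) or "defaults"
            lines.append(f"none {target} {spec['type']} {options} 0 0")
        else:
            raise ValueError(f"unsupported layout entry for {target}")
    return lines


if __name__ == "__main__":
    example = {
        "/usr": {"bind": "$SNAP/usr"},
        "/mytmp": {"type": "tmpfs", "user": "nobody",
                   "group": "nobody", "mode": 1777},
        "/mylink": {"symlink": "/link/target"},
    }
    # /snap/foo/1 is a made-up expansion of $SNAP.
    for line in layout_to_fstab(example, "/snap/foo/1"):
        print(line)
```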

You will be quick to notice that /mytmp cannot be immediately created on any existing base snap, simply because such a directory does not exist in the underlying squashfs. For any missing parents we will use a mechanism such as the “writable mimic” we currently have in snap-confine to put a tmpfs and a farm of directories with bind mounts in place. This behavior will be implicit, so that users just express their preference and snapd makes the necessary changes to perform the operation.
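The writable mimic idea can be sketched as a planning step that only computes which operations would be needed. This is my reading of the approach, not the code in snap-confine: the function `plan_writable_mimic`, the scratch location under /tmp/.snap, and the tuple format of the plan are all hypothetical.

```python
import posixpath


def plan_writable_mimic(target, existing_dirs, entries):
    """Plan the steps needed to make `target` creatable on a read-only base.

    existing_dirs: set of directories present in the read-only filesystem.
    entries: mapping of directory -> names of the entries it contains.
    Returns a list of (operation, ...) tuples; nothing is executed.
    """
    # Walk up until we reach a directory that already exists.
    missing = []
    path = target
    while path not in existing_dirs:
        missing.append(path)
        parent = posixpath.dirname(path)
        if parent == path:
            raise ValueError(f"no existing ancestor for {target}")
        path = parent
    if not missing:
        return []  # the target is already there, no mimic required

    base = path  # deepest existing (read-only) ancestor
    scratch = "/tmp/.snap" + base  # hypothetical place to keep the original
    plan = [
        ("bind", base, scratch),  # keep the original content reachable
        ("tmpfs", base),          # hide it behind a writable tmpfs
    ]
    # Recreate the original view, one bind mount per entry. Creating the
    # placeholder mount points is folded into this step for brevity.
    for name in entries.get(base, []):
        plan.append(("bind", posixpath.join(scratch, name),
                     posixpath.join(base, name)))
    # Finally create the directories that were missing, shallowest first.
    for directory in reversed(missing):
        plan.append(("mkdir", directory))
    return plan


if __name__ == "__main__":
    steps = plan_writable_mimic(
        "/usr/share/foo/bar",
        {"/", "/usr", "/usr/share"},
        {"/usr/share": ["doc", "man"]},
    )
    for step in steps:
        print(step)
```

Keeping the planning separate from the execution would also make it easier to reason about the corresponding undo steps.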

The symlink is a natural extension of “I want my mount namespace to look like this” and as it is all an opt-in feature we chose to offer it as an available element.

All of this will naturally need careful analysis of what to allow, what to disallow, how to undo changes, etc. (especially in light of the writable mimic approach).

Ideally we’d use something easier to work with, such as overlayfs, but I believe we cannot rely on an implementation that is both sufficiently capable and compatible with confinement across the deployed kernels.

One more point about apparmor confinement for snap-update-ns. Given that a layout may want to put an almost arbitrary bind mount or tmpfs in any non-blacklisted directory, I think it is unrealistic to extend the apparmor profile of snap-confine to be this liberal (it would be too powerful IMO). As such, I think we could split this so that snap-confine would essentially have a fixed set of operations and would then call into snap-update-ns, with a profile transition to a fixed, dedicated profile for that specific snap. Such a profile would be generated by snapd and would contain the superset of “do” and “undo” operations for all the entries in the mount profile.
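As a rough illustration of the “superset of do and undo operations”, the sketch below derives an abstract permission set from fstab-style lines. It deliberately does not emit real apparmor rules: the function `required_operations` and the tuple vocabulary (mount, umount, symlink, unlink) are hypothetical, and a real generator in snapd would have to map these onto actual profile syntax.

```python
def required_operations(fstab_lines):
    """Collect the do and undo operations implied by a mount profile.

    Hypothetical helper: returns a set of tuples describing what a
    per-snap snap-update-ns profile would have to permit.
    """
    ops = set()
    for line in fstab_lines:
        fields = line.split()
        if len(fields) < 4 or fields[0].startswith("#"):
            continue  # skip blank, short, or comment lines
        source, target, fstype = fields[0], fields[1], fields[2]
        if fstype == "x-snap-symlink":
            # A symlink entry is created on "do" and removed on "undo".
            ops.add(("symlink", target))
            ops.add(("unlink", target))
        else:
            # Every mount needs a matching unmount for the undo path.
            ops.add(("mount", fstype, source, target))
            ops.add(("umount", target))
    return ops


if __name__ == "__main__":
    profile = [
        "/snap/foo/1/usr /usr none bind,ro 0 0",
        "none /mytmp tmpfs x-snap-mode:1777 0 0",
        "none /mylink x-snap-symlink x-snap-symlink:/link/target 0 0",
    ]
    for op in sorted(required_operations(profile)):
        print(op)
```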

Having said that, I worry that we may need to go to the opposite extreme and unconfine snap-confine, given the apparmor mount namespace transition bug we encountered at the sprint. I will report that bug today, along with a branch that reproduces it, but earlier discussions with jj indicate that it is a known, unfixed issue that may take some time to address (at least one upstream kernel release cycle).