Using snap-update-ns from snap-confine to initialize mount namespaces

Hey

I’d like to start working on a new feature where snap-confine would stop carrying some of the mount namespace initialization code. Initially I’d like to delegate processing of the .fstab file (both the desired and active ones) to snap-update-ns. The reason for this is twofold:

  • currently we need identical implementation in go and C (depending on 1st initialization or subsequent update) which adds cost and complexity as it must do the same thing in both cases
  • we need to extend that code to handle additional features soon (to implement layouts) so the cost would double

In addition we may now generate a profile for snap-update-ns that is unique to each snap and that would allow us to securely manipulate the mount namespace without opening confinement of snap-confine to perform arbitrary operations.

My plan is as follows:

  • replace the code that processes .fstab files in snap-confine with a call to snap-update-ns, this will inherit the current snap-confine apparmor profile.
  • patch snapd to generate an apparmor profile for snap-update-ns running in the context of a given snap (TBD: profile name)
  • patch snap-confine’s apparmor profile to perform profile transition when executing snap-update-ns after pivot_root

This will allow us to implement new “fstab” features such as tmpfs mounts, symlinks and creation of parent mount points etc.

Comments?

AIUI, snap-confine will use an ‘ix’ rule to call snap-update-ns which will take the .fstab files and setup the mount namespace in /run/snapd/ns/snap.name.command then control goes back to snap-confine which will enter the namespace like before (it ‘just’ won’t be the one that set it up).

In principle, this should be fine because in terms actions, everything is still serial and we shouldn’t suffer problems resulting from golang not supporting privilege dropping, dropping capabilities, fork(), etc (this assumes that goroutines don’t get in the way of setting up the mount namespace of course!).

I do have concerns though that we are changing the assumptions of snap-update-ns; specifically that it was written with the understanding that it would be launched under snapd which is not an attack vector for privilege escalation. Now it would not only be launched from snapd but also from a setuid executable and now therefore be an attack vector for those trying to escalate privileges via implementation bugs in snap-confine/snap-update-ns. Put another way, while this makes snap-confine smaller, the attack surface is not actually reduced (it is even bigger considering snap-update-ns’ imports (we tried very hard to limit the number of libraries snap-confine uses (best practice for writing setuid applications)). Because snap-update-ns will be running in the context of a setuid application, the lines within the codebase between what is running setuid becomes muddy (eg, today it is pretty easy to know when to ask the security team for reviews: “is the code in cmd/snap-confine changed?”. This becomes difficult and easy to miss when to ask for security reviews-- eg, snap-update-ns imports “github.com/snapcore/snapd/logger” which all of a sudden will now at times run under setuid-- who will remember to ask the security team to do a review of changes to logger?). Furthermore, snap-update-ns and everything it imports will now need a proper security review. With the rapid pace of development of the go code within snappy, it is easy to imagine introducing security flaws…

I like how you want to confine snap-update-ns by limiting it to ‘ix’. We could really tighten this by using a child profile (‘Cx’ rule) or hat that snap-confine could change_hat() into. Depending on how restrictive this profile is, it would mitigate security concerns regarding the muddied setuidness in the codebase (for systems with AppArmor of course). At a minimum, I imagine we’d want to utilize secure exec (which a ‘Cx’ rule gives us for free), extremely carefully consider its inputs for how users might attack them (command line, etc), consider providing an empty environment to snap-update-ns and eliminating/drastically reducing the number of imports in use by snap-update-ns.

I’m not clear on this. Perhaps it would help to draw a diagram (ascii art is fine) for when snap-confine calls snap-update-ns all the way to snap command exec() that includes all the AppArmor profile changes. Please also draw a similar diagram for when snapd calls snap-update-ns and anything else related to updating the mount namespace.

I suspect this is fine but want to better understand the previous point.

There is a lot in this compact statement. What else is in the features covered by ‘etc’? Can you give a use case for all of these features (tmpfs, symlinks, parent mount points and those covered by ‘etc’)?

Thank you for the comment Jamie!

Let me start by saying that my intent is to implement the layout feature that will be central to base snaps and app snaps alike. The layout feature. You can get a glimpse of it from this post Development sprint June 26th, 2017

Part of this used to be called the “overmount” interface, if you recall.

With the agreements from the sprint any snap will now be able to specify a layout section (separately from interfaces and applications). Layouts will use a combination of existing features as well as new, often natural extensions.

Let me start by coping the example from the whiteboard at the sprint.

/usr:
    bind: $SNAP/usr
/mytmp:
    type:  tmpfs
    user:  nobody
    group: nobody
    mode   1777
/mylink:
    symlink: /link/target

This section would appear alongside apps/interfaces, at the top level of snapcraft.yaml and snap.yaml files.

The idea is that snapd would then use layouts (also from the desired base snap but I’ll skip that for now) to create the /var/lib/snapd/mount/SNAP_NAME.fstab file. Interestingly changes would behave much like existing changes to mount namespaces, so layouts could evolve from revision to revision.

In the example above a following fstab profile might be synthesized.

$SNAP/usr /usr none bind,ro 0 0 
none /mytmp tmpfs x-snap-user:nobody,x-snap-group:nobody,x-snap-mode:1777 0 0
none /mylink x-snap-symlink x-snap-symlink:/link/target 0 0

I used $SNAP variables but those might be expanded by snapd. As you can see I invented some helpers such as x-snap-xxx options designed to convey additional parameters as well as the x-snap-symlink filesystem type.

You will be quick to notice that /mytmp cannot be immediately created on any existing base snap simply because such directory does not exist in the underlying squashfs. For any missing parents we will use a mechanism such as the “writable mimic” we currently have in snap-confine to put a tmpfs and a farm of directories with bind mounts in place. This behavior will be implicit so that users just express their preference and snapd will make the necessary changes to perform the operation.

The symlink is a natural extension of “I want my mount namespace to look like this” and as it is all an opt-in feature we chose to offer it as an available element.

All of this will naturally need careful analysis of what to allow, what to not allow, how to do undos, etc (especially in light of the writable mimic approach).

Ideally we’d use something easier to work with (overlayfs) but I believe we cannot rely on one, that is sufficiently capable and works with confinement, across the deployed kernels.

One more point about apparmor confinement for snap-update-ns. Given that a layout may desire to put almost arbitrary bind mount or tmpfs in any non-blacklisted directory I think it is unrealistic to extend the apparmor profile of snap-confine to be this liberal (it would be too powerful IMO). As such I think we could split this so that snap-confine would essentially have a fixed set of operations and would then call into snap-update-ns, with a profile transition and a fixed, dedicated profile for that specific snap. Such profile would be generated by snapd and would contain the super-set of “do” and “undo” operations for all the entries in the mount profile.

Having said that I worry that we may need to go to the opposite extreme and unconfine snap-confine given the apparmor mount namespace transition bug we encountered at the sprint. I will report that bug today, along with a branch that reproduces it, but earlier discussions with jj indicate that it is an unfixed, known issue that may take some time to address (at least one upstream kernel release cycle).

1 Like

@zyga-snapd, thanks for explaining this, it is a lot clearer now. I like the idea of the per-snap snap-update-ns profile for snaps that opt into the layout feature.

As for overlayfs, as it happens I reviewed the status of apparmor with overlayfs recently (prompted by the greengrass-support work) and while encouraged it works as well as it does with Ubuntu 4.4, 4.10 and the upcoming 4.11 kernels, there are a lot of bugs that affect mediation. Of course, even if mediation worked perfectly, overlayfs doesn’t exist in all deployed kernels (like you said). I also didn’t look at SELinux support with overlayfs, which could impact decisions since I know confinement on SELinux systems is a future goal.

It seems clear that this feature is going to be a lot of work to ensure we have a secure implementation.

@zyga-snapd Isn’t this what you’re already working on for getting layouts in place?

Yes, this is exactly done so that layouts can be implemented by extending snap-update-ns without further complicating snap-confine.

So this has landed now! (woot).

I will be iterating on restoring the code that I lost last Thursday evening that builds upon this to create mount directories, respecting permissions and ownership specified by the mount profiles.