Fixing live propagation of mount changes

Hey, I wanted to update you on where we are. PR3216 describes the high-level algorithm. The general idea is as follows:

  • Read the current and desired mount profiles. These are fstab-like files: the desired profile is written by snapd, and the current profile is written by snap-confine and snap-update-ns.
  • Diff the profiles so that we know what we should mount or unmount
  • For each such mount change, see if we actually need to perform it (this is the Change.Needed function). A change may turn out to be unnecessary because we may be upgrading from an older snap-confine that doesn’t write the current profile, or because the snap may have mounted something by itself.
  • Collect all the applied changes (we don’t stop if an error occurs) and write the current profile for the next time we run snap-update-ns.
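The steps above can be sketched roughly as follows. This is a minimal Python illustration, not the code from PR3216: the names Entry, parse_profile, diff_profiles and apply_changes are made up for this example, needed and perform are hypothetical callbacks standing in for Change.Needed and the actual mount/unmount calls, and unmounting before mounting is just a choice made in this sketch.

```python
from typing import Callable, List, NamedTuple, Tuple


class Entry(NamedTuple):
    """One line of an fstab-like mount profile (dump/pass fields are ignored)."""
    source: str
    target: str
    fstype: str
    options: str


Change = Tuple[str, Entry]  # ("mount" or "unmount", entry)


def parse_profile(text: str) -> List[Entry]:
    """Parse fstab-like text, skipping blank lines and comment lines."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        source, target, fstype, options = line.split()[:4]
        entries.append(Entry(source, target, fstype, options))
    return entries


def diff_profiles(current: List[Entry], desired: List[Entry]) -> List[Change]:
    """Unmount what is no longer desired (newest first), then mount what is missing."""
    changes: List[Change] = [("unmount", e) for e in reversed(current) if e not in desired]
    changes += [("mount", e) for e in desired if e not in current]
    return changes


def apply_changes(
    changes: List[Change],
    needed: Callable[[Change], bool],    # hypothetical stand-in for Change.Needed
    perform: Callable[[Change], None],   # hypothetical stand-in for mount/unmount
) -> List[Change]:
    """Perform every needed change, keep going on errors, return what was applied."""
    applied = []
    for change in changes:
        if not needed(change):
            continue
        try:
            perform(change)
        except OSError as err:
            print(f"cannot {change[0]} {change[1].target}: {err}")
            continue
        applied.append(change)
    return applied


# With an empty current profile, the single desired entry becomes one "mount" change.
desired = parse_profile(
    "/snap/ubuntu-app-platform/34 /snap/lonewolf/3/ubuntu-app-platform none bind,ro 0 0\n"
)
print(diff_profiles([], desired))
```

The list returned by apply_changes is what would feed into the current profile written at the end of the run.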

I got stuck on Change.Needed because the answer is unfortunately non-trivial to compute. The problem is the data representation that the kernel gives us to work with: the mountinfo file, which is documented here.

Mountinfo contains a rather raw representation of the kernel mount table. Each mount has a MountID and a ParentID, a MountDir that describes where something is mounted, and a MountSource and FsType that describe what is mounted. Curiously, there is also a Root, which describes the subtree of the MountSource that is mounted. This is essential, as we will see shortly.
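To make the field layout concrete, here is a small Python sketch that splits one mountinfo line into the fields named above. The class and function names are invented for this example, and it assumes paths without escaped characters (mountinfo writes spaces in paths as \040, which is not decoded here).

```python
from typing import NamedTuple, Tuple


class MountInfoEntry(NamedTuple):
    mount_id: int
    parent_id: int
    dev: str                          # "major:minor" of the filesystem
    root: str                         # subtree of the filesystem that is mounted
    mount_dir: str                    # where it is mounted
    mount_options: str
    optional_fields: Tuple[str, ...]  # e.g. "shared:158"; zero or more
    fs_type: str
    mount_source: str
    super_options: str


def parse_mountinfo_line(line: str) -> MountInfoEntry:
    fields = line.split()
    # Optional fields start at index 6 and are terminated by a lone "-".
    sep = fields.index("-", 6)
    fs_type, mount_source, super_options = fields[sep + 1:sep + 4]
    return MountInfoEntry(
        mount_id=int(fields[0]),
        parent_id=int(fields[1]),
        dev=fields[2],
        root=fields[3],
        mount_dir=fields[4],
        mount_options=fields[5],
        optional_fields=tuple(fields[6:sep]),
        fs_type=fs_type,
        mount_source=mount_source,
        super_options=super_options,
    )


entry = parse_mountinfo_line(
    "400 392 0:47 /foo /home/zyga/data/bar rw,relatime shared:158 - tmpfs none rw"
)
print(entry.root, entry.mount_dir)  # prints: /foo /home/zyga/data/bar
```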

This information is relatively simple to process for regular mounts (so not bind mounts and not things like tmpfs that are not associated with a block device). A super simple version of Change.Needed that works with this data is present in PR3209. Unfortunately snapd relies on bind mounts heavily so we need something better or more general.

The first problem with bind mounts is that the fstab-like file that describes our intent doesn’t contain absolute information. The source of the mount is “resolved” at the time the mount is performed. To illustrate this, contrast two fstab-like entries, one that describes a regular mount and another that describes a bind mount:

/dev/sda2 / ext4 errors=remount-ro 0 1
/snap/ubuntu-app-platform/34 /snap/lonewolf/3/ubuntu-app-platform none bind,ro 0 0

In the first case we know that we’re mounting /dev/sda2. In the second case we have no idea what we are mounting; we can only analyse the mount table and deduce what is mounted on /snap/ubuntu-app-platform/34. What is worse, we can bind mount something that is not itself a mount point. To illustrate this, imagine that the last line read like this instead:

/snap/ubuntu-app-platform/34/stuff /snap/lonewolf/3/ubuntu-app-platform none bind,ro 0 0

Now stuff is being bind-mounted to /snap/lonewolf/3/ubuntu-app-platform but stuff is a directory under (presumably) some revision of the ubuntu-app-platform squashfs.

Now let’s examine a similar (but different because I already have the test data) case. Let’s start with the shell commands that I performed:

$ mkdir data
$ sudo mount -t tmpfs none data
$ cd data
$ mkdir foo bar
$ sudo mount --bind foo bar
$ mkdir froz
$ sudo mount --bind bar froz

The mountinfo table says this (I left out irrelevant parts):

25 0 8:2 / / rw,relatime shared:1 - ext4 /dev/sda2 rw,errors=remount-ro,data=ordered
392 25 0:47 / /home/zyga/data rw,relatime shared:158 - tmpfs none rw
400 392 0:47 /foo /home/zyga/data/bar rw,relatime shared:158 - tmpfs none rw
408 392 0:47 /foo /home/zyga/data/froz rw,relatime shared:158 - tmpfs none rw

The question we would now like to answer is: has the mount --bind /home/zyga/data/foo /home/zyga/data/froz already occurred?

My initial take on this was that we want to understand what /home/zyga/data/foo really is and then look for anything else that is the same thing. Here we can analyse the mountinfo table and conclude that /home/zyga/data/foo is the fragment /foo of the tmpfs with MountID 392. Earlier I used MountSource (e.g. /dev/sda2), but it is useless for tmpfs, proc and other virtual filesystems, so I abandoned that and switched to MountID.

So we can rephrase the question as such: is “/foo”@MountID:392 present at /home/zyga/data/froz?
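The “what is this path really” step can be sketched like this: find the deepest mount whose MountDir contains the path, then prepend that mount’s Root to the remainder. This is only a Python illustration of the idea; Mount and resolve are names invented here, and the sketch assumes that no two mounts are stacked on the same directory.

```python
import posixpath
from typing import List, NamedTuple, Tuple


class Mount(NamedTuple):
    mount_id: int
    parent_id: int
    root: str
    mount_dir: str


# The four mountinfo entries from the table above, reduced to the relevant fields.
MOUNTS = [
    Mount(25, 0, "/", "/"),
    Mount(392, 25, "/", "/home/zyga/data"),
    Mount(400, 392, "/foo", "/home/zyga/data/bar"),
    Mount(408, 392, "/foo", "/home/zyga/data/froz"),
]


def is_under(path: str, directory: str) -> bool:
    """True if path is directory itself or lives somewhere below it."""
    return directory == "/" or path == directory or path.startswith(directory + "/")


def resolve(path: str, mounts: List[Mount]) -> Tuple[int, str]:
    """Return (MountID, path inside that filesystem) for an absolute path."""
    candidates = [m for m in mounts if is_under(path, m.mount_dir)]
    if not candidates:
        raise LookupError(f"no mount contains {path}")
    # All candidates are ancestors of path, so the longest MountDir is the deepest.
    best = max(candidates, key=lambda m: len(m.mount_dir))
    rest = path[len(best.mount_dir):].lstrip("/")
    return best.mount_id, posixpath.normpath(posixpath.join(best.root, rest))


print(resolve("/home/zyga/data/foo", MOUNTS))  # prints: (392, '/foo')
print(resolve("/home/zyga/data/bar", MOUNTS))  # prints: (400, '/foo')
```

Note that bar resolves to the same fragment /foo but under a different MountID (400), so comparing the pair naively would not recognise foo and bar as the same thing.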

Here the complexity and my uncertainty lie in the fact that the relevant mountinfo entry feels somewhat ambiguous:

408 392 0:47 /foo /home/zyga/data/froz rw,relatime shared:158 - tmpfs none rw

So we know that /home/zyga/data/froz is /foo from some tmpfs, but we have no way to say if that is really the one with MountID 392. It seems like the ParentID is the key here, but I don’t know if a simple match against it is sufficient. I tried many different examples yesterday, and each time I got to an “oh, let’s use this” I could create another example where that wasn’t sufficient.
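For reference, the “simple match against ParentID” candidate would look roughly like the Python sketch below. It is not a proposed solution, and naive_bind_present is a name made up for this example. The entry called OUTSIDE is hypothetical and not taken from a real system: it shows what I would expect mountinfo to contain if /home/zyga/data/foo were bind-mounted to a directory outside of the tmpfs, where the parent is the root filesystem rather than mount 392.

```python
from typing import List, NamedTuple


class Mount(NamedTuple):
    mount_id: int
    parent_id: int
    root: str
    mount_dir: str


# The four mountinfo entries from the table above, reduced to the relevant fields.
MOUNTS = [
    Mount(25, 0, "/", "/"),
    Mount(392, 25, "/", "/home/zyga/data"),
    Mount(400, 392, "/foo", "/home/zyga/data/bar"),
    Mount(408, 392, "/foo", "/home/zyga/data/froz"),
]

# HYPOTHETICAL entry: /home/zyga/data/foo bind-mounted on /mnt/froz.
# The mount ID 416 and the target directory are invented for illustration.
OUTSIDE = Mount(416, 25, "/foo", "/mnt/froz")


def naive_bind_present(mounts: List[Mount], source_mount_id: int,
                       source_subpath: str, target_dir: str) -> bool:
    """Naive check: Root matches the fragment and ParentID matches the source mount."""
    return any(
        m.mount_dir == target_dir
        and m.root == source_subpath
        and m.parent_id == source_mount_id
        for m in mounts
    )


# Works for the example in this post, because froz lives inside mount 392.
print(naive_bind_present(MOUNTS, 392, "/foo", "/home/zyga/data/froz"))  # prints: True

# Misses the hypothetical bind mount, because its ParentID is 25, not 392.
print(naive_bind_present(MOUNTS + [OUTSIDE], 392, "/foo", "/mnt/froz"))  # prints: False
```

In the example from this post the ParentID only matches because the bind target happens to live inside the same tmpfs as the source, which is one way a plain ParentID comparison can fall short.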

I wanted to share the problem statement so that code review can be done more meaningfully and everyone can share their ideas.