Cannot fsck volume mounted on /var/snap

I have /var/snap mounted on a logical volume. Today, despite unmounting /var/snap, I was unable to fsck the logical volume until I stopped the .mount systemd unit associated with an nsfs mount.

The nsfs entry in question:

nsfs on /run/snapd/ns/mysnap.mnt type nsfs (rw)

I would get "Logical volume ... is in use", while lsof and fuser returned nothing.
Snapd was not running, and neither were any daemons within snaps.

  1. How can I tell which of the nsfs mounts is holding on to which filesystem / volume?
  2. When I umount the nsfs entry myself, it gets re-created the next time I snap start the service. So why does it not get cleaned up automatically when I snap stop the service?
    That would line up with my mental model / expectation that when neither snapd nor any snap services are running, nothing is holding on to the volume underlying /var/snap.

Cheers

You can use the nsenter utility to “enter” a preserved mount namespace and explore the file system as seen from that mount table. In particular you will find that /var/snap is, most likely, still mounted there.
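To make that check repeatable, here is a minimal sketch that filters the mount table seen inside a preserved namespace for anything at or below a given path. It assumes the text comes from /proc/self/mountinfo as read from inside that namespace. The function name find_mounts and the sample device names are made up for illustration and are not part of snapd.

```python
def find_mounts(mountinfo_text, path):
    """Return (mount_point, fs_type, source) for each mount at or below `path`.

    `mountinfo_text` is expected to be in /proc/<pid>/mountinfo format.
    """
    prefix = path.rstrip("/")
    matches = []
    for line in mountinfo_text.splitlines():
        # Fields before " - " describe the mount; fields after it describe
        # the filesystem (type, source, superblock options).
        left, sep, right = line.partition(" - ")
        if not sep:
            continue
        left_fields = left.split()
        right_fields = right.split()
        if len(left_fields) < 5 or len(right_fields) < 2:
            continue
        mount_point = left_fields[4]  # fifth field is the mount point
        fs_type, source = right_fields[0], right_fields[1]
        if mount_point == prefix or mount_point.startswith(prefix + "/"):
            matches.append((mount_point, fs_type, source))
    return matches


# Hypothetical sample of what the mount table might look like; the device
# names are invented for this example.
SAMPLE = """\
25 1 253:0 / / rw,relatime shared:1 - ext4 /dev/mapper/vg0-root rw
97 25 253:3 / /var/snap rw,relatime shared:40 - ext4 /dev/mapper/vg0-snap rw
"""

for mount_point, fs_type, source in find_mounts(SAMPLE, "/var/snap"):
    print(mount_point, fs_type, source)
```

To get real input, something like sudo nsenter --mount=/run/snapd/ns/mysnap.mnt cat /proc/self/mountinfo should print the mount table as seen from that namespace. If the logical volume shows up as a source there, that namespace is what keeps it in use.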

Why the namespace is not cleaned up is a deeper question. Historically, keeping it around would allow the application to see continuity in its /tmp directory. As the way /tmp is handled for snap applications evolved, it became possible to freely discard the mount namespace, and applications would still see the same files in /tmp as they did before.

At a recent meetup we discussed removing the mount namespace persistence layer, mainly because it is highly complex and involves state (persistence means it is stateful) which has gone awry in the past. Removing persistence would cause some delay for each snap startup, as applications would no longer benefit from the caching. Some specialised applications would also change semantics, as they grew to depend on the persistent mount table for correctness. Among these is, most notably, LXD, which uses another layer of mount table complexity on top of snapd.

I don’t believe we have a way to remove persistence today, but I would like to see how we can support running fsck on /var/snap correctly. I will perform a small experiment and return to this thread with additional information.