RHEL/CentOS mount issues

With snapd getting into EPEL, as described in "Snapd updates in Fedora EPEL for Enterprise Linux", I’ve started some work on integrating CentOS with our spread suite.

The latest publicly available CentOS version is 7.5 at the moment, and this is what is used in spread. I am using RHEL 7.6 locally for some extra verification.

We have run into problems related to mount namespaces and bind mounts that prevent snaps using layouts, as well as parallel-installed snaps, from working properly. It is possible to install such a snap, but it cannot be removed unless its mount namespace is discarded beforehand (i.e. one runs sudo /usr/libexec/snapd/snap-discard-ns <snapname>).
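A minimal sketch of that manual sequence, assuming the snap-discard-ns path quoted above. The function name and the RUN variable are hypothetical and only there for illustration; setting RUN=echo prints the commands instead of executing them.

```shell
#!/bin/sh
# Discard a snap's preserved mount namespace, then remove the snap.
# remove_snap_with_discard and RUN are illustrative names, not part of snapd.
# RUN defaults to sudo; set RUN=echo to only print the commands.
remove_snap_with_discard() {
    name="$1"
    if [ -z "$name" ]; then
        echo "usage: remove_snap_with_discard <snapname>" >&2
        return 2
    fi
    # Only attempt the removal if discarding the namespace succeeded.
    ${RUN:-sudo} /usr/libexec/snapd/snap-discard-ns "$name" &&
        ${RUN:-sudo} snap remove "$name"
}
```

For a dry run one could call it as RUN=echo remove_snap_with_discard <snapname> and inspect the two printed commands first.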

The minimal use case that demonstrates the problem:

# 1st terminal, on the host
[rhel@rhel ~]$ mkdir foo bar
[rhel@rhel ~]$ ls -l
total 0
drwxrwxr-x. 2 rhel rhel 6 Nov 16 12:31 bar
drwxrwxr-x. 2 rhel rhel 6 Nov 16 12:31 foo
# 2nd terminal, create mount namespace, slave propagation (like snap-confine)
[rhel@rhel ~]$ sudo unshare -m --propagation slave
[sudo] password for rhel:
[root@rhel rhel]#
[root@rhel rhel]# ls
bar  foo
[root@rhel rhel]# mount -o bind $PWD/foo $PWD/bar
# 1st terminal
[rhel@rhel ~]$ rmdir foo
[rhel@rhel ~]$ rmdir bar
rmdir: failed to remove ‘bar’: Device or resource busy
[rhel@rhel ~]$ sudo rmdir bar
rmdir: failed to remove ‘bar’: Device or resource busy
# 2nd terminal, back in the unshared mount ns
[root@rhel rhel]# cat /proc/$$/mountinfo |grep bar
209 84 253:0 /home/rhel/foo//deleted /home/rhel/bar rw,relatime - xfs /dev/mapper/rhel-root rw,seclabel,attr2,inode64,noquota
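To spot such leftovers without grepping for a specific name, a small helper can scan a mountinfo file for entries whose source carries the //deleted suffix seen above. This is only a sketch; the function name is made up for illustration. It relies on the mountinfo layout where field 4 is the root of the mount and field 5 is the mount point.

```shell
#!/bin/sh
# Print the mount point (field 5) of every mountinfo entry whose bind-mount
# source (field 4, the root of the mount) ends in "//deleted".
# list_deleted_bind_mounts is an illustrative name.
list_deleted_bind_mounts() {
    # $1: mountinfo file to scan; defaults to the caller's mount namespace
    awk '$4 ~ /\/\/deleted$/ { print $5 }' "${1:-/proc/self/mountinfo}"
}
```

Run inside the unshared mount namespace from the 2nd terminal, it should list /home/rhel/bar for the example above.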

I’ve discussed the problem with @zyga and we are looking at the kernel as the main culprit here. A process within an unshared mount namespace can effectively block operations in the host mount namespace.

The systems this was observed on:

  • RHEL 7.6 (3.10.0-957.el7.x86_64)
  • CentOS 7.5 (3.10.0-862.11.6.el7.x86_64)

Newer kernels do not have this behavior and the mount goes away automatically. I have verified that the behavior is also correct on RHEL 8 beta (4.18.0-32.el8.x86_64).

RHBZ link: https://bugzilla.redhat.com/show_bug.cgi?id=1650582

Can we disable layouts and parallel installs on affected systems?

From talking to @zyga I got the impression that more things may be broken as a result. Layouts and the mounts added by parallel installs are just the functionality where it’s easiest for the problem to surface. I have observed issues with snap-mgmt not being able to clean up after snaps, even though I added some tweaks to discard the snap’s mount namespace before removing snap directories. We will probably know more once I disable the relevant tests under spread.

The RHEL guys provided a workaround that enables the behavior we want:

echo 1 > /proc/sys/fs/may_detach_mounts

Knowing what to look for, I found other bug reports related to moby, runc and docker. They seem to drop a file into /usr/lib/sysctl.d with fs.may_detach_mounts=1.
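A rough sketch of what such a drop-in could look like when written from a script. The file name 99-may-detach-mounts.conf and the function name are my own assumptions, not what those projects actually ship; the directory is a parameter so the helper can be tried out against a scratch directory first.

```shell
#!/bin/sh
# Write a sysctl drop-in that sets fs.may_detach_mounts=1 and print its path.
# The function name and the file name 99-may-detach-mounts.conf are hypothetical.
write_may_detach_dropin() {
    # $1: target directory; defaults to the location mentioned above
    dir="${1:-/usr/lib/sysctl.d}"
    conf="$dir/99-may-detach-mounts.conf"
    printf 'fs.may_detach_mounts = 1\n' > "$conf" || return 1
    printf '%s\n' "$conf"
}
```

Writing to the default directory requires root. The setting is only picked up when sysctl configuration is loaded, e.g. with sysctl -p <path-to-file> or on the next boot.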

I’ve started a new spread run with this option enabled.

So far so good:

2018-11-17 09:21:33 Successful tasks: 255
2018-11-17 09:21:33 Aborted tasks: 1
2018-11-17 09:21:33 Failed tasks: 1
    - google:centos-7-64:tests/main/cgroup-freezer
2018-11-17 09:21:33 Failed task prepare: 1
    - google:centos-7-64:tests/main/security-device-cgroups-classic