LXD 3.15 fails with unified cgroup hierarchy

While working on cgroup v2 support in snapd, I attempted to run LXD 3.15 from a snap and got this weird error when the daemon starts:

==> Escaping the systemd cgroups
write(1, "==> Escaping the systemd cgroups"..., 33) = 33
open("/sys/fs/cgroup", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
fstat(3, {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
getdents(3, /* 17 entries */, 32768)    = 624
getdents(3, /* 0 entries */, 32768)     = 0
close(3)                                = 0
stat("/sys/fs/cgroup/cgroup.controllers/cgroup.procs", 0x7fff646af530) = -1 ENOTDIR (Not a directory)
stat("/sys/fs/cgroup/cgroup.max.depth/cgroup.procs", 0x7fff646af530) = -1 ENOTDIR (Not a directory)
stat("/sys/fs/cgroup/cgroup.max.descendants/cgroup.procs", 0x7fff646af530) = -1 ENOTDIR (Not a directory)
stat("/sys/fs/cgroup/cgroup.procs/cgroup.procs", 0x7fff646af530) = -1 ENOTDIR (Not a directory)
stat("/sys/fs/cgroup/cgroup.stat/cgroup.procs", 0x7fff646af530) = -1 ENOTDIR (Not a directory)
stat("/sys/fs/cgroup/cgroup.subtree_control/cgroup.procs", 0x7fff646af530) = -1 ENOTDIR (Not a directory)
stat("/sys/fs/cgroup/cgroup.threads/cgroup.procs", 0x7fff646af530) = -1 ENOTDIR (Not a directory)
stat("/sys/fs/cgroup/cpu.pressure/cgroup.procs", 0x7fff646af530) = -1 ENOTDIR (Not a directory)
stat("/sys/fs/cgroup/cpuset.cpus.effective/cgroup.procs", 0x7fff646af530) = -1 ENOTDIR (Not a directory)
stat("/sys/fs/cgroup/cpuset.mems.effective/cgroup.procs", 0x7fff646af530) = -1 ENOTDIR (Not a directory)
stat("/sys/fs/cgroup/init.scope/cgroup.procs", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
open("/sys/fs/cgroup/init.scope/cgroup.procs", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
fcntl(1, F_DUPFD, 10)                   = 11
close(1)                                = 0
fcntl(11, F_SETFD, FD_CLOEXEC)          = 0
dup2(3, 1)                              = 1
close(3)                                = 0
write(1, "25418\n", 6)                  = 6
dup2(11, 1)                             = 1
close(11)                               = 0
stat("/sys/fs/cgroup/io.pressure/cgroup.procs", 0x7fff646af530) = -1 ENOTDIR (Not a directory)
stat("/sys/fs/cgroup/memory.pressure/cgroup.procs", 0x7fff646af530) = -1 ENOTDIR (Not a directory)
stat("/sys/fs/cgroup/system.slice/cgroup.procs", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
open("/sys/fs/cgroup/system.slice/cgroup.procs", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
fcntl(1, F_DUPFD, 10)                   = 11
close(1)                                = 0
fcntl(11, F_SETFD, FD_CLOEXEC)          = 0
dup2(3, 1)                              = 1
close(3)                                = 0
write(1, "25418\n", 6)                  = -1 EBUSY (Device or resource busy)
write(2, "sh: ", 4)                     = 4
write(2, "echo: I/O error", 15)         = 15
write(2, "\n", 1)                       = 1
dup2(11, 1)                             = 1
close(11)                               = 0
exit_group(1)                           = ?
+++ exited with 1 +++
error: exit status 1
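
For readers skimming the trace, here is a rough Python sketch of the loop it appears to show: walk the entries of the cgroup root and write the PID into every `<entry>/cgroup.procs` that exists. This is inferred from the syscalls only; the real code is a shell script in the LXD snap that I have not reproduced here, and the function name `escape_cgroups` is made up for illustration.

```python
import os
import tempfile


def escape_cgroups(root, pid):
    """Write pid into every <root>/<entry>/cgroup.procs that exists.

    Returns the entries that were written to. Like the traced script,
    it does not catch errors, so the first failed write aborts the loop.
    """
    moved = []
    for entry in sorted(os.listdir(root)):
        procs = os.path.join(root, entry, "cgroup.procs")
        # On cgroup v2 most entries are plain files (cgroup.controllers,
        # cpu.pressure, ...), so this check fails for them; that is the
        # ENOTDIR noise in the trace.
        if not os.path.exists(procs):
            continue
        with open(procs, "w") as f:
            f.write(f"{pid}\n")
        moved.append(entry)
    return moved


# Demo against a throwaway directory that mimics part of the listing above.
# These are ordinary files, not a real cgroup filesystem.
with tempfile.TemporaryDirectory() as demo_root:
    for name in ("cgroup.controllers", "cpu.pressure"):
        open(os.path.join(demo_root, name), "w").close()
    for name in ("init.scope", "system.slice"):
        os.mkdir(os.path.join(demo_root, name))
        open(os.path.join(demo_root, name, "cgroup.procs"), "w").close()
    demo_moved = escape_cgroups(demo_root, 25418)
    print(demo_moved)
```

In the temporary directory both writes succeed. On the real unified hierarchy the second one, into system.slice, is where the trace shows EBUSY. A likely cause, though nothing in this thread confirms it, is the cgroup v2 rule that a cgroup which distributes controllers to its children cannot hold processes itself.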

The host is a Fedora 31 (rawhide actually) daily compose.

qemu:fedora-31-rawhide-64 .../tests/main/selinux-lxd# uname -a
Linux localhost 5.3.0-0.rc1.git0.1.fc31.x86_64 #1 SMP Mon Jul 22 07:54:10 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

qemu:fedora-31-rawhide-64 .../tests/main/selinux-lxd# systemctl --version
systemd 242 (v242-6.git9d34e79.fc31)
+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid

qemu:fedora-31-rawhide-64 .../tests/main/selinux-lxd# cat /proc/cmdline 
BOOT_IMAGE=(hd0,msdos1)/boot/vmlinuz-5.3.0-0.rc1.git0.1.fc31.x86_64 root=UUID=70467fce-1cae-47ca-9afc-211237e8232d ro no_timer_check net.ifnames=0 console=tty1 console=ttyS0,115200n8 systemd.unified_cgroup_hierarchy=1

I don’t know which method LXD uses for detecting cgroup v2, but the filesystem magic of /sys/fs/cgroup inside the snap’s mount namespace looks correct (0x63677270 is CGROUP2_SUPER_MAGIC):

qemu:fedora-31-rawhide-64 .../tests/main/selinux-lxd# nsenter -m/run/snapd/ns/lxd.mnt
qemu:fedora-31-rawhide-64 /# stat -f -c %t /sys/fs/cgroup/
63677270
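
As an aside, the same check can be sketched in Python without calling statfs, by looking up the filesystem type of /sys/fs/cgroup in mountinfo. This is only an illustration and not necessarily what LXD or snapd do; the helper names `fs_type_at` and `is_unified` and the sample mountinfo lines are made up for the example.

```python
# Value from linux/magic.h; it is what stat -f -c %t printed above.
CGROUP2_SUPER_MAGIC = 0x63677270


def fs_type_at(mountinfo_text, mount_point):
    """Return the filesystem type mounted at mount_point, or None.

    Each mountinfo line looks like:
      <id> <parent> <maj:min> <root> <mount point> <options> [optional...] - <fstype> <source> <super options>
    If several mounts are stacked on the same path, the last one listed wins.
    """
    found = None
    for line in mountinfo_text.splitlines():
        left, sep, right = line.partition(" - ")
        if not sep:
            continue  # not a well-formed mountinfo line
        fields = left.split()
        after = right.split()
        if len(fields) >= 5 and after and fields[4] == mount_point:
            found = after[0]
    return found


def is_unified(mountinfo_text):
    """True when /sys/fs/cgroup itself is a cgroup2 mount."""
    return fs_type_at(mountinfo_text, "/sys/fs/cgroup") == "cgroup2"


# The magic number is just the ASCII string "cgrp".
print(CGROUP2_SUPER_MAGIC.to_bytes(4, "big"))

# Illustrative sample, not copied from the machine in this thread.
SAMPLE_UNIFIED = """\
23 29 0:21 / /sys rw,nosuid,nodev,noexec,relatime shared:2 - sysfs sysfs rw,seclabel
28 23 0:25 / /sys/fs/cgroup rw,nosuid,nodev,noexec,relatime shared:4 - cgroup2 cgroup2 rw,seclabel,nsdelegate
"""
print(is_unified(SAMPLE_UNIFIED))
```

On a live system the input would be the contents of /proc/self/mountinfo, read from inside the mount namespace of interest. In the hybrid layout, /sys/fs/cgroup is a tmpfs with cgroup2 mounted further down at /sys/fs/cgroup/unified, so this check returns False there.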

cc @stgraber

Snap is incompatible with unified cgroups: https://bugzilla.redhat.com/show_bug.cgi?id=1438079

Yeah, that’s what I’m looking into. We have some initial patches for snapd. The problem described in the first post happens on a Fedora 31 with unified cgroups, and LXD from a snap.

Is that using snapd from edge or something else?

Snapd built from my branch: https://github.com/bboozzoo/snapd/commits/bboozzoo/fedora-cgroupv2

Some notes on enabling unified cgroups in Fedora 31 are right here: https://gist.github.com/bboozzoo/76b1535c93686a27bb7fdbaad0f560f7#fedora-31

Ok, got that branch built and working on Ubuntu:

root@vm01:~# hello-world
WARNING: cgroup v2 is not fully supported yet
Hello World!

Will take a look at LXD snap next.

@mborzecki LXD latest/edge and latest/candidate now detect and handle cgroupv2 just fine. LXD itself got basic support over a year ago, so once the snap is happy to start, you’re all good.

Thanks for the update!

Can confirm that using lxd from edge makes the spread tests targeting LXD pass again.

I was too quick to report success; I noticed I had some parts of the test commented out.

Anyway, the LXD daemon does start correctly now, and running lxd init --auto also works.

However, spawning containers does not. The test tries to launch a container using the ubuntu:18.04 image.

qemu:fedora-31-rawhide-64 .../tests/main/selinux-lxd# lxc info --show-log my-ubuntu
WARNING: cgroup v2 is not fully supported yet
Name: my-ubuntu
Location: none
Remote: unix://
Architecture: x86_64
Created: 2019/07/29 11:17 UTC
Status: Stopped
Type: persistent
Profiles: default

Log:

lxc my-ubuntu 20190729112554.937 ERROR    utils - utils.c:safe_mount:1212 - Operation not permitted - Failed to mount "sysfs" onto "/var/snap/lxd/common/lxc//sys"
lxc my-ubuntu 20190729112554.937 ERROR    conf - conf.c:lxc_mount_auto_mounts:745 - Operation not permitted - Failed to mount "sysfs" on "/var/snap/lxd/common/lxc//sys" with flags 0
lxc my-ubuntu 20190729112554.937 ERROR    conf - conf.c:lxc_setup:3594 - Failed to setup first automatic mounts
lxc my-ubuntu 20190729112554.937 ERROR    start - start.c:do_start:1321 - Failed to setup container "my-ubuntu"
lxc my-ubuntu 20190729112554.938 ERROR    sync - sync.c:__sync_wait:62 - An error occurred in another process (expected sequence number 5)
lxc my-ubuntu 20190729112554.938 WARN     network - network.c:lxc_delete_network_priv:3377 - Failed to rename interface with index 16 from "eth0" to its initial name "vethd9519a32"
lxc my-ubuntu 20190729112554.938 ERROR    start - start.c:lxc_abort:1122 - Function not implemented - Failed to send SIGKILL to 26331
lxc my-ubuntu 20190729112554.938 ERROR    start - start.c:__lxc_start:2039 - Failed to spawn container "my-ubuntu"
lxc my-ubuntu 20190729112554.938 ERROR    lxccontainer - lxccontainer.c:wait_on_daemonized_start:873 - Received container state "ABORTING" instead of "RUNNING"
qemu:fedora-31-rawhide-64 .../tests/main/selinux-lxd# snap list lxd
Name  Version      Rev    Tracking  Publisher   Notes
lxd   git-37406a5  11428  edge      canonical✓  -

Nothing relevant in the audit log. For completeness, I temporarily disabled the dontaudit rules in SELinux, but nothing came up either.

What kernel is that on? It matches the symptoms we’ve seen on the 5.3 kernel, which has a regression around mounting those filesystems.

If you’re running on 5.3, switch to something else as that kernel has a broken mount API.

Yes, it’s a 5.3 kernel, 5.3.0-0.rc1.git0.1.fc31.x86_64. Do you know whether this is already patched in master?

No. At this time, fixes are being discussed, but none have been merged into mainline Linux.
If you want a working mount API, you need to downgrade to 5.2 or earlier.

5.2 is also broken, wrt concurrent loop mount operations.

@stgraber provided a link to the LKML thread about the problem with 5.3: https://lkml.org/lkml/2019/7/26/388