LXD 3.15 fails with unified cgroup hierarchy

mborzecki · July 25, 2019, 10:59am

While working on cgroup v2 support in snapd, I tried to run LXD 3.15 from a snap and got this odd error while the daemon was starting:

==> Escaping the systemd cgroups
write(1, "==> Escaping the systemd cgroups"..., 33) = 33
open("/sys/fs/cgroup", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
fstat(3, {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
getdents(3, /* 17 entries */, 32768)    = 624
getdents(3, /* 0 entries */, 32768)     = 0
close(3)                                = 0
stat("/sys/fs/cgroup/cgroup.controllers/cgroup.procs", 0x7fff646af530) = -1 ENOTDIR (Not a directory)
stat("/sys/fs/cgroup/cgroup.max.depth/cgroup.procs", 0x7fff646af530) = -1 ENOTDIR (Not a directory)
stat("/sys/fs/cgroup/cgroup.max.descendants/cgroup.procs", 0x7fff646af530) = -1 ENOTDIR (Not a directory)
stat("/sys/fs/cgroup/cgroup.procs/cgroup.procs", 0x7fff646af530) = -1 ENOTDIR (Not a directory)
stat("/sys/fs/cgroup/cgroup.stat/cgroup.procs", 0x7fff646af530) = -1 ENOTDIR (Not a directory)
stat("/sys/fs/cgroup/cgroup.subtree_control/cgroup.procs", 0x7fff646af530) = -1 ENOTDIR (Not a directory)
stat("/sys/fs/cgroup/cgroup.threads/cgroup.procs", 0x7fff646af530) = -1 ENOTDIR (Not a directory)
stat("/sys/fs/cgroup/cpu.pressure/cgroup.procs", 0x7fff646af530) = -1 ENOTDIR (Not a directory)
stat("/sys/fs/cgroup/cpuset.cpus.effective/cgroup.procs", 0x7fff646af530) = -1 ENOTDIR (Not a directory)
stat("/sys/fs/cgroup/cpuset.mems.effective/cgroup.procs", 0x7fff646af530) = -1 ENOTDIR (Not a directory)
stat("/sys/fs/cgroup/init.scope/cgroup.procs", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
open("/sys/fs/cgroup/init.scope/cgroup.procs", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
fcntl(1, F_DUPFD, 10)                   = 11
close(1)                                = 0
fcntl(11, F_SETFD, FD_CLOEXEC)          = 0
dup2(3, 1)                              = 1
close(3)                                = 0
write(1, "25418\n", 6)                  = 6
dup2(11, 1)                             = 1
close(11)                               = 0
stat("/sys/fs/cgroup/io.pressure/cgroup.procs", 0x7fff646af530) = -1 ENOTDIR (Not a directory)
stat("/sys/fs/cgroup/memory.pressure/cgroup.procs", 0x7fff646af530) = -1 ENOTDIR (Not a directory)
stat("/sys/fs/cgroup/system.slice/cgroup.procs", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
open("/sys/fs/cgroup/system.slice/cgroup.procs", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
fcntl(1, F_DUPFD, 10)                   = 11
close(1)                                = 0
fcntl(11, F_SETFD, FD_CLOEXEC)          = 0
dup2(3, 1)                              = 1
close(3)                                = 0
write(1, "25418\n", 6)                  = -1 EBUSY (Device or resource busy)
write(2, "sh: ", 4)                      = 4
write(2, "echo: I/O error", 15)          = 15
write(2, "\n", 1)                        = 1
dup2(11, 1)                             = 1
close(11)                               = 0
exit_group(1)                           = ?
+++ exited with 1 +++
error: exit status 1
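Reading the trace, the snap's startup wrapper appears to loop over every entry under /sys/fs/cgroup and write its PID into `<entry>/cgroup.procs`. That pattern fits the cgroup v1 layout, where each top-level entry is a separate controller hierarchy. Under the unified hierarchy, the top-level entries are the root's interface files plus child cgroups such as init.scope and system.slice, so the loop just moves the process from one child cgroup to another. The write into system.slice most likely fails with EBUSY because cgroup v2 refuses to place a process in a non-root cgroup that has controllers enabled in its cgroup.subtree_control (the "no internal processes" rule). The following is a minimal Python sketch contrasting the v1-style target selection with a v2-aware one that targets only the root cgroup. It is not the LXD snap's actual wrapper: the function names and the `cgroup.controllers` heuristic for detecting v2 are assumptions made for illustration.

```python
import os
from typing import List

# Hypothetical helpers illustrating the logic; not taken from the LXD snap.


def is_unified(root: str) -> bool:
    """Heuristic: the cgroup v2 root exposes a cgroup.controllers file."""
    return os.path.isfile(os.path.join(root, "cgroup.controllers"))


def v1_style_targets(root: str) -> List[str]:
    """What a v1-oriented loop writes to: <entry>/cgroup.procs for every
    top-level entry that has one (plain files such as cgroup.stat are
    skipped, matching the ENOTDIR results in the trace)."""
    targets = []
    for entry in sorted(os.listdir(root)):
        procs = os.path.join(root, entry, "cgroup.procs")
        if os.path.isfile(procs):
            targets.append(procs)
    return targets


def escape_targets(root: str) -> List[str]:
    """v2-aware selection: on the unified hierarchy, escaping to the root
    cgroup means writing once to <root>/cgroup.procs."""
    if is_unified(root):
        return [os.path.join(root, "cgroup.procs")]
    return v1_style_targets(root)


def escape(root: str = "/sys/fs/cgroup") -> None:
    """Write our PID into each target (requires root on a real system)."""
    for procs in escape_targets(root):
        with open(procs, "w") as f:
            f.write(f"{os.getpid()}\n")
```

On the layout shown in the trace, `v1_style_targets` would yield init.scope and system.slice, which matches the two successful `stat` calls, while `escape_targets` returns only the root's cgroup.procs.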

The host is a Fedora 31 (rawhide actually) daily compose.

qemu:fedora-31-rawhide-64 .../tests/main/selinux-lxd# uname -a
Linux localhost 5.3.0-0.rc1.git0.1.fc31.x86_64 #1 SMP Mon Jul 22 07:54:10 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

qemu:fedora-31-rawhide-64 .../tests/main/selinux-lxd# systemctl --version
systemd 242 (v242-6.git9d34e79.fc31)
+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid

qemu:fedora-31-rawhide-64 .../tests/main/selinux-lxd# cat /proc/cmdline 
BOOT_IMAGE=(hd0,msdos1)/boot/vmlinuz-5.3.0-0.rc1.git0.1.fc31.x86_64 root=UUID=70467fce-1cae-47ca-9afc-211237e8232d ro no_timer_check net.ifnames=0 console=tty1 console=ttyS0,115200n8 systemd.unified_cgroup_hierarchy=1
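systemd here is built with default-hierarchy=hybrid, but the systemd.unified_cgroup_hierarchy=1 kernel argument overrides that compile-time default, so the machine boots with the unified hierarchy. The following is a small Python sketch of that precedence. It assumes systemd's usual boolean spellings and treats a bare switch as true; the function names are hypothetical, and related switches such as systemd.legacy_systemd_cgroup_controller are deliberately not modelled.

```python
from typing import Optional

# Boolean spellings systemd's parse_boolean() accepts
# (matched case-insensitively in this sketch).
_TRUE = {"1", "yes", "y", "true", "t", "on"}
_FALSE = {"0", "no", "n", "false", "f", "off"}


def unified_requested(cmdline: str) -> Optional[bool]:
    """Return the explicit systemd.unified_cgroup_hierarchy setting, or None.

    Hypothetical helper: a bare switch (no "=value") counts as true, and
    in this sketch the last occurrence on the command line wins.
    """
    value: Optional[str] = None
    for arg in cmdline.split():
        key, sep, val = arg.partition("=")
        if key == "systemd.unified_cgroup_hierarchy":
            value = val.lower() if sep else "1"
    if value is None:
        return None
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"unrecognised boolean value: {value!r}")


def effective_hierarchy(cmdline: str, build_default: str) -> str:
    """Combine the kernel argument with systemd's compile-time default."""
    explicit = unified_requested(cmdline)
    if explicit is None:
        return build_default  # e.g. "hybrid" for this Fedora build
    # An explicit false leaves systemd choosing between legacy and hybrid
    # based on other settings, which this sketch does not model.
    return "unified" if explicit else "legacy-or-hybrid"
```

Fed the /proc/cmdline above with a build default of "hybrid", `effective_hierarchy` reports "unified".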

I don’t know which method LXD uses to detect cgroup v2, but the filesystem magic of /sys/fs/cgroup inside the snap's mount namespace looks correct (cgroup2):

qemu:fedora-31-rawhide-64 .../tests/main/selinux-lxd# nsenter -m/run/snapd/ns/lxd.mnt
qemu:fedora-31-rawhide-64 /# stat -f -c %t /sys/fs/cgroup/
63677270
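`stat -f -c %t` prints the filesystem type in hex, and 63677270 is CGROUP2_SUPER_MAGIC from `<linux/magic.h>`, meaning cgroup2 is mounted directly on /sys/fs/cgroup. On legacy and hybrid setups that path is normally a tmpfs (TMPFS_MAGIC, 0x01021994) with per-controller v1 mounts beneath it. The following Python sketch maps that hex output to a layout. It is one common detection approach, not necessarily the one LXD uses, and the function name is hypothetical.

```python
# Filesystem magic numbers from <linux/magic.h>.
CGROUP2_SUPER_MAGIC = 0x63677270
CGROUP_SUPER_MAGIC = 0x27E0EB  # a single cgroup v1 hierarchy
TMPFS_MAGIC = 0x01021994


def classify_cgroup_root(stat_type_hex: str) -> str:
    """Interpret `stat -f -c %t /sys/fs/cgroup` output (hex, no 0x prefix)."""
    magic = int(stat_type_hex.strip(), 16)
    if magic == CGROUP2_SUPER_MAGIC:
        return "unified"  # cgroup v2 mounted at the root
    if magic == TMPFS_MAGIC:
        return "legacy-or-hybrid"  # v1 controllers mounted under a tmpfs
    if magic == CGROUP_SUPER_MAGIC:
        return "v1-hierarchy"  # path points at one v1 controller mount
    return "unknown"
```

Passing the `63677270` printed above returns "unified". A tmpfs root prints `1021994` (no leading zero), which the same parser handles.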

cc @stgraber

Reynolds5 · July 25, 2019, 1:09pm

Snap is incompatible with unified cgroups: https://bugzilla.redhat.com/show_bug.cgi?id=1438079

mborzecki · July 25, 2019, 1:39pm

Yeah, that’s what I’m looking into. We have some initial patches for snapd. The problem described in the first post happens on Fedora 31 with unified cgroups and LXD from a snap.

stgraber · July 25, 2019, 3:11pm

Is that using snapd from edge or something else?

mborzecki · July 25, 2019, 7:16pm

Snapd built from my branch: https://github.com/bboozzoo/snapd/commits/bboozzoo/fedora-cgroupv2

Some notes on enabling unified cgroups in Fedora 31 are right here: https://gist.github.com/bboozzoo/76b1535c93686a27bb7fdbaad0f560f7#fedora-31

stgraber · July 27, 2019, 7:09pm

Ok, got that branch built and working on Ubuntu:

root@vm01:~# hello-world
WARNING: cgroup v2 is not fully supported yet
Hello World!

Will take a look at LXD snap next.

stgraber · July 27, 2019, 11:41pm

@mborzecki LXD latest/edge and latest/candidate now detect and handle cgroupv2 just fine. LXD itself got basic support over a year ago, so once the snap is happy to start, you’re all good.

mborzecki · July 29, 2019, 5:21am

Thanks for the update!

mborzecki · July 29, 2019, 10:53am

Can confirm that using lxd from edge makes the spread tests targeting LXD pass again.

mborzecki · July 29, 2019, 11:38am

I reported success too soon; I noticed I had some parts of the test commented out.

Anyway, the LXD daemon does start correctly now, and running lxd init --auto also works.

However, spawning containers does not. The test tries to launch a container using the ubuntu:18.04 image.

qemu:fedora-31-rawhide-64 .../tests/main/selinux-lxd# lxc info --show-log my-ubuntu
WARNING: cgroup v2 is not fully supported yet
Name: my-ubuntu
Location: none
Remote: unix://
Architecture: x86_64
Created: 2019/07/29 11:17 UTC
Status: Stopped
Type: persistent
Profiles: default

Log:

lxc my-ubuntu 20190729112554.937 ERROR    utils - utils.c:safe_mount:1212 - Operation not permitted - Failed to mount "sysfs" onto "/var/snap/lxd/common/lxc//sys"
lxc my-ubuntu 20190729112554.937 ERROR    conf - conf.c:lxc_mount_auto_mounts:745 - Operation not permitted - Failed to mount "sysfs" on "/var/snap/lxd/common/lxc//sys" with flags 0
lxc my-ubuntu 20190729112554.937 ERROR    conf - conf.c:lxc_setup:3594 - Failed to setup first automatic mounts
lxc my-ubuntu 20190729112554.937 ERROR    start - start.c:do_start:1321 - Failed to setup container "my-ubuntu"
lxc my-ubuntu 20190729112554.938 ERROR    sync - sync.c:__sync_wait:62 - An error occurred in another process (expected sequence number 5)
lxc my-ubuntu 20190729112554.938 WARN     network - network.c:lxc_delete_network_priv:3377 - Failed to rename interface with index 16 from "eth0" to its initial name "vethd9519a32"
lxc my-ubuntu 20190729112554.938 ERROR    start - start.c:lxc_abort:1122 - Function not implemented - Failed to send SIGKILL to 26331
lxc my-ubuntu 20190729112554.938 ERROR    start - start.c:__lxc_start:2039 - Failed to spawn container "my-ubuntu"
lxc my-ubuntu 20190729112554.938 ERROR    lxccontainer - lxccontainer.c:wait_on_daemonized_start:873 - Received container state "ABORTING" instead of "RUNNING"

qemu:fedora-31-rawhide-64 .../tests/main/selinux-lxd# snap list lxd
Name  Version      Rev    Tracking  Publisher   Notes
lxd   git-37406a5  11428  edge      canonical✓  -

Nothing relevant in the audit log. For completeness, I temporarily disabled the dontaudit rules in SELinux, but nothing came up either.

stgraber · July 29, 2019, 12:11pm

What kernel is that on? It matches the symptoms we’ve seen on the 5.3 kernel, which has a regression around mounting those filesystems.

If you’re running 5.3, switch to something else, as that kernel has a broken mount API.

mborzecki · July 29, 2019, 12:26pm

Yes, it’s a 5.3 kernel, 5.3.0-0.rc1.git0.1.fc31.x86_64. Do you know whether this is already patched in master?

stgraber · July 29, 2019, 12:40pm

No. At this time fixes are being discussed, but none have been merged into mainline Linux yet.
If you want a working mount API, you need to downgrade to 5.2 or earlier.

chipaca · July 29, 2019, 12:42pm

5.2 is also broken with respect to concurrent loop mount operations.
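Going only by the reports in this thread, a test harness could flag kernels on which container startup is expected to misbehave: 5.3 (mount API regression, per the LKML discussion) and 5.2 (concurrent loop mounts). The following is a hypothetical Python sketch that parses a `uname -r` string such as the one on this host. The issue table reflects the thread's claims as of July 2019, not an authoritative list, and the function names are illustrative.

```python
import re
from typing import Optional, Tuple

# Issues as reported in this thread (July 2019); not an authoritative list.
KNOWN_ISSUES = {
    (5, 3): "mount API regression; sysfs mounts for containers fail with EPERM",
    (5, 2): "problems with concurrent loop mount operations",
}


def kernel_major_minor(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a `uname -r` string."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unexpected kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def known_issue(release: str) -> Optional[str]:
    """Return the reported issue for this kernel series, if any."""
    return KNOWN_ISSUES.get(kernel_major_minor(release))
```

In a test, `known_issue(platform.release())` could decide whether to skip the container-launch step rather than fail it.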

mborzecki · July 29, 2019, 1:52pm

@stgraber provided a link to an LKML thread about the problem with 5.3: https://lkml.org/lkml/2019/7/26/388