LXD snap fails at boot time on Debian 11 (cgroup2)

The LXD snap fails to start at boot time on Debian 11 (amd64, default Debian kernel 5.10.70-1) with AppArmor errors.

Debian 11 uses cgroup2 by default.

Oct 14 08:52:52 oak audit[2238]: AVC apparmor="DENIED" operation="open" profile="snap-update-ns.lxd" name="/sys/fs/cgroup/system.slice/snap.lxd.daemon.unix.socket/cgroup.freeze" pid=2238 comm="5" requested_mask="w" denied_mask="w" fsuid=0 ouid=0
Oct 14 08:52:52 oak kernel: kauditd_printk_skb: 20 callbacks suppressed
Oct 14 08:52:52 oak kernel: audit: type=1400 audit(1634197972.897:32): apparmor="DENIED" operation="open" profile="snap-update-ns.lxd" name="/sys/fs/cgroup/system.slice/snap.lxd.daemon.unix.socket/cgroup.freeze" pid=2238 comm="5" requested_mask="w" denied_mask="w" fsuid=0 ouid=0
Oct 14 08:52:52 oak kernel: audit: type=1400 audit(1634197972.901:33): apparmor="DENIED" operation="open" profile="snap-update-ns.lxd" name="/sys/fs/cgroup/system.slice/snap.lxd.daemon.unix.socket/cgroup.freeze" pid=2238 comm="5" requested_mask="w" denied_mask="w" fsuid=0 ouid=0
Oct 14 08:52:52 oak audit[2238]: AVC apparmor="DENIED" operation="open" profile="snap-update-ns.lxd" name="/sys/fs/cgroup/system.slice/snap.lxd.daemon.unix.socket/cgroup.freeze" pid=2238 comm="5" requested_mask="w" denied_mask="w" fsuid=0 ouid=0
Oct 14 08:52:52 oak lxd.activate[2238]: cannot update snap namespace: cannot finish freezing processes of snap "lxd": cannot freeze processes of snap "lxd", open /sys/fs/cgroup/system.slice/snap.lxd.daemon.unix.socket/cgroup.freeze: permission denied
Oct 14 08:52:52 oak lxd.activate[2217]: snap-update-ns failed with code 1
Oct 14 08:52:52 oak systemd[1]: snap.lxd.activate.service: Main process exited, code=exited, status=1/FAILURE
Oct 14 08:52:52 oak systemd[1]: snap.lxd.activate.service: Failed with result 'exit-code'.
Oct 14 08:52:52 oak systemd[1]: Failed to start Service for snap application lxd.activate.
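
For anyone sifting through a longer journal, the denials above can be reduced to the fields that matter (profile, denied path, mask). The following is only a minimal sketch: the function names are made up for illustration and are not part of snapd or AppArmor tooling, and it assumes the key=value layout seen in the log lines above.

```python
import posixpath
import re

# Matches key="value" or key=value pairs inside an audit record.
_FIELD = re.compile(r'(\w+)=("([^"]*)"|\S+)')


def parse_apparmor_denial(line):
    """Return the audit fields of an AppArmor DENIED record, or None.

    Hypothetical helper: assumes the key=value layout shown in the log above.
    """
    if 'apparmor="DENIED"' not in line:
        return None
    fields = {}
    for key, raw, quoted in _FIELD.findall(line):
        # Quoted values lose their surrounding quotes; bare values are kept as-is.
        fields[key] = quoted if raw.startswith('"') else raw
    return fields


def denied_cgroup_unit(fields):
    """Return the cgroup directory name that contains the denied file."""
    return posixpath.basename(posixpath.dirname(fields["name"]))


if __name__ == "__main__":
    sample = (
        'Oct 14 08:52:52 oak audit[2238]: AVC apparmor="DENIED" operation="open" '
        'profile="snap-update-ns.lxd" '
        'name="/sys/fs/cgroup/system.slice/snap.lxd.daemon.unix.socket/cgroup.freeze" '
        'pid=2238 comm="5" requested_mask="w" denied_mask="w" fsuid=0 ouid=0'
    )
    record = parse_apparmor_denial(sample)
    print(record["profile"], record["denied_mask"], denied_cgroup_unit(record))
```

Run against the first log line, this reports that the snap-update-ns.lxd profile was denied write access inside the snap.lxd.daemon.unix.socket cgroup.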

This was observed with lxd 4.19 on snapd 2.52, and also with lxd 4.0.7 on snapd 2.52.

Thanks!

Tim.

I have no idea why the cgroup is snap.lxd.daemon.unix.socket and not snap.lxd.daemon.service.

Can you run systemctl cat snap.lxd.daemon.unix.socket and attach the output?

# /etc/systemd/system/snap.lxd.daemon.unix.socket
[Unit]
# Auto-generated, DO NOT EDIT
Description=Socket unix for snap application lxd.daemon
Requires=snap-lxd-21545.mount
After=snap-lxd-21545.mount
X-Snappy=yes

[Socket]
Service=snap.lxd.daemon.service
FileDescriptorName=unix
ListenStream=/var/snap/lxd/common/lxd/unix.socket
SocketMode=0660

[Install]
WantedBy=sockets.target
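
As an aside, the Service= line in a dump like the one above can be pulled out programmatically when comparing several socket units. This is a rough sketch only: it uses Python's configparser, which is not a full systemd unit parser (with strict=False a repeated key keeps only its last value), and the function name is hypothetical.

```python
import configparser


def socket_target_service(unit_text):
    """Return the Service= value of a socket unit, or None if absent.

    Hypothetical helper; configparser only approximates systemd's unit syntax.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep systemd's case-sensitive key names
    parser.read_string(unit_text)
    return parser.get("Socket", "Service", fallback=None)


if __name__ == "__main__":
    unit = """\
# /etc/systemd/system/snap.lxd.daemon.unix.socket
[Unit]
Description=Socket unix for snap application lxd.daemon

[Socket]
Service=snap.lxd.daemon.service
ListenStream=/var/snap/lxd/common/lxd/unix.socket
"""
    print(socket_target_service(unit))
```

For the unit shown above this yields snap.lxd.daemon.service, i.e. the socket unit activates the daemon service rather than running anything itself.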

Cheers,

Tim.

Please let me know if you need any more info here, and whether it would be better to open a bug on Launchpad instead of (or as well as) this thread.

Cheers,

Tim.

I tried a simple reproducer with a foo.socket which triggers foo.service, and was able to confirm that the cgroup used by foo.service is correct. Unfortunately, that does not provide any new insight into why the cgroup was incorrect in your case.

I’ve also installed the snapd package and subsequently the lxd snap, launched an instance, and rebooted, with no problems so far.

The log you provided earlier clearly indicates that the snap.lxd.activate.service runs under an incorrect cgroup. Those are set up by systemd, and on v2 neither snap-confine nor /usr/bin/snap change anything in the hierarchy.
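
One way to check which cgroup a process was actually placed in is to read /proc/<pid>/cgroup and compare it with the path in the denial. A minimal sketch, assuming a cgroup v2 (unified) hierarchy mounted at /sys/fs/cgroup; the function names are illustrative and not part of snapd or systemd:

```python
import posixpath


def unified_cgroup_path(proc_cgroup_text):
    """Return the cgroup v2 path from the contents of /proc/<pid>/cgroup.

    The unified hierarchy is the entry of the form "0::<path>".
    Returns None if no such entry exists (e.g. a pure cgroup v1 system).
    """
    for line in proc_cgroup_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        hierarchy_id, controllers, path = parts
        if hierarchy_id == "0" and controllers == "":
            return path
    return None


def freeze_file_for(cgroup_path, root="/sys/fs/cgroup"):
    """Return the cgroup.freeze file for a cgroup (assumes the default mount point)."""
    return posixpath.join(root, cgroup_path.lstrip("/"), "cgroup.freeze")


if __name__ == "__main__":
    # Sample content mirroring the cgroup named in the log; on a live system,
    # read the text from /proc/<pid>/cgroup instead.
    sample = "0::/system.slice/snap.lxd.daemon.unix.socket\n"
    print(freeze_file_for(unified_cgroup_path(sample)))
```

With the sample text, this prints the same cgroup.freeze path that appears in the AppArmor denial, which makes it easy to see whether the process's real cgroup and the one being frozen agree.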

Thanks for taking a look at this… So does it look like the problem may be within systemd? Assuming I can get this to happen in a test environment (I should be able to, but the problematic machine is in production), then what do you think the next steps would be to start to debug this? I’m thinking:

  • Check whether the problem is still present with a newer snapd (git and latest release).
  • Check whether it is still present with a newer systemd.
  • Dive into debugging systemd if it is still present.

Does that sound reasonable?

Cheers,

Tim.

I’ve identified this as a problem in snapd after all. A fix has been proposed in https://github.com/snapcore/snapd/pull/11006. Since Debian supports re-exec into the snapd snap, once the fix lands you can refresh snapd from edge (snap refresh --edge snapd) and the problem should be gone.