Snap freezes after update to Ubuntu 22.04

Excellent, thanks! I think I finally got it, it’s a bug in snapd, which was expecting the lines in the /proc/<pid>/cgroup file to appear in a certain order. I’ll prepare a fix early next week.

Hi! So, here’s a snapd with a possible fix, could you please try it?

My understanding of the issue is that snapd gets confused by the presence of this line in the /proc/<pid-of-snap>/cgroup file:

1:name=systemd:/

This line tells snapd that the snap process has been assigned the / path in the cgroup V1 hierarchy. This does not match the path that snapd was expecting, so it starts making wrong assumptions and ultimately freezes the snap process itself.
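To illustrate the ordering problem, here is a minimal Python sketch of an order-independent parser for this file. It is not snapd's actual code (snapd is written in Go); the names `CgroupEntry`, `parse_proc_cgroup`, `v2_path` and `v1_entries` are made up for this example, and the sample path is hypothetical. It only relies on the documented `hierarchy-ID:controller-list:cgroup-path` line format, where the V2 entry has ID 0 and an empty controller list.

```python
from typing import List, NamedTuple, Optional


class CgroupEntry(NamedTuple):
    hierarchy_id: int
    controllers: str  # "" for V2; e.g. "name=systemd" for a named V1 hierarchy
    path: str


def parse_proc_cgroup(text: str) -> List[CgroupEntry]:
    """Parse the contents of /proc/<pid>/cgroup into a list of entries."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first two colons only, since the path may contain colons.
        hierarchy_id, controllers, path = line.split(":", 2)
        entries.append(CgroupEntry(int(hierarchy_id), controllers, path))
    return entries


def v2_path(entries: List[CgroupEntry]) -> Optional[str]:
    """Return the V2 path wherever its line appears in the file."""
    for entry in entries:
        if entry.hierarchy_id == 0 and entry.controllers == "":
            return entry.path
    return None


def v1_entries(entries: List[CgroupEntry]) -> List[CgroupEntry]:
    """Return every entry that belongs to a V1 hierarchy."""
    return [entry for entry in entries if entry.hierarchy_id != 0]


# Hypothetical file contents with the stray V1 line listed first.
sample = "1:name=systemd:/\n0::/user.slice/user-1000.slice/session-2.scope\n"
entries = parse_proc_cgroup(sample)
print(v2_path(entries))
print(v1_entries(entries))
```

Looking entries up by hierarchy ID rather than by position means that a leftover V1 line cannot be mistaken for the V2 one.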

The point is that the cgroup V1 hierarchy should not be there, because in Ubuntu we nowadays use the V2 hierarchy only. And what is more confusing (and what initially set me completely off track) is that, according to the “mount | grep cgroup” output you pasted above, the cgroup V1 hierarchy is indeed not mounted on your system!

So, why is systemd assigning one to our process? Well, the cgroup kernel documentation says:

When a cgroup filesystem is unmounted, if there are any child cgroups created below the top-level cgroup, that hierarchy will remain active even though unmounted; if there are no child cgroups then the hierarchy will be deactivated.

My guess therefore is that you have one program, probably executed early in the boot process, that mounts the cgroup V1 hierarchy and assigns some processes to it. Then it (or some other process) unmounts the hierarchy, which however continues to live because it’s not empty. To find out what it is, you could run

find /sys/fs/cgroup -name cgroup.procs

and, for those paths belonging to the V1 hierarchy, print these files to see which PIDs are still assigned to them. This might give you a hint about which processes are involved.
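If the V1 hierarchy is no longer mounted, its cgroup.procs files will not be reachable through the filesystem. As a complementary approach, here is a rough Python sketch that reads /proc/<pid>/cgroup for every process and groups the PIDs by their V1 entries. The function names `scan_v1_membership` and `report` are invented for this example. It assumes that processes sitting in a non-root V1 path point at whatever created the child cgroups, which may not hold if those child cgroups are empty.

```python
import os
from collections import defaultdict
from typing import Dict, List, Tuple


def scan_v1_membership(proc_root: str = "/proc") -> Dict[Tuple[str, str], List[int]]:
    """Map each V1 (controllers, path) pair to the PIDs that carry it."""
    members = defaultdict(list)
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc_root, name, "cgroup")) as f:
                lines = f.read().splitlines()
        except OSError:
            continue  # the process exited or the file is not readable
        for line in lines:
            parts = line.split(":", 2)
            if len(parts) != 3:
                continue
            hierarchy_id, controllers, path = parts
            if hierarchy_id != "0":  # ID 0 is the V2 hierarchy
                members[(controllers, path)].append(int(name))
    return dict(members)


def report(proc_root: str = "/proc") -> List[str]:
    """List V1 memberships outside the root path, one line per cgroup."""
    lines = []
    for (controllers, path), pids in sorted(scan_v1_membership(proc_root).items()):
        if path != "/":
            lines.append(f"{controllers} {path} {sorted(pids)}")
    return lines
```

Running `print("\n".join(report()))` as root on the affected machine should list the processes that are still attached to a child cgroup of a V1 hierarchy, if there are any.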

But even if you find this out and remove the offending services, please try the snapd version I’ve sent you, as I believe it’s a nice fix to have anyway.

Confirmed. With that build hello-world works as expected. Chromium-browser can be installed and also works like a charm. Thank you very much!

Quick question. If I do find the cgroup V1 hierarchy culprit how can I revert to the stock snapd version? Should apt install --reinstall snapd do the trick or should I do something else?

No, you should use snap refresh snapd. But please let me know when you find the root cause for the V1 cgroup. :slight_smile:

I’m not sure I’ll be able to :frowning:

find /sys/fs/cgroup -name cgroup.procs shows me basically every process in my system. I’ve tried removing steam and all i386 dependencies but that doesn’t help, so I’m not sure how to proceed. There is nothing in the logs that would show me why the v1 was mounted or why it is still active :frowning:

@mardy I am also affected by this, it appears it was the same issue overall (no snap app was able to launch and the same “systemd could not associate process” error). Seems a way to trigger this, at least for me, is to run a container using systemd-nspawn.

The issue disappears after a reboot.

When can we expect the fix to be in the stable channel (i.e. getting it via apt upgrade)? I would prefer not to run those custom builds if possible.

Thanks @crtxcr for this information! I’m trying to push this to go in ASAP, and it should be there in the next major release. We are still trying to understand the impact of this bug, to see if it needs to be pushed into a point release as well. The problem is that changes to the cgroup handling are always very delicate, so we need to do some solid testing before getting this out.

Do you remember how you ran systemd-nspawn when you reproduced this bug?

systemd-nspawn -M name -D dir with a couple of --bind=. No system directories in --bind. Apart from that, no other options.

The issue remains after the container has finished. While I previously believed all snap apps were affected, right now it mysteriously does not affect firefox (with all instances closed) and drawio, although it did when I reported the issue. However, it affects chromium and krita. So unfortunately it may not be the most deterministic bug.

Thanks to @crtxcr I can confirm that the culprit is a systemd-nspawn container. I have one enabled on boot. When I disable it, snaps work fine with the stock version. After I start it, anything that hasn’t been started before cannot be started. Interestingly, if something was started at least once before the container, it can be started after the container with no problems. This is my .nspawn file:

[Exec]
Boot=yes
PrivateUsers=no

[Network]
VirtualEthernet=yes

[Files]
Bind=/media/data

Thanks both, this information is very useful. I now have a better understanding of how this can happen: most likely, the machine that you loaded with systemd-nspawn is configured to use cgroup V1, and since the kernel is shared with the host, the cgroup configuration was “leaked”.

Can you please tell me which machine you booted? Can I download it from somewhere?

Mine is an old Ubuntu 16.04.4