There are a few threads that are a few years old, and this sounds similar to a known snapd bug (e.g. Bug #1891618 “Snapd stuck after a request timeout error” on Launchpad), but since I haven’t been able to pin down exactly what is going wrong, I’m just writing down the details I’ve been able to gather so far:
- This issue has appeared several times, but it’s unpredictable when it will appear. It only seems to happen after a few hours of uptime on my desktop system.
- This desktop system is not my work computer, so recently I’ve often been using it for things like the video game Slay the Spire. Usually the first symptom I notice is that the audio cuts out. I suspect this means that PulseAudio died.
- After a few seconds, the game UI freezes, and any other desktop application I try to open (I’m using i3wm, by the way) hangs as well. This includes gnome-terminal.
- I’m running Ubuntu 22.04 (Jammy); I haven’t upgraded to 24.04 (Noble) yet.
- To debug what’s going wrong, I can thankfully fall back on xterm instead of gnome-terminal.
- Thanks to xterm, I can confirm the following:
- snapd is using 100% CPU:
```
top - 13:41:07 up  3:13,  2 users,  load average: 1.09, 1.18, 1.29
Tasks: 334 total,   1 running, 330 sleeping,   0 stopped,   3 zombie
%Cpu(s): 12.7 us,  0.0 sy,  0.0 ni, 87.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  15877.6 total,   7529.4 free,   3068.0 used,   5280.2 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.   9995.5 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   1193 root      20   0 1765468  30120  19500 S 100.0   0.2   0:59.17 snapd
   2342 dario      9 -11 1239672  32660  21512 S   0.3   0.2   5:25.05 pulseaudio
   2362 dario     20   0 1580804 101252  63312 S   0.3   0.6   7:45.34 Xorg
   2858 dario     20   0 1209012 221336 134924 S   0.3   1.4   7:41.16 steam
   3562 dario     20   0 7072104 815316 216852 S   0.3   5.0  71:44.44 SlayTheSpire
  19459 dario     20   0   15488   5368   4068 R   0.3   0.0   0:00.02 top
      1 root      20   0  168048  13216   8232 D   0.0   0.1   0:02.69 systemd
      2 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kthreadd
```
- Running `systemctl status snapd.socket` hangs and eventually fails with:
```
Failed to get properties: Connection timed out
```
- There’s nothing useful in the snapd logs; the last entry is from 10:33, a few minutes after boot and hours before the hang:
```
dario@feynman ~> journalctl --system -f -u snapd
May 11 01:58:51 feynman systemd[1]: Stopped Snap Daemon.
May 11 01:58:51 feynman systemd[1]: snapd.service: Consumed 3.673s CPU time.
-- Boot 4deb6ebd82c14bea90ec693a7be23e12 --
May 11 10:28:13 feynman systemd[1]: Starting Snap Daemon...
May 11 10:28:15 feynman snapd[1193]: overlord.go:271: Acquiring state lock file
May 11 10:28:15 feynman snapd[1193]: overlord.go:276: Acquired state lock file
May 11 10:28:15 feynman snapd[1193]: daemon.go:247: started snapd/2.62 (series 16; classic) ubuntu/22.04 (amd64) linux/5.15.0-101-generic.
May 11 10:28:15 feynman snapd[1193]: daemon.go:340: adjusting startup timeout by 1m25s (pessimistic estimate of 30s plus 5s per snap)
May 11 10:28:15 feynman snapd[1193]: backends.go:58: AppArmor status: apparmor is enabled and all features are available (using snapd provided apparmor_parser)
May 11 10:28:15 feynman systemd[1]: Started Snap Daemon.
May 11 10:33:16 feynman snapd[1193]: storehelpers.go:923: cannot refresh: snap has no updates available: "bare", "core18", "core20", "core22", "firefox", "gnome-3-38-2004", "gnome-42-2204", "gnome-system-monitor", "gtk-common-themes", "snap-store", "snapd"
^C⏎
dario@feynman ~ [130]> date
Sat 11 May 2024 14:06:01 BST
```
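The `top` snapshot above also shows `systemd` (PID 1) in state `D` (uninterruptible sleep, i.e. blocked inside the kernel, often on I/O). As a possible next step, here is a minimal Python sketch for listing every process in that state from xterm while the hang is happening. It assumes a standard Linux `/proc` and only reads files there, so it does not go through snapd, `systemctl` or D-Bus; the function names are my own, not part of any existing tool.

```python
"""List processes in uninterruptible sleep (state "D").

Minimal sketch, assuming a standard Linux /proc. It only reads files under
/proc, so it does not go through snapd, systemctl or D-Bus.
"""
from __future__ import annotations

import os


def parse_stat(line: str) -> tuple[int, str, str]:
    """Return (pid, comm, state) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate it by the first '(' and the last ')'.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])
    comm = line[open_paren + 1 : close_paren]
    state = line[close_paren + 2]  # first character after ") "
    return pid, comm, state


def stuck_processes() -> list[tuple[int, str]]:
    """Return sorted (pid, comm) pairs for every process in state 'D'."""
    stuck = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # process exited while we were scanning
        if state == "D":
            stuck.append((pid, comm))
    return sorted(stuck)


if __name__ == "__main__":
    for pid, comm in stuck_processes():
        print(f"{pid:>7} {comm}")
```

Running it a few times during one episode would show whether PID 1 stays in `D` for the whole freeze or only briefly, which could help tell whether snapd is the cause or just another victim.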
It’s not clear to me whether something else in the system is dying and causing snapd to close its socket (so that snapd misbehaving is just a symptom), or whether several parts of a modern Ubuntu system depend on snapd, so that when snapd misbehaves, everything that depends on it hangs too. Pulseaudio and gnome-terminal probably both depend on some kind of gsettings daemon or D-Bus channel; does that depend on snapd nowadays? Either way, snapd taking up 100% CPU while this is happening is not helpful, and at the very least it distracts from finding the root cause. So I’d argue that there’s also a snapd bug in here, but without a clear reproduction case I wouldn’t bother filing it on Launchpad.
After a few minutes (around 5?), the issue clears up on its own: the system becomes responsive again, gnome-terminal finally drops me into my login shell, and so on.
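Since the episodes come and go on their own, it may help to record exactly when snapd’s CPU usage spikes and when it recovers, so the full system journal can then be checked around those timestamps. Below is a minimal Python sketch of that idea, assuming Linux `/proc`. It samples `utime + stime` from `/proc/<pid>/stat` and converts clock ticks to the scale `top` uses (100% = one core fully busy). The 90% threshold, the 5-second interval and all function names are arbitrary choices of mine, not values or tools from this report.

```python
"""Log when a process (snapd by default) starts and stops pegging the CPU.

Minimal sketch, assuming Linux /proc. The threshold and sampling interval
are arbitrary defaults, not values taken from the report.
"""
from __future__ import annotations

import os
import time


def cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (fields 14 and 15) from a /proc/<pid>/stat line."""
    # comm (field 2) may contain spaces, so split after its closing ')'.
    # The remaining list starts at field 3 (state), so field n is index n - 3.
    fields = stat_line[stat_line.rindex(")") + 2 :].split()
    return int(fields[11]) + int(fields[12])


def cpu_percent(delta_ticks: int, elapsed_s: float, clk_tck: int) -> float:
    """CPU usage on top's scale, where 100.0 means one core fully busy."""
    return 100.0 * delta_ticks / (clk_tck * elapsed_s)


def find_pid(name: str) -> int | None:
    """Return the first PID whose /proc/<pid>/comm equals name, or None."""
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == name:
                    return int(entry)
        except OSError:
            continue  # process exited during the scan
    return None


def read_ticks(pid: int) -> int:
    with open(f"/proc/{pid}/stat") as f:
        return cpu_ticks(f.read())


def watch(name: str = "snapd", threshold: float = 90.0, interval_s: float = 5.0) -> None:
    """Print a timestamped line each time usage crosses the threshold."""
    pid = find_pid(name)
    if pid is None:
        raise SystemExit(f"no process named {name!r}")
    clk_tck = os.sysconf("SC_CLK_TCK")  # kernel clock ticks per second
    prev_ticks, prev_t = read_ticks(pid), time.monotonic()
    busy = False
    while True:
        time.sleep(interval_s)
        try:
            ticks = read_ticks(pid)
        except FileNotFoundError:
            raise SystemExit(f"{name} (PID {pid}) exited") from None
        now = time.monotonic()
        pct = cpu_percent(ticks - prev_ticks, now - prev_t, clk_tck)
        prev_ticks, prev_t = ticks, now
        # Report only transitions: one "started" and one "ended" line per episode.
        if (pct >= threshold) != busy:
            busy = not busy
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp} high CPU {'started' if busy else 'ended'}: "
                  f"{name} at {pct:.1f}%", flush=True)
```

For example, saved as `watch_snapd.py` (a hypothetical file name), it could be started from xterm with `python3 -c 'import watch_snapd; watch_snapd.watch()'` and left running until the next freeze. Printing only on threshold crossings keeps the output to one start line and one end line per episode.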