Snapd 100% cpu, snapd socket connection timed out

There are a few threads about this that are a few years old, and it sounds similar to a known snapd bug (e.g. Launchpad bug #1891618, “Snapd stuck after a request timeout error”), but since I haven’t been able to pin down what exactly is going wrong, I’m just writing down the details that I’ve been able to gather so far:

  • This issue has appeared several times, but I can’t predict when it will happen. It only seems to appear after a few hours of uptime on my desktop system.
  • This desktop system is not my work computer, so recently I’ve mostly been using it for things like the Slay The Spire videogame. Usually the first symptom I notice is that the audio disappears, which I suspect means that PulseAudio died.
  • After a few seconds, the game UI freezes, and if I try to open any other desktop application (btw, I’m using i3wm) it also hangs. This includes gnome-terminal.
  • I’m running Ubuntu 22.04, jammy (I haven’t yet updated to noble)
  • To debug what’s going wrong, I can thankfully rely on xterm, instead of gnome-terminal
  • Thanks to xterm, I can confirm the following:

  • snapd is taking up 100% cpu:
top - 13:41:07 up  3:13,  2 users,  load average: 1.09, 1.18, 1.29
Tasks: 334 total,   1 running, 330 sleeping,   0 stopped,   3 zombie
%Cpu(s): 12.7 us,  0.0 sy,  0.0 ni, 87.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  15877.6 total,   7529.4 free,   3068.0 used,   5280.2 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.   9995.5 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                              
   1193 root      20   0 1765468  30120  19500 S 100.0   0.2   0:59.17 snapd                                                                
   2342 dario      9 -11 1239672  32660  21512 S   0.3   0.2   5:25.05 pulseaudio                                                           
   2362 dario     20   0 1580804 101252  63312 S   0.3   0.6   7:45.34 Xorg                                                                 
   2858 dario     20   0 1209012 221336 134924 S   0.3   1.4   7:41.16 steam                                                                
   3562 dario     20   0 7072104 815316 216852 S   0.3   5.0  71:44.44 SlayTheSpire                                                         
  19459 dario     20   0   15488   5368   4068 R   0.3   0.0   0:00.02 top                                                                  
      1 root      20   0  168048  13216   8232 D   0.0   0.1   0:02.69 systemd                                                              
      2 root      20   0       0      0      0 S   0.0   0.0   0:00.00 kthreadd 
  • Running systemctl status snapd.socket hangs and eventually fails with: Failed to get properties: Connection timed out
  • There’s nothing useful in the snapd logs:
dario@feynman ~> journalctl --system -f -u snapd
May 11 01:58:51 feynman systemd[1]: Stopped Snap Daemon.
May 11 01:58:51 feynman systemd[1]: snapd.service: Consumed 3.673s CPU time.
-- Boot 4deb6ebd82c14bea90ec693a7be23e12 --
May 11 10:28:13 feynman systemd[1]: Starting Snap Daemon...
May 11 10:28:15 feynman snapd[1193]: overlord.go:271: Acquiring state lock file
May 11 10:28:15 feynman snapd[1193]: overlord.go:276: Acquired state lock file
May 11 10:28:15 feynman snapd[1193]: daemon.go:247: started snapd/2.62 (series 16; classic) ubuntu/22.04 (amd64) linux/5.15.0-101-generic.
May 11 10:28:15 feynman snapd[1193]: daemon.go:340: adjusting startup timeout by 1m25s (pessimistic estimate of 30s plus 5s per snap)
May 11 10:28:15 feynman snapd[1193]: backends.go:58: AppArmor status: apparmor is enabled and all features are available (using snapd provided apparmor_parser)
May 11 10:28:15 feynman systemd[1]: Started Snap Daemon.
May 11 10:33:16 feynman snapd[1193]: storehelpers.go:923: cannot refresh: snap has no updates available: "bare", "core18", "core20", "core22", "firefox", "gnome-3-38-2004", "gnome-42-2204", "gnome-system-monitor", "gtk-common-themes", "snap-store", "snapd"
^C⏎                                                                                                                                         
dario@feynman ~ [130]> date
Sat 11 May 2024 14:06:01 BST
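
Since the hang only lasts a few minutes, it may help to capture snapshots automatically rather than reading top by hand. Below is a minimal sketch, assuming the default top column layout shown above (PID, USER, PR, NI, VIRT, RES, SHR, S, %CPU, %MEM, TIME+, COMMAND) and a locale that uses a dot as the decimal separator. The names parse_top_rows and suspicious are hypothetical helpers, not part of any tool. It flags processes that are busy or in uninterruptible sleep (state D); in the output above, that would be snapd at 100% and systemd (PID 1) in state D.

```python
from typing import List, NamedTuple


class Proc(NamedTuple):
    pid: int
    user: str
    state: str
    cpu: float
    command: str


def parse_top_rows(text: str) -> List[Proc]:
    """Parse the process table of `top` batch output (default columns assumed)."""
    procs = []
    in_table = False
    for line in text.splitlines():
        fields = line.split()
        if not in_table:
            # The process table starts right after the "PID USER ..." header.
            in_table = fields[:2] == ["PID", "USER"]
            continue
        if len(fields) < 12 or not fields[0].isdigit():
            continue  # skip blank or malformed rows
        procs.append(
            Proc(
                pid=int(fields[0]),
                user=fields[1],
                state=fields[7],          # the "S" column
                cpu=float(fields[8]),     # the "%CPU" column
                command=" ".join(fields[11:]),
            )
        )
    return procs


def suspicious(procs: List[Proc], cpu_threshold: float = 90.0) -> List[Proc]:
    """Return processes at or above the CPU threshold, or in state D."""
    return [p for p in procs if p.cpu >= cpu_threshold or p.state == "D"]
```

One possible use is to feed it the output of top -b -n 1 (batch mode, one iteration) from a loop or a timer, and log the flagged rows with a timestamp so that they can be lined up with the journal afterwards.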

It’s not clear to me whether something else in the system is dying and causing snapd to close its socket (so snapd misbehaving is just a symptom), or whether several parts of a modern Ubuntu system depend on snapd, so that when snapd misbehaves everything that depends on it hangs too (probably both pulseaudio and gnome-terminal depend on some kind of gsettings daemon / dbus channel… does this nowadays depend on snapd?). Either way, having snapd take up 100% cpu while this is happening is not helpful, and at the very least it distracts from finding the root cause. So I’d argue that there’s also a snapd bug in here, but without a clear reproduction case I wouldn’t bother filing it in Launchpad.
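
One way to separate the two hypotheses might be to talk to snapd directly instead of going through systemctl, since the "Failed to get properties" timeout could just as well come from systemd not answering. The following is a rough sketch, assuming snapd’s REST API is listening on /run/snapd.socket and serves /v2/system-info (both are assumptions about this particular setup); probe_unix_http is a hypothetical helper name.

```python
import socket
import time
from typing import Tuple


def probe_unix_http(
    path: str = "/run/snapd.socket",        # assumed snapd socket location
    request_path: str = "/v2/system-info",  # assumed snapd REST endpoint
    timeout: float = 5.0,
) -> Tuple[str, float]:
    """Send one HTTP GET over a Unix socket; return (status, seconds elapsed)."""
    request = (
        f"GET {request_path} HTTP/1.1\r\n"
        "Host: localhost\r\n"
        "Connection: close\r\n\r\n"
    ).encode("ascii")
    start = time.monotonic()
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            sock.connect(path)
            sock.sendall(request)
            first = sock.recv(256)
        # Keep only the HTTP status line, e.g. "HTTP/1.1 200 OK".
        status = first.split(b"\r\n", 1)[0].decode("ascii", "replace") or "empty reply"
    except socket.timeout:
        status = "timeout"
    except OSError as exc:
        status = f"error: {exc.strerror or exc}"
    return status, time.monotonic() - start
```

If this returns an HTTP status line quickly while systemctl still times out, that would point away from snapd’s socket and towards systemd or D-Bus being stuck; a "timeout" here would instead suggest that snapd itself has stopped answering.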

After a few minutes (5 minutes or so?), the issue disappears, the system becomes responsive again, gnome-terminal finally drops me in my login shell, etc.