Snapd hangs startup with an infinite loop of start failures and breaks all user-created symlinks

The snapd.service unit hangs startup on Ubuntu 20.04.3 with what appeared to be an infinite loop of start failures but was in fact a loop lasting 10-15 minutes.

The message is, “Failed to start Snap Daemon.”, followed by a reference to snapd.service.

One is finally left with nothing but a stuck boot screen ending with “Reached target Graphical Interface.” Ctrl+Alt+F2 blanks the screen and Ctrl+Alt+F3 through F12 have no effect, so there is no way to interact with the system.

Periodically, and before the last such failure, the message “Failed to start Wait until snapd is fully seeded.” appeared, with a reference to snapd.seeded.service.

Booting from a USB stick, I was alarmed to see that every single symlink that I had created with “systemctl enable” or ln (e.g. for /etc/resolv.conf) was broken, as was the link /etc/mtab. This cropped up two days ago, and I have spent every waking hour since attempting to restore function, including a full system re-install, as described here. The bug described here has just bricked that installation, too.

Notably, the interim fix for the prior system brick was to boot from the USB stick and delete all the broken links. This, however, entirely disabled snapd and every installed snap.
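For anyone wanting to inventory the damage before deleting anything, here is a minimal sketch of how the broken links could be listed. The function name find_broken_symlinks is my own, and it assumes the installed system's /etc is reachable at some path from the rescue environment; it only reports and removes nothing.

```python
import os

def find_broken_symlinks(root):
    """Return a sorted list of symlinks under `root` whose targets do not resolve."""
    broken = []
    # followlinks=False: never descend through symlinked directories.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() inspects the link itself; exists() follows it,
            # so a link that exists but does not resolve is broken.
            if os.path.islink(path) and not os.path.exists(path):
                broken.append(path)
    return sorted(broken)
```

Calling find_broken_symlinks("/mnt/etc") would cover an installed root mounted at /mnt (that mount point is an assumption). One caveat: when the installed system is inspected from a stick, links whose targets live under /proc or /run will likely show as broken simply because those filesystems are not populated there, and absolute link targets resolve against the rescue system's root rather than the mounted one.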

Pretty clearly, snapd needs to be reengineered to get its startup housekeeping done in the background after boot, or to get its dependencies established earlier, in parallel, or both, without holding up boot.

Concerning the more alarming behavior of link breaking, I have observed that inactive snap-related mount units disable units wanted by network-pre.target. See here. This is especially alarming because this target exists expressly to trigger firewalls, e.g., by running an iptables rules script. This is a security vulnerability for essentially the entire Linux/systemd universe, because if network-pre.target isn’t reached, no firewall will exist.

I have read some passing comments that this may relate to unmounting failures involving snap-related zfs mounts, which have especially complex, if not overwrought, ordering conditions for startup. How those conditions unwind on shutdown may have unintended consequences. Also, my review of the relevant units disclosed no express unmounting commands that I could recognize. Furthermore, the more ordinary snap-related mount units in /etc/systemd/system rely somewhat inartfully on the LazyUnmount= directive instead of the more robust Conflicts= directive, or even Before=shutdown.target. Either case presents a risk vector for file corruption on hasty or disorderly shutdown, which would be consistent with the broken links and disabled snaps I have observed.
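To make the review of those mount units repeatable, something along these lines could tabulate which of the directives mentioned above each unit file sets explicitly. This is only a sketch: the function name mount_unit_directives is hypothetical, and it does plain line matching rather than real systemd unit parsing.

```python
import glob
import os

def mount_unit_directives(unit_dir, keys=("LazyUnmount", "Conflicts", "Before")):
    """Map each *.mount file name in `unit_dir` to the values it sets for `keys`."""
    report = {}
    for path in sorted(glob.glob(os.path.join(unit_dir, "*.mount"))):
        found = {}
        with open(path, encoding="utf-8", errors="replace") as fh:
            for raw in fh:
                line = raw.strip()
                # Skip blanks, comments, section headers, and non-assignments.
                if not line or line.startswith(("#", ";", "[")) or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                key = key.strip()
                if key in keys:
                    # A directive may legitimately appear more than once.
                    found.setdefault(key, []).append(value.strip())
        report[os.path.basename(path)] = found
    return report
```

Running mount_unit_directives("/etc/systemd/system") would cover the directory named above. Note that it sees only what is written in the files; any dependencies systemd adds implicitly to mount units will not appear, so the effective values are better confirmed with systemctl show on the unit in question.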