Aborted snap refresh caused system boot failure

snowsky · February 2, 2022, 4:42pm

Hi,

It started with a daemon start that stayed pending, and when a refresh command was issued, the change got stuck forever. We use a simple systemd daemon, but journalctl shows that the daemon cannot start successfully.

user@testhost:~$ sudo snap changes
ID   Status  Spawn                   Ready               Summary
28   Doing   yesterday at 13:43 UTC  -                   Refresh snaps "core20", "test-snap"
30   Done    today at 16:18 UTC      today at 16:18 UTC  Refresh all snaps: no updates
31   Done    today at 16:19 UTC      today at 16:19 UTC  Refresh all snaps: no updates
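A minimal sketch of how the stuck change could be picked out of output like the above, assuming the column layout shown here (ID first, Status second). The helper name `stuck_changes` is made up for illustration and is not part of snapd.

```shell
# stuck_changes: read `snap changes` output on stdin and print the ID of
# every change whose Status column is still "Doing".
# Assumes the first line is the header and the columns are ID, Status, ...
stuck_changes() {
    awk 'NR > 1 && $2 == "Doing" { print $1 }'
}
```

For example, `sudo snap changes | stuck_changes` would print 28 for the listing above, and `snap tasks 28` would then show which task of that change is not finishing.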

When the above pending refresh was aborted, the reboot failed:

I think the cause is that snapd invalidated the old revision of the core20 snap, but the new revision of core20 was never downloaded successfully because of the stuck change above.

Also, a quick question: if the boot fails, is there a way to recover? Thanks.

Regards,

ijohnson · February 2, 2022, 5:07pm

Does the issue fix itself if you just reboot again?

The issue here seems to be that in the modeenv, the base snap is referenced as revision 1328, but that revision does not exist on the partition to mount it and continue with booting.
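A rough sketch of how that mismatch could be checked by hand on a disk mounted from another machine. It assumes the modeenv is a plain key=value file with a line such as `base=core20_1328.snap`, and that the snap files live in a directory such as `var/lib/snapd/snaps`. Both are assumptions, so adjust them to what is actually on the disk. The function name `check_modeenv_base` is hypothetical.

```shell
# check_modeenv_base MODEENV_FILE SNAPS_DIR
# Reports whether the base snap file named in the modeenv exists in SNAPS_DIR.
# Assumption: the modeenv contains a line of the form "base=<file>.snap".
check_modeenv_base() {
    modeenv_file=$1
    snaps_dir=$2
    # Take the value of the first line starting with "base=".
    # The anchor (^) keeps "try_base=" lines from matching.
    base=$(sed -n 's/^base=//p' "$modeenv_file" | head -n 1)
    if [ -z "$base" ]; then
        echo "no base entry found in $modeenv_file"
        return 2
    fi
    if [ -f "$snaps_dir/$base" ]; then
        echo "present: $base"
    else
        echo "missing: $base"
        return 1
    fi
}
```

A "missing" result would match the symptom described here: the modeenv points at a base revision that is not on the partition.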

If you had an in-progress refresh change, then theoretically you shouldn’t have been able to start a new refresh change since it would be in conflict with the currently running one. In order to debug that more, we would need to see more from the state of snapd. Is the device encrypted?

snowsky · February 3, 2022, 6:51pm

Thanks @ijohnson for the prompt update. We tried a hard reboot, but the OS didn’t recover. A TPM is being used, but FDE is not enabled.

ijohnson · February 3, 2022, 7:10pm

Can you access the disk and copy /system-data/var/lib/snapd/state.json from the ubuntu-data partition to another system and then run:

snap debug state ./copied-state.json

and share the output here with us?
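A small sketch of the copy step, assuming the ubuntu-data partition is already mounted somewhere on the other system. The mount point, the destination name, and the helper `copy_snapd_state` are placeholders chosen for illustration.

```shell
# copy_snapd_state DATA_MOUNT DEST
# Copies snapd's state.json from a mounted ubuntu-data partition to DEST.
# DATA_MOUNT is wherever ubuntu-data was mounted (a placeholder, e.g. /mnt).
copy_snapd_state() {
    data_mount=$1
    dest=$2
    src="$data_mount/system-data/var/lib/snapd/state.json"
    if [ ! -r "$src" ]; then
        echo "cannot read $src" >&2
        return 1
    fi
    cp "$src" "$dest"
}
```

With the partition mounted at /mnt, for instance, `copy_snapd_state /mnt ./copied-state.json` followed by `snap debug state --changes ./copied-state.json` should list the changes recorded in the copied state, and `snap debug state --change=28 ./copied-state.json` should list the tasks of one change.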

snowsky · February 4, 2022, 10:46pm

Sorry, I cannot get the file because the OS has been reinstalled.

ijohnson · February 7, 2022, 5:30pm

Okay, if this happens again, please get the state.json and any logs before reinstalling the device so we may debug the root cause of this further.

snowsky · February 7, 2022, 11:29pm

Will do

Another quick question: is /system-data/ encrypted? Can I mount the disk on another virtual machine to copy the file out?

Thanks

ijohnson · February 8, 2022, 3:02am

If FDE is not being used, then the system-data directory on the ubuntu-data partition is not encrypted. If FDE is used, then yes, system-data (along with all other files) on the ubuntu-data partition is encrypted.