It can be difficult to debug a snap install or refresh that fails because one of the snap's daemons fails to start. snapd always aborts the installation in that case, so if the root cause is, for example, corrupted data in $SNAP_DATA, the system journal logs from the installation may not be enough to debug the issue.
It would be nice if there were a development flag that tells snapd to ignore the daemons that failed to start and continue with the installation anyway. Currently, my workflow for this kind of debugging, if I don't have access to the snapcraft recipe that built the snap, is:
- `snap download` the snap
- `unsquashfs` the snap
- modify the install hook to run `snapctl stop --disable $SNAP_NAME.failing-service`
- Re-pack the snap with `snap pack`
- Install the new snap with `snap install --dangerous`
Note that this is still not perfect: because we modified the snap contents, the store assertion no longer applies, so we lose any auto-connections etc. that may have been granted to the snap. We would then probably need to continue with:
- re-connect all snap interfaces with `snap connect`
Then finally we are ready to start the failing service and do the actual debugging with `snap start $SNAP_NAME.failing-service`
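The manual workaround above could be scripted roughly as follows. This is only a sketch under some assumptions: the function names, the `DRY_RUN` switch, and the `./<snap>-debug` working directory are made up for illustration, the install hook is assumed to live at `meta/hooks/install` inside the unpacked snap, and appending a line to it only helps if the existing hook does not exit early. By default it just prints what it would do.

```shell
#!/bin/sh
# Print each command; actually execute it only when DRY_RUN=0.
# (DRY_RUN is a hypothetical switch for this sketch; dry run is the default.)
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" != "1" ]; then
        "$@"
    fi
}

# Usage: repack_with_disabled_service <snap> <failing-app>
repack_with_disabled_service() {
    snap_name="$1"
    app="$2"
    workdir="./${snap_name}-debug"        # assumed scratch directory
    hook="$workdir/meta/hooks/install"    # assumed install hook location

    # Fetch the snap as <snap>.snap and unpack it.
    run snap download --basename="$snap_name" "$snap_name"
    run unsquashfs -d "$workdir" "${snap_name}.snap"

    # Make the install hook stop and disable the failing service.
    echo "+ append 'snapctl stop --disable ${snap_name}.${app}' to $hook"
    if [ "${DRY_RUN:-1}" != "1" ]; then
        mkdir -p "$(dirname "$hook")"
        [ -f "$hook" ] || printf '#!/bin/sh\n' > "$hook"
        printf 'snapctl stop --disable %s.%s\n' "$snap_name" "$app" >> "$hook"
        chmod +x "$hook"
    fi

    # Re-pack and install the modified, now unasserted, snap.
    run snap pack --filename="${snap_name}-debug.snap" "$workdir"
    run sudo snap install --dangerous "./${snap_name}-debug.snap"

    # Auto-connections are lost, so these steps stay manual.
    echo "# re-connect interfaces with: sudo snap connect ${snap_name}:<plug>"
    echo "# then debug with: sudo snap start ${snap_name}.${app}"
}
```

For example, `repack_with_disabled_service nope nope` prints the plan for the nope snap, and `DRY_RUN=0 repack_with_disabled_service nope nope` would actually run it.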
Obviously, if you are the snap author, you have the snapcraft recipe and could rebuild the snap with an install hook that disables all services and push it to a branch in the store (thus getting all the auto-connections you need), but that still adds a lot of overhead to debugging an issue like this.
Ideally I think the flow for debugging this would look like:
- Install the snap such that failed daemons are ignored with `snap install --ignore-failed-daemons $SNAP_NAME`
- do your debugging with $SNAP_DATA and `snap start`
Also note that this problem doesn't really happen if you declare `daemon: simple`, because then systemd doesn't really check whether the daemon started successfully. If you are using any of the other `daemon` settings, though, you are liable to have this problem.
Thoughts?
For an easy example of a failing snap, see my nope snap:
$ snap install nope
error: cannot perform the following tasks:
- Start snap "nope" (2) services ([start snap.nope.nope.service] failed with exit status 1: Job for snap.nope.nope.service failed because the control process exited with error code.
See "systemctl status snap.nope.nope.service" and "journalctl -xe" for details.
)
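For completeness, here is a small sketch of pulling up what the journal does have after a failure like this. The helper names are hypothetical; the unit name pattern `snap.<snap>.<app>.service` is taken from the error message above. Since the install is aborted, the unit itself may no longer exist by the time you look, so this only shows whatever the journal retained.

```shell
#!/bin/sh
# Build the systemd unit name for a snap app, following the pattern
# seen in the error output: snap.<snap>.<app>.service
snap_unit() {
    printf 'snap.%s.%s.service\n' "$1" "$2"
}

# Show the last N journal lines (default 50) for a snap service.
# Usage: show_failed_service_logs <snap> <app> [lines]
show_failed_service_logs() {
    unit="$(snap_unit "$1" "$2")"
    journalctl --no-pager -n "${3:-50}" -u "$unit"
}
```

For the example above this would be `show_failed_service_logs nope nope`. `snap changes` followed by `snap change <id>` shows the failed tasks from snapd's side.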