I wanted to keep track of the three bugs we identified today, when core and lxd are refreshed in one transaction:
1. When snapd is restarting itself, the configure hook of lxd may not be able to talk to it (trace: https://www.irccloud.com/pastebin/BIrJMfkH/ ).
2. When snapd restarts and lxd is inactive, lxd is not added to the interface repository, so it has no plugs or slots to connect and won’t work in practice.
3. When an old snapd starts the process (think: deb) and a new one finishes it, the auto-connect task is missing.
This script can be used to reproduce issues 1 and 2 easily:
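#!/bin/sh
# -u: fail on unset variables, -x: trace commands, -e: stop at the first error
set -uxe
# Start with core and lxd tracking the stable channel.
snap install core lxd
snap switch --stable core
snap switch --stable lxd
snap refresh
# Record the lxd-support connections before the combined refresh.
snap interface lxd-support
# Move both snaps to edge so that a single refresh updates core and lxd together.
snap switch --edge core
snap switch --edge lxd
snap refresh
# Compare with the earlier output to see whether lxd lost its connection.
snap interface lxd-support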
NOTE: I edited this post to renumber the issues for simplicity.
I think we understand the issues as follows:
1. The socket that snapctl talks to is shut down too soon; we need to keep that socket alive during the shutdown process.
2. We need to track the revision for which we have set up security profiles, so we can load the right snaps into the interface repository on restart.
3. We need to inject the task for reconnection on snapd startup when we detect this condition [fixed in https://github.com/snapcore/snapd/pull/4981 ].
We are working on fixing all of these issues and will update this thread with links to the PRs.
yes, the fix for this one should be something like:
be sure not to start hooks if we are restarting/shutting down
don’t close the snapctl socket until all already running hook tasks have reached a ready state
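The two points above can be illustrated with a toy shell model. This is only a sketch, not snapd's implementation: the "hooks" are background shell jobs, the "socket" is a temporary file, and every name here (`start_hook`, `shutdown_daemon`, `SOCKET`, `SHUTTING_DOWN`) is hypothetical.

```shell
#!/bin/sh
# Toy model only: nothing here is snapd code or a snapd interface.
set -eu

SOCKET=$(mktemp)     # stands in for the socket that snapctl talks to
SHUTTING_DOWN=0      # becomes 1 once a restart/shutdown has been requested
HOOK_PIDS=""         # PIDs of hooks that are still running

# start_hook NAME SECONDS: run a fake hook, unless we are shutting down.
start_hook() {
    if [ "$SHUTTING_DOWN" -eq 1 ]; then
        echo "refusing to start hook $1: shutting down"
        return 1
    fi
    (
        sleep "$2"
        # A hook may need the socket right up until it finishes.
        if [ -e "$SOCKET" ]; then
            echo "hook $1: socket still available"
        else
            echo "hook $1: socket gone"
        fi
    ) &
    HOOK_PIDS="$HOOK_PIDS $!"
}

# shutdown_daemon: stop accepting hooks, wait for the running ones,
# and only then close (remove) the socket.
shutdown_daemon() {
    SHUTTING_DOWN=1
    for pid in $HOOK_PIDS; do
        wait "$pid" || true
    done
    rm -f "$SOCKET"
    echo "socket closed"
}
```

Calling `start_hook configure 1`, then `shutdown_daemon`, then `start_hook late 0` shows the intended ordering: the running hook still finds the socket, the socket is closed only afterwards, and the late hook is refused.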
We also have a separate, general issue: any snap run on a snap (for example, running hooks or starting services) could fail if done while the snap's base is inactive, because snap run might need to consult the current symlink of the base snap.
We can probably address this with careful conflict checks and by setting up wait dependencies properly in single changes that involve multiple snaps.
For problem 2, at least along the do path, we have the following (for undo we likely need more state to avoid guessing):
https://github.com/snapcore/snapd/pull/4991
for 3. we have landed:
https://github.com/snapcore/snapd/pull/4981
and
https://github.com/snapcore/snapd/pull/4988
I’ve started collecting the additional state in this PR
https://github.com/snapcore/snapd/pull/4996
mvo, April 5, 2018, 10:53pm
Fixes for (2) and (3) are now in 2.32.3 which is in beta now.
I have updated the GitHub release page with a nice, understandable changelog: https://github.com/snapcore/snapd/releases/tag/2.32.3
To help write tests in the future for the scenario involved in 3 (poor compatibility between the state, including tasks, that a new snapd expects and an old snapd from the deb that initiated its installation), I proposed this:
https://github.com/snapcore/snapd/pull/5014
To start addressing part of this (at least auto-refresh/many snaps update), I created this:
canonical:master ← pedronis:on-update-base-needs-to-be-active (opened 06:25PM - 09 Apr 18 UTC)
This makes sure non-base snaps wait for core (source of snapctl etc) and possibly their base if those are being updated in the same multi-snap refresh change; otherwise the current link of core or the base could go away and break running hooks or services of the snap during the snap's own refresh process.
The issue is more general than this, but this covers the auto-refresh case for example.
TODO: in general we should raise conflicts between changes where a snap is being operated on in one change, and its base or core is going through unlink-snap or unlink-current-snap in another.
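The wait ordering described in the PR summary above can be sketched with the standard `tsort` utility. This is a rough illustration under stated assumptions, not how snapd schedules tasks: `refresh_order` and its "SNAP [BASE]" input format are made up for this example. It only shows that "everything waits for core, and each snap waits for its base" yields a valid linear order.

```shell
#!/bin/sh
# refresh_order reads lines of "SNAP [BASE]" on stdin (hypothetical format)
# and prints one possible refresh order, one snap per line, in which core
# comes before everything else and each base comes before its snaps.
refresh_order() {
    while read -r snap base; do
        # Skip blank lines; core itself waits for nothing in this model.
        if [ -z "$snap" ] || [ "$snap" = core ]; then
            continue
        fi
        # Every other snap waits for core (the source of snapctl etc).
        echo "core $snap"
        # A snap with a base other than core also waits for that base,
        # and the base in turn waits for core.
        if [ -n "$base" ] && [ "$base" != core ]; then
            echo "core $base"
            echo "$base $snap"
        fi
    done | tsort
}
```

For example, `printf 'mysnap core18\nlxd\n' | refresh_order` prints an order in which core precedes core18, mysnap, and lxd, and core18 precedes mysnap (`mysnap` is a placeholder name). The relative order of snaps that do not depend on each other is left to `tsort`.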