Summary of core + lxd refresh bugs discovered on 4th of April 2018

zyga-snapd · April 4, 2018, 6:46pm

I wanted to keep track of the three bugs we identified today, when core and lxd are refreshed in one transaction:

1. When snapd is restarting itself, the configure hook of lxd may not be able to talk to it (trace: https://www.irccloud.com/pastebin/BIrJMfkH/ ).
2. When snapd restarts and lxd is inactive, lxd is not added to the interface repository, so it has no plugs or slots to connect to and won’t work in practice.
3. When an old snapd (think: deb) starts the process and a new one finishes it, the auto-connect task is missing.

This script can be used to reproduce issues 1 and 2 easily:

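#!/bin/sh
# Reproducer: refresh core and lxd together in a single "snap refresh".
set -uxe
snap install core lxd
# Start with both snaps tracking stable.
snap switch --stable core
snap switch --stable lxd
snap refresh
# Show lxd-support before the combined refresh, for comparison.
snap interface lxd-support
# Move both snaps to edge so that one refresh updates core and lxd together.
snap switch --edge core
snap switch --edge lxd
snap refresh
# Show lxd-support again; compare with the earlier output.
snap interface lxd-support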

NOTE: I edited this post to renumber the issues for simplicity.

zyga-snapd · April 4, 2018, 7:23pm

I think we understand the issues as follows:

1. The socket that snapctl talks to is shut down too soon; we need to keep that socket alive during the shutdown process.
2. We need to track the revision for which we have set up security profiles, so we can load the right snaps into the interface repository on restart.
3. We need to inject the task for reconnection on snapd startup when we detect this condition [fixed in https://github.com/snapcore/snapd/pull/4981 ].

We are working on fixing all those issues and will update this thread with links to PRs
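
To make the second point concrete, here is a minimal sketch of the idea, not snapd's actual code. It is written in Python for brevity, and the `SnapState` record, its `profiles_revision` field and the function name are all hypothetical. It assumes snapd remembers, per snap, the revision whose security profiles were last set up, and uses that on restart to decide what goes into the interface repository, even for a snap that is inactive at that moment.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SnapState:
    """Hypothetical per-snap record; the field names are assumptions."""
    name: str
    current: int                 # revision the "current" symlink points to
    active: bool                 # False while the snap is unlinked mid-refresh
    profiles_revision: Optional[int] = None  # revision with profiles set up


def revisions_for_repo(states: Dict[str, SnapState]) -> Dict[str, int]:
    """Pick, per snap, the revision to add to the interface repository."""
    chosen = {}
    for name, st in states.items():
        if st.profiles_revision is not None:
            # Profiles were set up for this revision, so load it even if the
            # snap is inactive right now (the lxd case from issue 2).
            chosen[name] = st.profiles_revision
        elif st.active:
            # No recorded profiles revision: fall back to the active revision.
            chosen[name] = st.current
    return chosen
```

With this, an inactive lxd that has a recorded profiles revision still ends up in the repository, while an inactive snap with no recorded revision is skipped.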

pedronis · April 4, 2018, 7:36pm

Yes, the fix for issue 1 should be something like:

- be sure not to start hooks if we are restarting/shutting down
- don’t close the snapctl socket until all already running hook tasks have reached a ready state
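
The two rules above can be sketched roughly as follows. This is only an illustration in Python, not the snapd implementation; `HookGate` and its method names are hypothetical. It refuses to admit new hooks once shutdown has begun, and only closes the socket after the hooks that were already running have finished (or an optional timeout expires).

```python
import threading


class HookGate:
    """Hypothetical gate tracking running hooks around a daemon shutdown."""

    def __init__(self):
        self._cond = threading.Condition()
        self._running = 0
        self._stopping = False

    def try_start(self) -> bool:
        """Register a hook run; return False if shutdown has begun."""
        with self._cond:
            if self._stopping:
                return False
            self._running += 1
            return True

    def done(self) -> None:
        """Mark one hook run as finished and wake up a waiting shutdown."""
        with self._cond:
            self._running -= 1
            self._cond.notify_all()

    def shutdown(self, close_socket, timeout=None) -> bool:
        """Stop admitting hooks, wait for running ones, then close the socket.

        Returns True if every running hook finished before the timeout.
        The socket is closed in either case.
        """
        with self._cond:
            self._stopping = True
            drained = self._cond.wait_for(lambda: self._running == 0, timeout)
        close_socket()
        return drained
```

A hook runner would call `try_start()` before launching a hook and `done()` when it finishes; the restart path would call `shutdown()` with whatever closes the snapctl socket.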

pedronis · April 4, 2018, 7:58pm

We also have a separate, more general issue: any snap run on a snap (for example for hooks or for starting services) could fail if done while the snap’s base is inactive, because snap run might need to consult the current symlink of the base snap.

We can probably address this using careful conflict checks and by setting up wait dependencies properly in single changes that involve multiple snaps.
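
As a rough illustration of the conflict-check half of that idea (a sketch under assumptions, not snapd code): the function below takes a snap, its base, and a hypothetical map from snap name to the task kinds still pending in other changes, and reports which prerequisite snap is in the middle of being unlinked. The function name and data shapes are made up for the example.

```python
from typing import Dict, Optional, Set

# Task kinds that take the "current" link of a snap away.
UNLINK_KINDS = {"unlink-snap", "unlink-current-snap"}


def find_conflict(snap: str, base: Optional[str],
                  in_flight: Dict[str, Set[str]]) -> Optional[str]:
    """Return the prerequisite snap whose pending unlink blocks `snap`.

    `in_flight` maps snap name -> task kinds pending in other changes.
    Returns None when neither core nor the base is being unlinked.
    """
    for needed in ("core", base):
        if not needed or needed == snap:
            continue  # no base given, or the snap is its own prerequisite
        if UNLINK_KINDS & in_flight.get(needed, set()):
            return needed
    return None
```

An operation on lxd would then be refused, or made to wait, while `find_conflict("lxd", None, ...)` returns "core".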

pedronis · April 5, 2018, 7:44pm

For problem 2, at least along the do path, we have the following (for undo, we likely need more state to avoid guessing):

https://github.com/snapcore/snapd/pull/4991

for 3. we have landed:

https://github.com/snapcore/snapd/pull/4981

and

https://github.com/snapcore/snapd/pull/4988

zyga-snapd · April 5, 2018, 9:40pm

I’ve started collecting the additional state in this PR

https://github.com/snapcore/snapd/pull/4996

mvo · April 5, 2018, 10:53pm

Fixes for (2) and (3) are in 2.32.3, which is now in beta.

zyga-snapd · April 6, 2018, 6:24am

I have updated the GitHub release page with a nice, understandable changelog: https://github.com/snapcore/snapd/releases/tag/2.32.3

pedronis · April 6, 2018, 6:40pm

I sketched this for 1:

https://github.com/snapcore/snapd/pull/5004/files

pedronis · April 9, 2018, 6:34pm

To help write tests in the future for the scenario involved in 3 (bad compatibility between the state, including tasks, that a new snapd expects and what an old snapd from the deb set up when initiating the new one’s installation), I proposed this:

https://github.com/snapcore/snapd/pull/5014

pedronis · April 9, 2018, 6:35pm

To start addressing part of this (at least auto-refresh/many snaps update), I created this:

github.com/canonical/snapd

overlord/snapstate: on multi-snap refresh make sure bases and core are finished before dependent snaps

canonical:master ← pedronis:on-update-base-needs-to-be-active (opened 06:25PM - 09 Apr 18 UTC by pedronis, +140 -0)

This makes sure non-base snaps wait for core (the source of snapctl etc.) and possibly their base if those are being updated in the same multi-snap refresh change; otherwise the current link of core or the base could go away and break running hooks or services of the snap during the snap’s own refresh process. The issue is more general than this, but this covers the auto-refresh case, for example. TODO: in general we should raise conflicts between changes where a snap is being operated on in one change and its base or core is going through unlink-snap or unlink-current-snap in another.
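
The ordering rule in that description can be sketched like this. It is a simplified illustration in Python, not the code from the PR; the function name, the argument shapes and the example snap names are assumptions. Given the snaps in one multi-snap refresh, it computes which other snaps in the same change each one should wait for.

```python
from typing import Dict, List, Optional, Set


def wait_dependencies(updating: Dict[str, Optional[str]],
                      base_snaps: Set[str]) -> Dict[str, List[str]]:
    """Map each snap in a multi-snap refresh to the snaps it must wait for.

    `updating` maps snap name -> base name (None means the snap uses core).
    `base_snaps` names the snaps in the change that are themselves bases.
    """
    waits = {}
    for name, base in updating.items():
        if name == "core" or name in base_snaps:
            waits[name] = []  # core and bases wait for nothing in this sketch
            continue
        needed = ["core"]  # core provides snapctl, so every other snap waits
        if base and base != "core":
            needed.append(base)
        # Only wait for prerequisites that are part of this same change.
        waits[name] = [n for n in needed if n in updating]
    return waits
```

For a change that refreshes core and lxd together, this yields that lxd waits for core, which is the case from this thread.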