Summary of core + lxd refresh bugs discovered on 4th of April 2018

I wanted to keep track of the three bugs we identified today, when core and lxd are refreshed in one transaction:

  1. when snapd is restarting itself, the configure hook of lxd may not be able to talk to it (trace: https://www.irccloud.com/pastebin/BIrJMfkH/ )
  2. when snapd restarts while lxd is inactive, lxd is not added to the interface repository, so it has no plugs or slots to connect and does not work in practice
  3. when an old snapd (think: the deb) starts the process and the new one finishes it, the auto-connect task is missing

This script can be used to reproduce issues 1 and 2 easily:

#!/bin/sh
set -uxe
snap install core lxd
snap switch --stable core
snap switch --stable lxd
snap refresh
snap interface lxd-support
snap switch --edge core
snap switch --edge lxd
snap refresh
snap interface lxd-support

NOTE: I edited this post to renumber the issues for simplicity.

I think we understand the issues as follows:

  1. the socket that snapctl talks to is shut down too soon, we need to let that socket live for the shutdown process
  2. we need to track the revision for which we have setup security profiles so we can load the right snaps into the interface repository on restart
  3. we need to inject the task for reconnection on snapd startup when we detect this condition [fixed in https://github.com/snapcore/snapd/pull/4981 ]
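
To make point 2 concrete, here is a minimal sketch of how a recorded "profiles revision" could drive what gets loaded into the interface repository on restart. It is written in Python purely for illustration (snapd itself is written in Go), and the state keys `active`, `current` and `profiles-revision` as well as the function name are hypothetical, not snapd's actual state format.

```python
def snaps_for_interface_repo(snap_states):
    """Pick (snap name, revision) pairs to add to the interface repository.

    snap_states maps a snap name to a dict with these hypothetical keys:
      "active":            True if the snap is currently linked
      "current":           the current revision
      "profiles-revision": revision whose security profiles were last
                           set up (absent if none was recorded)
    """
    selected = []
    for name, state in sorted(snap_states.items()):
        if state.get("active"):
            # Normal case: the active snap is loaded at its current revision.
            selected.append((name, state["current"]))
        elif state.get("profiles-revision") is not None:
            # Inactive in the middle of a refresh, but profiles exist for a
            # known revision, so that revision still belongs in the repo.
            selected.append((name, state["profiles-revision"]))
        # Otherwise there is nothing safe to load for this snap.
    return selected
```

With this, an lxd that is unlinked mid-refresh would still show up with its plugs and slots after the restart, as long as the revision with set-up profiles was recorded before the restart.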

We are working on fixing all of these issues and will update this thread with links to the PRs.

yes, the fix for 1. should be something like:

  • be sure not to start hooks if we are restarting/shutting down
  • don’t close the snapctl socket until all already running hook tasks have reached a ready state
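
The two bullets above can be sketched as a small gate around hook execution. This is only a model of the idea in Python (snapd itself is written in Go); `HookGate`, `shutdown` and `close_socket` are hypothetical names, and closing the socket anyway after a timeout is an assumption of this sketch, not something decided in this thread.

```python
import threading


class HookGate:
    """Tracks running hooks and whether a restart/shutdown has begun."""

    def __init__(self):
        self._cond = threading.Condition()
        self._running = 0
        self._shutting_down = False

    def try_start_hook(self):
        """Count a hook as running and return True, unless shutting down."""
        with self._cond:
            if self._shutting_down:
                return False
            self._running += 1
            return True

    def hook_done(self):
        """Mark one running hook as finished and wake up any waiter."""
        with self._cond:
            self._running -= 1
            self._cond.notify_all()

    def begin_shutdown(self):
        """From now on no new hooks may start."""
        with self._cond:
            self._shutting_down = True

    def wait_for_hooks(self, timeout=None):
        """Block until no hooks are running; return False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self._running == 0, timeout)


def shutdown(gate, close_socket, timeout=None):
    """Refuse new hooks, drain the running ones, then close the socket."""
    gate.begin_shutdown()
    drained = gate.wait_for_hooks(timeout)
    # Assumption: after a timeout we close the socket anyway.
    close_socket()
    return drained
```

The point of the ordering is that a hook that already started can keep calling snapctl until it finishes, while no new hook gets a chance to start against a socket that is about to go away.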

We also have a separate, more general issue: any snap run on a snap (for example for hooks or for starting services) could fail if done while the snap's base is inactive, because snap run might need to consult the current symlink of the base snap.

We can probably address this with careful conflict checks and by setting up wait dependencies properly in single changes that involve multiple snaps.
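
As a rough illustration of such wait dependencies, the sketch below makes every task that runs snap code wait for the base snap to be linked again. It is Python for illustration only (snapd itself is written in Go) and needs Python 3.9+ for `graphlib`; `order_tasks`, `RUNS_SNAP_CODE` and the simplified task labels are assumptions of the sketch and do not mirror how snapd builds its task sets.

```python
from graphlib import TopologicalSorter

# Task labels that end up invoking "snap run" (illustrative set).
RUNS_SNAP_CODE = {"run-hook", "start-snap-services"}


def order_tasks(tasks_by_snap, base_of):
    """Return one valid execution order of (snap, task) pairs.

    tasks_by_snap: snap name -> list of unique task labels, in lane order
    base_of:       snap name -> name of the snap it uses as its base
    """
    deps = {}
    # Within one snap every task waits for the previous one.
    for snap, tasks in tasks_by_snap.items():
        prev = None
        for label in tasks:
            node = (snap, label)
            deps.setdefault(node, set())
            if prev is not None:
                deps[node].add(prev)
            prev = node
    # Across snaps: tasks that run snap code wait for the base's link-snap.
    for snap, base in base_of.items():
        base_link = (base, "link-snap")
        if snap not in tasks_by_snap or base_link not in deps:
            continue
        for label in tasks_by_snap[snap]:
            if label in RUNS_SNAP_CODE:
                deps[(snap, label)].add(base_link)
    return list(TopologicalSorter(deps).static_order())
```

For a change that refreshes core and lxd together, calling `order_tasks` with `{"lxd": "core"}` as `base_of` guarantees that lxd's hooks and service starts only come after core's current symlink is back in place.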

for problem 2., at least along the do path, we have the PR below (for undo we likely need more state to avoid guessing):

https://github.com/snapcore/snapd/pull/4991

for 3. we have landed:

https://github.com/snapcore/snapd/pull/4981

and

https://github.com/snapcore/snapd/pull/4988

I’ve started collecting the additional state in this PR

https://github.com/snapcore/snapd/pull/4996

Fixes for (2) and (3) are now in 2.32.3 which is in beta now.

I have updated the GitHub release page with a more understandable changelog: https://github.com/snapcore/snapd/releases/tag/2.32.3

sketched this for 1.:

https://github.com/snapcore/snapd/pull/5004/files

To help write tests in the future for the scenario involved in 3. (a mismatch between the state, including tasks, that the new snapd expects and what an old snapd from the deb set up when it initiated the installation), I proposed this:

https://github.com/snapcore/snapd/pull/5014

To start addressing part of this (at least auto-refresh/many snaps update), I created this: