Cannot remove snap: `json: cannot unmarshal array into Go value of type map[string]*snapstate.refreshCandidate`

I can’t remove lxd. I’d previously tried to refresh it, but that failed for the same reason.

➜ snap remove lxd --purge
error: cannot perform the following tasks:
- Remove snap "lxd" (21780) from the system (internal error: could not unmarshal state entry "refresh-candidates": json: cannot unmarshal array into Go value of type map[string]*snapstate.refreshCandidate)

Can you post the state.json somewhere? Be warned that the state can contain sensitive information such as macaroons, so it’s best to use a command like this:

sudo cat /var/lib/snapd/state.json |& jq 'del(.data.auth)'

and verify that neither the serial nor the macaroon appears in the output before posting.

cc @pstolowski, I think this may be of interest to you.

@joedborg have you been running snapd from edge after ~May (even just temporarily)?

The problem is caused by an internal change to the “refresh-candidates” format; it was introduced as an array with https://github.com/snapcore/snapd/pull/10167 and changed to a map with https://github.com/snapcore/snapd/pull/10182; these PRs landed almost one month apart (April/May) and would affect snapd from the edge channel.
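To illustrate the mismatch described above, here is a minimal sketch of what happens when an array-shaped entry is decoded into a map type. The `refreshCandidate` struct, its `Version` field, and the `decodeCandidates` helper are hypothetical stand-ins for this example only; snapd's real `snapstate.refreshCandidate` looks different.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// refreshCandidate is a hypothetical stand-in for snapd's
// snapstate.refreshCandidate; the real struct has different fields.
type refreshCandidate struct {
	Version string `json:"version"`
}

// decodeCandidates decodes a "refresh-candidates" value the way the
// final, map-based format expects it.
func decodeCandidates(raw []byte) (map[string]*refreshCandidate, error) {
	var candidates map[string]*refreshCandidate
	if err := json.Unmarshal(raw, &candidates); err != nil {
		return nil, err
	}
	return candidates, nil
}

func main() {
	// Old layout: a JSON array (values are made up for illustration).
	oldFormat := []byte(`[{"version": "2.51"}]`)
	// Final layout: a JSON object keyed by snap name.
	newFormat := []byte(`{"snapd": {"version": "2.51"}}`)

	if _, err := decodeCandidates(oldFormat); err != nil {
		fmt.Println("old format:", err)
	}
	if c, err := decodeCandidates(newFormat); err == nil {
		fmt.Println("new format: decoded", len(c), "candidate(s)")
	}
}
```

Decoding the array fails with the same kind of `cannot unmarshal array into Go value of type map[string]*…` error that the remove task reports, while the map-shaped entry decodes normally.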

I checked the git tree of our releases starting from 2.49, and it looks like we only released the final state of this change (in 2.51), so stable releases shouldn’t be affected, unless of course snapd from edge was run at some point and created this entry in the “old” format, in which case stable snapd won’t be able to decode it.

I think this problem should correct itself with the first auto-refresh; just make sure you’re not preventing auto-refreshes (not holding them). The first successful auto-refresh would store refresh-candidates in the correct format, and then the remove will not fail.

Let me know if you did use edge, and if auto-refresh corrected it (snap changes should report an automatic refresh at some point depending on your refresh schedule).

Hey @pstolowski,

It’s possible but I don’t think I have. Is there any way I can manually patch the state.json file to get this working?

Thanks, Joe

Have you checked if there was a recent auto-refresh (as I said I think it would correct this issue)?

Can you show the output of: $ snap list --all snapd

A manual fix would require removing the “refresh-candidates” entry from state.json, but it would be great to confirm what I wrote above first.
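For reference, the manual fix mentioned above could be sketched roughly as follows. This is only an illustration under assumptions: it assumes the entry lives under the top-level "data" key (as the `del(.data.auth)` command earlier suggests), the `dropRefreshCandidates` helper and the sample document are made up, and the sketch works on bytes in memory rather than touching /var/lib/snapd/state.json. Anyone trying this for real would presumably want to stop snapd and keep a backup of the file first.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dropRefreshCandidates returns a copy of the state document with the
// "refresh-candidates" entry removed from its "data" object.
// Assumption: the entry lives under the top-level "data" key.
func dropRefreshCandidates(state []byte) ([]byte, error) {
	// Keep every value as raw JSON so nothing else is reinterpreted.
	var top map[string]json.RawMessage
	if err := json.Unmarshal(state, &top); err != nil {
		return nil, err
	}
	var data map[string]json.RawMessage
	if err := json.Unmarshal(top["data"], &data); err != nil {
		return nil, fmt.Errorf("cannot decode \"data\": %w", err)
	}
	delete(data, "refresh-candidates")
	patched, err := json.Marshal(data)
	if err != nil {
		return nil, err
	}
	top["data"] = patched
	return json.Marshal(top)
}

func main() {
	// Made-up, minimal state document for illustration only.
	state := []byte(`{"data":{"refresh-candidates":[{"name":"snapd"}],"other-entry":true}}`)
	out, err := dropRefreshCandidates(state)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out)) // prints {"data":{"other-entry":true}}
}
```

Note that re-encoding through a Go map writes the keys in sorted order, so the output is equivalent JSON but not a byte-for-byte copy of the input.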

➜ snap list --all snapd
Name   Version  Rev    Tracking       Publisher   Notes
snapd  2.52     13270  latest/stable  canonical✓  snapd,disabled
snapd  2.52.1   13640  latest/stable  canonical✓  snapd
➜ snap changes
ID   Status  Spawn                   Ready                   Summary
690  Done    yesterday at 11:56 EDT  yesterday at 11:56 EDT  Auto-refresh snap "openstackclients"
691  Done    yesterday at 18:41 EDT  yesterday at 18:41 EDT  Auto-refresh snap "openstackclients"
692  Done    today at 04:31 EDT      today at 04:31 EDT      Auto-refresh snap "openstackclients"
693  Done    today at 07:51 EDT      today at 07:51 EDT      Auto-refresh snap "openstackclients"
694  Error   today at 11:04 EDT      today at 11:04 EDT      Remove "lxd" snap

I don’t think so?

One more question, have you ever enabled the experimental gate-auto-refresh-hook feature?

Not that I know of :slight_smile:

OK, I’m out of ideas and need to think about how your snapd got into this situation, assuming you never used edge.

I looked into this some more, and I’m still very confused and have no explanation other than the one I mentioned earlier. Setting refresh-candidates has been guarded by the experimental.gate-auto-refresh-hook feature flag. Having that feature ON was the only way of getting the “bad” entry into the state (plus, snapd from edge had to be used). The entry would get corrected by an auto-refresh with a newer snapd as long as that experimental feature is still enabled.

I think I’ll prepare a fix for handling the case where the feature was enabled for a short while on the “bad” version of snapd and then disabled, leaving the bad entry in the state.
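As a rough idea of what tolerating a leftover bad entry could look like, here is a minimal sketch. It is not taken from snapd or from any PR; the `refreshCandidate` struct and the `decodeLenient` helper are hypothetical, and treating an old array-shaped entry as "no candidates" is just one possible policy.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"reflect"
)

// refreshCandidate is a hypothetical stand-in for the real snapd type.
type refreshCandidate struct {
	Version string `json:"version"`
}

// decodeLenient decodes the map-based format and treats a top-level
// array (the old format) as an empty set of candidates instead of
// failing. Any other decoding error is still returned.
func decodeLenient(raw []byte) (map[string]*refreshCandidate, error) {
	var candidates map[string]*refreshCandidate
	err := json.Unmarshal(raw, &candidates)

	var typeErr *json.UnmarshalTypeError
	if errors.As(err, &typeErr) &&
		typeErr.Value == "array" && typeErr.Type.Kind() == reflect.Map {
		// Old array-based entry: ignore it.
		return map[string]*refreshCandidate{}, nil
	}
	if err != nil {
		return nil, err
	}
	return candidates, nil
}

func main() {
	c, err := decodeLenient([]byte(`[{"version": "2.51"}]`))
	fmt.Println(len(c), err) // prints 0 <nil>
}
```

The check on the target type's kind keeps the fallback narrow: an array found where a single candidate is expected (for example `{"snapd": []}`) is still reported as an error.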

UPDATE: later yesterday, after looking at the state.json you provided, I realized that the value of “refresh-candidates” confirms that at some point in the past you did use snapd from the edge channel, since there was an entry for snapd from “latest/edge”. So I think this explains at least half of the mystery.

UPDATE#2: PR for this issue is up: https://github.com/snapcore/snapd/pull/10998
