Delaying refreshes and registration (in particular for pre-seeded classic images)

pedronis · February 20, 2018, 12:31pm

This is a reprise of the topic Lazy/fallback serial registration for classic.

We have already implemented naive lazy registration for classic systems without any snaps preinstalled, so that they don’t register with the store until snapd is actively used and snaps are installed.

Now though:

1. snaps are useful and we are going to get more images with pre-seeded snaps
2. it might be unexpected to get an immediate refresh on, for example, cloud images that were built to be good to go
3. some useful/interesting features will depend on the device being registered
4. it still feels appropriate to delay registering until snapd is actively engaged with the store

About 1. and 2.:

we always planned to have some way to delay refreshes (without changing the overall schedule), some kind of snap delay-refresh command
we could implement that functionality and leverage it internally to set up some default delay at first boot for pre-seeded classic images (or some subset of them)

About 4. and 3.:

The original plan was indeed to delay registration until the first store operation on classic. The complexity is that the actual store operations (except download and other auxiliary bits) are invoked synchronously, not inside a Change, very early for something like snap install. This was not the case at some point, but the new placement gives quick feedback in case of errors and also avoids creating fully pointless Changes. Possible plans look like:

execute key generation (which can be expensive) early in any case but stop there
add code to the layer that mediates between the store and the device registration bits, so that it triggers the rest of registration on demand (once triggered, the usual retry mechanisms should also be activated)
the code in the mediation layer would need to wait up to a reasonable timeout for registration to complete or otherwise have the store code try to proceed without device info
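
The on-demand trigger with a bounded wait can be sketched as follows. This is only an illustration in Python, not snapd code: the class name LazyRegistration, the register callback and the device_info method are all hypothetical, and both the early key generation and the retry mechanism are left out.

```python
import threading
from typing import Callable, Optional


class LazyRegistration:
    """Hypothetical mediation layer between store code and device registration."""

    def __init__(self, register: Callable[[], str]):
        # register stands in for the slow store round trip; it returns a serial.
        self._register = register
        self._lock = threading.Lock()
        self._started = False
        self._done = threading.Event()
        self._serial: Optional[str] = None

    def _run(self) -> None:
        try:
            self._serial = self._register()
        except Exception:
            # A real implementation would hand over to the retry machinery;
            # this sketch simply leaves the device unregistered.
            self._serial = None
        finally:
            self._done.set()

    def device_info(self, timeout: float) -> Optional[str]:
        """Trigger registration on first use and wait up to timeout seconds.

        Returns the serial, or None so that the caller can proceed
        without device info.
        """
        with self._lock:
            if not self._started:
                self._started = True
                threading.Thread(target=self._run, daemon=True).start()
        self._done.wait(timeout)
        return self._serial
```

Nothing happens until the first device_info call. A caller that times out gets None and can talk to the store without device info, while registration keeps running in the background for later calls.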

One problem here is also how to give good feedback to the user if the reasonable timeout that we should use ends up being long.

A different plan would be to move talking to the store back inside Changes and dedicated tasks. This is complex, and would need some thought to preserve the early/immediate error behavior we enjoy now.

pedronis · February 20, 2018, 12:34pm

cc @niemeyer, @noise

niemeyer · February 26, 2018, 8:22pm

For the deferring of refresh case itself, we might use something simple organized on top of normal configuration, similar to how we already handle the overall scheduling of refreshes. Strawman:

We first introduce support for a refresh.hold setting containing an absolute timestamp before which the system must not refresh, as long as the last refresh happened within a maximum limit which matches the limit for normal refresh scheduling (60 days now, I think?).
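
As a rough sketch of that rule (in Python for brevity, not snapd code): the function name hold_in_effect and the constant MAX_POSTPONEMENT are made up for illustration, and the 60-day value is only the guess quoted above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: 60 days is only the figure guessed at in this thread.
MAX_POSTPONEMENT = timedelta(days=60)


def hold_in_effect(now: datetime,
                   hold: Optional[datetime],
                   last_refresh: datetime) -> bool:
    """Return True if refreshes must still be held back at time now."""
    if hold is None or now >= hold:
        # No hold configured, or the hold has already expired.
        return False
    # The hold is honoured only while the last refresh is recent enough.
    return now - last_refresh < MAX_POSTPONEMENT


# Example values: a hold set a few hours after the last refresh.
tz = timezone(timedelta(hours=-3))
last = datetime(2018, 2, 26, 14, 30, tzinfo=tz)
hold = datetime(2018, 2, 26, 22, 9, tzinfo=tz)
held = hold_in_effect(datetime(2018, 2, 26, 18, 0, tzinfo=tz), hold, last)
```

With these values held is True, since 18:00 is before the hold time and the last refresh is recent. Once the last refresh is older than the limit, the hold is ignored regardless of its timestamp.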

Hold seems better than delay in this context because the latter is commonly used as a noun, which may be confused with “refresh right after this delay”, instead of “postpone for at least this delta”.

Once that works, we can trivially introduce a command that can take a delta, verify that it is within bounds, and then change the configuration setting as long as the time provided is after the currently deferred time. In other words, a defer command does not move the deadline earlier.
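
A minimal sketch of that update rule, again in Python rather than snapd's own code. The name apply_hold and the MAX_HOLD bound are assumptions for illustration; the thread does not say what the exact bounds on the delta would be.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumption: reuse the 60-day figure guessed at in this thread as the bound.
MAX_HOLD = timedelta(days=60)


def apply_hold(current_hold: Optional[datetime],
               now: datetime,
               delta: timedelta) -> datetime:
    """Return the new hold timestamp after a request to hold for delta."""
    if delta <= timedelta(0) or delta > MAX_HOLD:
        raise ValueError("hold delta out of bounds")
    requested = now + delta
    # The hold only ever moves forward: an existing, later hold wins.
    if current_hold is not None and current_hold >= requested:
        return current_hold
    return requested
```

For instance, requesting 5h and then 2h from the same starting time leaves the 5h hold in place, while a later request for 8h replaces it.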

The command might look like:

$ snap refresh --hold=5h
Next refresh scheduled for today at 22:09.

$ snap refresh --time
timer: 00:00~24:00/4
last: 2018-02-26T14:30:00-03:00
hold: 2018-02-26T22:09:00-03:00
next: 2018-02-26T23:54:00-03:00

With all of that working, we can change the initial logic so that it presets the hold value to, say, 6h from first startup. We can tune that as we learn more about the use cases.

As for the registration case, whatever we do we need to preserve a good user experience. So we can’t take ages to respond, and we cannot just reject sane calls. The best scenario is we do the long tasks early (key generation, etc), and then do the final store registration on first use. Are there any gotchas here, though?

pedronis · February 27, 2018, 2:05pm

I’m not sure if this means we should check that “hold” will end up after refresh “next” or that you can move “hold” only forward?

Also what’s the command to reset/remove “hold” ?

niemeyer · February 27, 2018, 2:22pm

I meant to say the latter, that if we do --hold=2h and there was a prior hold of 5h which yielded an absolute time farther into the future than the 2h would yield, the prior 5h one should remain in place.

For removing, I guess we can do something like --hold=none?

pedronis · February 27, 2018, 3:37pm

Will start to work on at least holding refreshes (refresh.hold etc).

pedronis · March 7, 2018, 1:53pm

Proposed the basic logic for refresh holding:

https://github.com/snapcore/snapd/pull/4789

pedronis · March 19, 2018, 5:25pm

proposed PR to delay registration on classic until first store interaction:

https://github.com/snapcore/snapd/pull/4873

pedronis · April 16, 2018, 10:04am

Holding refreshes for a while on classic was in 2.32.
Holding registration beyond what is currently done on classic is not something we want to do at the moment.