This might be a catalyst to get us to refactor the metadata endpoint (or create a new endpoint or two) to accept more context and handle all the install, refresh, and refresh-with-revision cases. IMHO /details should only be used for snap info.
@pedronis @noise unless I'm missing something, the new endpoints would avoid the client downloading a particular revision because the user explicitly requested it, only to realise it's not possible to install it, right? In other words it would enable the error to be synchronous instead of asynchronous, which is a good thing, but could be seen as a perf tweak?
Or are there cases that should behave differently than what we're going to get with the current endpoints, considering the client is going to be checking the rules before installing anything?
Some considerations:
- epoch numbers are always going to be non-negative
- `0*` is an invalid epoch expression and will throw an error, as will trying to use hex or octal (or Roman numerals, etc.) for the epoch expression
- an empty list in the expanded version is also an error
- as is a list that isn't in ascending order
and three questions:
- it's an error if `read` and `write` do not share at least one value, right?
- is it an error if the epoch includes random other stuff? E.g. `epoch: does-not-rhyme-with: epic`
- the lists are yaml numbers, so hex and such will pass. Or should I force them to be decimal? It'll mean more work both in the unmarshaler and in documenting the fact, but it means we'll never have to deal with developers making things harder for themselves by doing `epoch: {read: [0xC], write: [11]}` or `epoch: {read: [07, 08, 09, 10]}`
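The considerations and questions above can be sketched as a small validator. This is an illustrative sketch only, not snapd's actual code; `validate_epoch` is a made-up name, and it assumes the "shared value" rule from the first question holds:

```python
# Illustrative sketch (not snapd's actual code) of the rules listed above:
# epoch values are non-negative integers, the expanded lists are non-empty
# and ascending, and read/write must share at least one value (assumed).

def validate_epoch(read, write):
    for name, values in (("read", read), ("write", write)):
        if not values:
            raise ValueError(f"epoch {name} list must not be empty")
        if any(not isinstance(v, int) or v < 0 for v in values):
            raise ValueError(f"epoch {name} values must be non-negative integers")
        if values != sorted(values):
            raise ValueError(f"epoch {name} list must be in ascending order")
    if not set(read) & set(write):
        raise ValueError("read and write must share at least one value")

validate_epoch([1, 2], [2])      # fine
# validate_epoch([2, 1], [1])    # raises: not in ascending order
# validate_epoch([1], [2])       # raises: no shared value
```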
A different kind of question: if I have a snap installed, to upgrade do I need to be able to read all the epochs the current snap writes?
- Indeed, as it would mean the snap is incompatible with itself.
- I suggest ignoring unknowns and letting snapcraft catch such issues and report them at build time. The problem otherwise is that there's no way to build in the future a snap that accepts more data without it being incompatible with older snapds. On the other hand, we can always force in the future to have a snap that doesn't install in the past via the assumes mechanism.
- I wasn't so concerned about the format of ints used, but your second example is a bit of an eye-opener, as it looks good and is wrong. As it's cheap for us, it might be worth forcing people to use plain ints.
- No, that's the proposed rule number 1 above, and it's also the reason why in your first question you bring up sharing at least one value and not all of them.
Another question: should we accept `read: 5` as shorthand for `read: [5]`?
Doesn't seem worth it. We have a short syntax that should cover the majority of simple cases; that will most likely be simply `epoch: 5` alone. Once people decide to skip it and go full syntax, we can ask for more verbosity and precision.
The obverse of this is that `010` in yaml would typically be 8, but if we define them as decimal numbers then it's a 10, and this will surprise a different subset of developers. If we disallow such values, we need to reject them rather than parse them differently.
The new Epoch complex structure (two lists for read and write capabilities) will be used in the new APIs that are being discussed here.
In the old API (that is being used by current and previous snapds), we'll do the following:
- keep the epoch field as a string, both in the request sent from the client to the server and in the server responses
- refreshes will only filter by epoch, returning the latest revision with the same epoch the client indicated (no "epoch evolution" using the old API)
For the record, the issue that motivates the design @facundobatista points out is that old snapds cannot handle rich epochs, and do not understand the epoch semantics at all, so the suggested behavior ensures an old snapd will remain inside its comfort zone, so to speak.
While defining the work required for epochs, we noted that epoch filtering introduces a new concept we haven't needed before. When resolving which revision should be returned for a snap refresh, we need to pick the next best in its epoch upgrade path. And for that, we need to go through the timeline of releases in the channel the client is tracking. But what is a channel timeline?
There are different possible interpretations for it, but let's consider the one we are thinking of applying:
- While the channel is open and getting releases, the timeline seems clear: all revisions released to that channel over time.
- On the other hand, if a channel is closed and follows another, its history is combined with the history of that other channel (while the channel is closed); if the channel is reopened later, it stops sharing the other channel's history from that point on, but it keeps, as part of its timeline, the releases made to that other channel while it was following it.
This means that a refresh could return a revision that was released to a different channel in the past (when filtering the revisions to which we would apply epoch filtering).
For example, given the following sequence of releases:
Release r1 to edge
Release r2 to beta
Release r3 to beta
Close edge
Release r4 to beta
Release r5 to beta
Release r6 to edge (i.e. reopening edge)
Release r7 to beta
The timeline for edge at this point would be:
r1 -> r3 -> r4 -> r5 -> r6
(note how we are including releases to beta while edge was closed and following beta)
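The timeline semantics just described can be sketched in a few lines. The event shapes and the `channel_timeline` helper are made up for illustration; the key assumptions are that closing a channel makes it observe the followed channel's currently held revision, then every later release to it, until a new release reopens the channel:

```python
# Illustrative sketch of the channel-timeline semantics described above.
# While a channel is closed it follows another channel: it observes that
# channel's current revision at close time and every later release to it,
# until a release to the closed channel reopens it.

def channel_timeline(events, channel, follows):
    timeline, current, closed = [], {}, False
    for ev in events:
        if ev[0] == "release":
            _, chan, rev = ev
            current[chan] = rev
            if chan == channel:
                closed = False            # releasing to a closed channel reopens it
                timeline.append(rev)
            elif closed and chan == follows:
                timeline.append(rev)      # observed via the followed channel
        elif ev == ("close", channel):
            closed = True
            if follows in current:
                timeline.append(current[follows])  # revision held at close time

    return timeline

events = [
    ("release", "edge", "r1"),
    ("release", "beta", "r2"),
    ("release", "beta", "r3"),
    ("close", "edge"),
    ("release", "beta", "r4"),
    ("release", "beta", "r5"),
    ("release", "edge", "r6"),   # reopens edge
    ("release", "beta", "r7"),
]
print(channel_timeline(events, "edge", "beta"))
# -> ['r1', 'r3', 'r4', 'r5', 'r6']
```

Note how r2 never enters edge's timeline: when edge closed, beta was already holding r3, so r3 is what an edge client would first observe.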
If we extend the example above by introducing epochs:

|    | read  | write |
|----|-------|-------|
| r1 | [1]   | [1]   |
| r2 | [1]   | [1]   |
| r3 | [1,2] | [1,2] |
| r4 | [1,2] | [1,2] |
| r5 | [2]   | [2]   |
| r6 | [1]   | [1]   |
| r7 | [2,3] | [2,3] |
Assuming a client tracking the edge channel does a refresh from revision r1: if we follow the described timeline, it would get r4 as a result.
The timeline definition will decide how to resolve epoch filtering when one (or more) closed channels are involved, and that's why we want to agree on what it means before getting to the implementation.
It feels like we are making this too complicated. I understand the desire to recreate history exactly as it happened, but don't feel it is necessary in this case.
The intent of epochs is to get you from point A to D by way of B and C if necessary, not to follow an explicit trail of A->B->C->D. If C was only released to a followed channel but the path A->B->D is valid, we are OK*. The problem only arises if an epoch-compatible revision was never released in the given channel, but I'd argue that's an oddity in the publishing sequence.
Simply following the explicit list of revisions released to a channel (and ignoring following) feels a lot clearer to me from a publisher viewpoint and not a difference that an end user will ever see.
*To clarify, in my example above I'm implying we are on edge; A, B, and D were released to edge, C was only released to beta, and edge was closed at some point while C was in beta.
@noise That's not really what we are aiming for, at least not without being more specific. We do care about the sequence and the history, including the intermediate bits. That's what gives people the ability to control which exact release will be visible when jumping through an epoch, and all intermediate epochs that were published into the channel should be respected as well.
There are indeed a lot of details above about epochs, and they remain relevant. Any simplification here can't break the rules we established above and explained the rationale for.
With that said, I'm out of context, so let me read @matiasb's detailed coverage of the issue and try to understand the problem to be solved.
Right, sorry I wasn't clear. I wasn't saying to ignore the epoch chain, just to ignore followed revisions from another channel while gathering the links.
@matiasb Ah, that's an interesting edge case indeed, and it creates a pretty convoluted picture if we account for the fact that we support multiple levels of follow-ups.
The best possible solution would be to reflect exactly what a client would see in reality. In the example, the closing of edge while it was holding r1 is, from a client's perspective, exactly as if edge had observed r3 being released into it, and then r4 and r5. Then r6 gets back into edge, and that indeed forms the history you described:
r1 -> r3 -> r4 -> r5 -> r6
That means no matter when one started following the edge channel, they'll observe exactly the same history. A good invariant to hold.
That sounds right. I expect this will make the implementation a bit more complex, but the payoff is that people won't have to think about and fix awkward situations by hand.
While trying to figure out the best behaviour for release history across channels, we found that we need to better understand which behaviour we want when selecting the revision to return when multiple revisions are available, even within the same channel.
IOW, when multiple revisions are OK to be returned after applying the Epochs restriction, which one should the server choose?
There are three options:
- the most recently released revision with the epoch wins
- the most recently released revision with the epoch and the greatest other epoch wins
- the most recently released revision with the epoch and at least one greater epoch wins
The first option is the simplest one. Of all the revisions that are epoch-OK, let's just pick the latest one. This option is not only simple, but also in line with the current "latest is what will be returned" pre-epochs behaviour, so it's easy to reason about.
However, this one is unworkable. Let's say we have the following releases (assuming the same epochs in read and write, for simplicity):
r1 [0, 1]
r2 [1, 2]
r3 [2, 3]
If we boot a device with epoch 0 for the first time, it will refresh to r1, then to r2, and finally to r3. At some point we find that r1 has a bug, so we fix it and release the corrected code as r4 (of course with epoch [0, 1]). If we now boot another device with epoch 0 for the first time, it will refresh to r4 and never move on to r2 or r3, getting stuck.
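The stuck-device scenario can be simulated with a tiny model of option 1. This is an illustrative sketch only: read and write lists are assumed equal, the helper name is made up, and "installable" just means the device's current epoch appears in the revision's list:

```python
# Illustrative model of option 1 ("the most recently released revision with
# the epoch wins"): of all revisions able to read the device's current
# epoch, pick the latest released one.  After installing, the device's data
# moves to the highest epoch the revision writes (read == write assumed).

def refresh_option1(releases, epoch):
    """releases: list of (name, epochs) in release order."""
    candidates = [(name, epochs) for name, epochs in releases if epoch in epochs]
    return candidates[-1] if candidates else None

releases = [("r1", [0, 1]), ("r2", [1, 2]), ("r3", [2, 3]), ("r4", [0, 1])]
current, epoch = None, 0
while True:
    picked = refresh_option1(releases, epoch)
    if picked is None or picked[0] == current:
        break  # no progress: the device is stuck
    current, epoch = picked[0], max(picked[1])
print(current, epoch)  # -> r4 1: the device never reaches r2 or r3
```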
The second option is more complex but avoids the problem just described. It's the currently defined behaviour (per earlier discussion in this topic). The downside of this approach is that the revision selection is rigid. For example, say we have:
r1 [0, 1]
r2 [1, 2, 3]
r3 [1, 2]
When a device with epoch 0 needs to refresh it will go to r1, and then from r1 to r2 (as it has the greatest epoch!). No matter what revisions we release in the future, the only way for the device to upgrade to something different is to release a revision with a number greater than or equal to 3 in its epoch list.
The third option is slightly different: a revision may be selected even if it doesn't have the greatest epoch, but having at least one greater epoch allows an upgrade path to be established. See the following example:
r1 [0, 1]
r2 [1, 2, 3]
r3 [1, 2]
r4 [0, 1]
In this case the device bootstrapped with epoch 0 will also end up in r2, but through a longer sequence of refreshes. It first goes to r4 (not r1), because r4 has an epoch (1) higher than the matching one (0) and was released later than r1. In the next refresh it goes to r3, which beats r4 by having an epoch higher than the matching one (1), and beats r2 (which also has higher epochs) because it was released later. And finally it refreshes to r2, because now it's the only one with epochs exceeding the matching ones.
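The refresh sequence for option 3 can be modelled the same way. Again an illustrative sketch, with read and write lists assumed equal and a made-up helper name; the rule modelled is "among revisions able to read the current epoch, the most recently released one that also lists at least one greater epoch wins":

```python
# Illustrative model of option 3: among the revisions able to read the
# device's current epoch, pick the most recently released one that also
# lists at least one greater epoch, so the upgrade path keeps opening up.

def refresh_option3(releases, epoch):
    """releases: list of (name, epochs) in release order."""
    candidates = [(name, epochs) for name, epochs in releases
                  if epoch in epochs and max(epochs) > epoch]
    return candidates[-1] if candidates else None

releases = [("r1", [0, 1]), ("r2", [1, 2, 3]), ("r3", [1, 2]), ("r4", [0, 1])]
epoch, path = 0, []
while True:
    picked = refresh_option3(releases, epoch)
    if picked is None:
        break  # nothing offers a greater epoch: the path ends here
    path.append(picked[0])
    epoch = max(picked[1])
print(path)  # -> ['r4', 'r3', 'r2'], matching the sequence described above
```

The loop always terminates because the device's epoch strictly increases on every refresh.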
So, which option do you think is the best? Thanks!!
Isnât this described in rule number 2 of the original rule set presented in Aug/2017 above?
- The store always offers the most recent snap able to read the highest epoch among those that may be installed (see rule 1 and 6).
This still seems fine for your example. It will install r4, which is the only one that may be installed given we still have data at epoch 0, then r2, which can now be installed and is able to read epoch 2, and then r3, which may now be installed and can read epoch 3.
That case as specified in the original ruleset has a flaw: it's possible to make a mistake from which it isn't possible to sensibly recover. If I release a revision for [2, 3, 4] and later realise I need separate [2, 3] and [3, 4] revisions, the [2, 3] revision will never be picked, as the ancient [2, 3, 4] revision will have precedence. We don't have other situations today where you can get yourself into a corner like that.
One can always unrelease the snap if it was an actual mistake, which takes them out of the corner. When it is not a mistake, though, the logic seems right: [1, 2, 3] is indeed preferable to something that can only read [1, 2]. The former allows a revert (downgrade), while the latter doesn't. It's also just more intuitive: the data format rolls forward, and hop points in the middle will need to support more than one format on the way to the current tip that is higher up. The ones with more compatibility are better hop points than the ones with less compatibility. Both [1, 2, 3] and [2, 3], and even [3], look to me like more recent epochs than [1, 2], and thus a more natural target.