That’s a good point, but it also seems like a more general issue. Any clients which are not affected by the problem will be downloading binaries that they won’t be making use of. Perhaps we should look into a more general solution that can better filter repairs at query time?
so far we don’t need much intelligence on the server (a hierarchy of dirs under apache would work, for example). if we need real querying, we need to see how/when that can happen; it’s also more fragile, judging from the experience with snaps metadata.
an in-between approach would be some sort of index file(s) on the server; we would need to think about what the trust model for that is
We did discuss being able to query by arbitrary headers. Was that local only, or has that conversation ever reached the server?
I also can imagine something specific for this particular case. We might introduce a simple language that would allow, for example, defining that a given repair should only be used if some safe expression matches on the local system. But that sounds like something for later.
first of all because these could be large (tens of MBs) there is some discussion whether they would be served through the general assertion service at all,
there is some querying functionality in the server but it is not open in general; it is used only internally between services. atm external clients can only get single assertions
@niemeyer after further thinking I agree that we should remove arch from the primary key,
we really want gapless sequences of repairs coming
- from us for all devices
- optionally one for each brand for all their devices
the repair-id “brand_model-#” idea was to allow for a 3rd set of sequences targeting specific device models of a brand.
Whether for each gapless sequence we want to do
(getting through some irrelevant ones)
or do queries? (but the current query capability we have even if opened doesn’t fit what we need here I think)
is a different matter;
notice that queries are also tricky because we either get a stream that includes the bodies for many assertions, which is fragile to fetch and retry on,
or we could do a very simple query and get headers only first and then do GETs of the full assertions for the ones we really care about.
given that repairs can now tell whether to retry them or not, the issue of having a repair-id such that repairs older than it are not relevant and don’t need to run is less pressing,
though we still don’t want images at first boot to download all repairs that ever existed (though this also shouldn’t be a problem for a while), so we will need a way to have conservative starting points for this that come with images (through main snaps (core, gadget…), config or some seed data)
btw, we had at some point the idea to have an expiry on repairs, for fixing stuff we agreed it doesn’t make sense, but for the debugging use case assertions it still might
afternoon walk thought, if we
- go through (download+execute+decide what to download next) repairs in a sequence one at a time
- or have an immediate flag on repairs that means execute me now before going further downloading from the sequence
we can probably postpone this problem until we understand its nature more, by at least implementing
- repair skip-to [--brand=BRAND] ID
- repair rewind [--brand=BRAND] ID
(strawman syntax), which might make sense anyway, basically giving ourselves the tools to control how to navigate the sequences from the repairs themselves (if needed)
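A minimal sketch of the one-at-a-time idea, assuming integer repair IDs within a single sequence. Everything here is hypothetical: `walk_sequence`, the `fetch` callback and the `("skip-to", ID)` / `("rewind", ID)` return values only stand in for the strawman commands above, they are not an existing snapd interface.

```python
from typing import Callable, List, Optional, Tuple

# What a repair asks for after running: ("next", None), ("skip-to", ID)
# or ("rewind", ID). Hypothetical stand-ins for the strawman commands.
Action = Tuple[str, Optional[int]]
Repair = Callable[[], Action]


def walk_sequence(
    fetch: Callable[[int], Optional[Repair]],
    start: int,
    max_steps: int = 100,
) -> List[int]:
    """Download and execute repairs one at a time, letting each repair
    decide where the cursor goes next. Returns the IDs run, in order."""
    ran: List[int] = []
    cursor = start
    for _ in range(max_steps):  # guard against skip/rewind loops
        repair = fetch(cursor)
        if repair is None:  # nothing at this ID: end of the gapless sequence
            break
        ran.append(cursor)
        action, target = repair()
        if action == "skip-to" and target is not None and target > cursor:
            cursor = target
        elif action == "rewind" and target is not None and target < cursor:
            cursor = target
        else:
            cursor += 1
    return ran
```

With repairs 1 to 5 available and repair 1 asking for `("skip-to", 4)`, the walk runs 1, 4, 5 and never downloads 2 or 3. The `max_steps` guard is one possible way to keep a rewind from looping forever; a real implementation would need to decide that policy.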
this is the current thinking for the first implementation about:
How to retrieve each repair BRAND-ID/REPAIR-ID, HTTPS vs HTTP
Try to retrieve the headers only (as JSON) over HTTPS at:
filter whether it’s applicable or not (recording information and decision if not)
If applicable retrieve and verify the full repair (as application/x.ubuntu.assertion) also over HTTPS
When doing HTTPS, verify certificates against a time given by
max(sys-time, time-lower-bound) (at least in case we got an error about the time validity of the cert, i.e. not valid yet).
Where time-lower-bound is obtained by considering the max of:
- image creation time (timestamp of seed.yaml for example)
- server reported time of previous successful HTTPS requests
- timestamp of valid retrieved repairs
- possibly time lower bound as kept by snapd itself
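The time computation above can be sketched as follows. This is only an illustration: the function names are made up, and it assumes all timestamps are timezone-aware `datetime` values and that sources which are not available are passed as `None`.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional


def time_lower_bound(candidates: Iterable[Optional[datetime]]) -> Optional[datetime]:
    """Return the latest of the known-good timestamps, ignoring missing ones."""
    known = [t for t in candidates if t is not None]
    return max(known) if known else None


def cert_verification_time(sys_time: datetime, lower_bound: Optional[datetime]) -> datetime:
    """max(sys-time, time-lower-bound), falling back to sys-time alone."""
    if lower_bound is None:
        return sys_time
    return max(sys_time, lower_bound)


if __name__ == "__main__":
    utc = timezone.utc
    bound = time_lower_bound([
        datetime(2017, 6, 1, tzinfo=utc),   # image creation time
        datetime(2017, 9, 15, tzinfo=utc),  # server time of a previous HTTPS request
        None,                               # no valid repair retrieved yet
        None,                               # nothing kept by snapd
    ])
    # A device whose clock reset to the epoch would verify certs against the bound.
    print(cert_verification_time(datetime(1970, 1, 1, tzinfo=utc), bound))
```

A device with a sane clock is unaffected, since its sys-time is already later than any of the lower bounds.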
If HTTPS still fails (for TLS-related reasons), try again from scratch, retrieving the full repair over HTTP.
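A rough sketch of the retrieval order described above, with the transport injected as callbacks so that no URLs or HTTP library calls need to be assumed. `TLSFailure`, `retrieve_repair` and all parameter names are hypothetical, and the retry with an adjusted certificate verification time is left out for brevity.

```python
from typing import Callable, Dict, Optional


class TLSFailure(Exception):
    """Hypothetical marker for TLS-related errors (e.g. cert not valid yet)."""


def retrieve_repair(
    fetch_headers_https: Callable[[], Dict[str, str]],
    fetch_full_https: Callable[[], bytes],
    fetch_full_http: Callable[[], bytes],
    applicable: Callable[[Dict[str, str]], bool],
    verify: Callable[[bytes], bool],
) -> Optional[bytes]:
    """Return the verified full repair, or None if the headers fetched
    over HTTPS show it is not applicable to this device."""
    try:
        headers = fetch_headers_https()
        if not applicable(headers):
            return None  # a real client would record the information and decision
        full = fetch_full_https()
    except TLSFailure:
        # HTTPS failed for TLS-related reasons: start again from scratch,
        # retrieving the full repair over plain HTTP. The headers-only filter
        # is skipped on this path; a real client would filter after verifying.
        full = fetch_full_http()
    # The signature check is what provides trust, whichever transport was used.
    if not verify(full):
        raise ValueError("repair failed signature verification")
    return full
```

The point of the design is that plain HTTP is acceptable as a last resort only because the repair is a signed assertion that gets verified regardless of how it was downloaded.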
opened as well
skeleton of where the actual running could happen:
I also have a WIP branch in good shape for the actual verification of the signature of repairs.
also created a PR with the state initialisation logic as discussed at the sprint and with @mvo:
also opened since:
check list from London:
- primary key is brand and number
- scripts need to repair done|retry, default status is retry
- even if assertion was run and marked for retry, look for new rev when process starts, run local ones, then try fetching more #3935
- “snap repairs” must show all revisions ever run
- root trusted key inside tool, brand key comes with assertions #3616 #3930
- only build snap-repair deb when really needed/wanted
- copy snap-repair to local disk, replace only when necessary after it reports one successful run
- replace only when digest changes
- filter on series, arch, model #3787
- make repair.json built out of seed/assertions w/ model #3571
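To illustrate the "filter on series, arch, model" item, here is a small sketch. The header names (`series`, `architectures`, `models`), the device dictionary keys and the rule that a missing or empty list means "no constraint" are all assumptions made for the example, not something settled in this thread.

```python
from typing import Dict, List


def applies_to_device(headers: Dict[str, List[str]], device: Dict[str, str]) -> bool:
    """True if every constraint listed in the repair headers matches the device.

    Assumption: a missing or empty list means the repair is unconstrained
    on that dimension.
    """
    pairs = (("series", "series"), ("architectures", "arch"), ("models", "model"))
    for header, key in pairs:
        allowed = headers.get(header) or []
        if allowed and device[key] not in allowed:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical device description, e.g. derived from the model assertion.
    device = {"series": "16", "arch": "armhf", "model": "acme/gizmo"}
    print(applies_to_device({"architectures": ["armhf", "arm64"]}, device))
    print(applies_to_device({"architectures": ["amd64"]}, device))
```

A filter like this only needs the headers, so it can run before the full repair is downloaded.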
Next phase work:
- repairs from USB stick
- early run (initramfs?)
as discussed we should send out a test repair to some subset of devices (likely once other wip things in our work queues have cleared up a bit)