- run-me-again → retry?
- done-permanently → done?
Also, what's the implied meaning of a plain exit? done, or retry? Perhaps retry, to be on the safe side?
Another detail: we should clearly specify the protocol on that file descriptor, so that we may later choose to send commands other than final statuses, such as logging. For that we'll need a clear record delimiter.
Then, how about having a repair tool in the $PATH, so that one may run repair retry, etc, instead of fiddling with the file descriptor? We may use the symlink trick pointing it back to snap-repair itself so that we don't have another moving part to worry about.
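The symlink trick could look roughly like the sketch below: the binary inspects the name it was invoked under and picks a personality from it. The dispatch function and the mode names are hypothetical; only the idea of a repair symlink pointing back at snap-repair comes from the comment above.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// dispatch picks a mode from the name the binary was invoked as.
// A symlink named "repair" pointing at snap-repair selects the helper mode;
// any other name falls through to the normal snap-repair behaviour.
func dispatch(argv []string) (mode string, args []string) {
	if len(argv) == 0 {
		return "snap-repair", nil
	}
	switch filepath.Base(argv[0]) {
	case "repair":
		return "repair", argv[1:]
	default:
		return "snap-repair", argv[1:]
	}
}

func main() {
	mode, args := dispatch(os.Args)
	fmt.Println(mode, args)
}
```

The script would then call repair retry, and the helper mode would write the corresponding record to the file descriptor on its behalf.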
We should probably do that, but in addition to rather than instead of. In other words, running it every 4 hours via systemd is a good idea anyway, and it'd be good to have a run during boot which does not depend on systemd, so we can fix it if something more serious goes wrong.
This sounds more implicit and magic than the first proposal, which probably means more error-prone as well.
Instead of having the architecture as part of the key, I suggest having it as a field that is a list, and doing the same with the model. If either is omitted it means all.
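As a sketch of the matching rule this implies, assuming hypothetical field names Architectures and Models on a hypothetical Repair type, an omitted (empty) list accepts everything:

```go
package main

import "fmt"

// Repair holds only the two optional list fields discussed here.
// The type and field names are assumptions made for this sketch.
type Repair struct {
	Architectures []string
	Models        []string
}

// matches reports whether value is accepted by allowed.
// An empty or nil list means "all".
func matches(allowed []string, value string) bool {
	if len(allowed) == 0 {
		return true
	}
	for _, a := range allowed {
		if a == value {
			return true
		}
	}
	return false
}

// AppliesTo reports whether the repair targets the given device.
func (r Repair) AppliesTo(arch, model string) bool {
	return matches(r.Architectures, arch) && matches(r.Models, model)
}

func main() {
	r := Repair{Architectures: []string{"amd64", "arm64"}}
	fmt.Println(r.AppliesTo("arm64", "some-model")) // true: models omitted means all
	fmt.Println(r.AppliesTo("armhf", "some-model")) // false: architecture not listed
}
```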
I don't understand this part. It sounds like the repair-id should be [0-9]+ alone?
We need to think this through a bit more. A broken system that manages to update to a newer core is not necessarily fixed by that. The idea of booting into a given core is also unclear when our repair system does not actually live inside snapd proper.
The underlying idea of having a way to define a good base is good, though. We just need to understand a bit better how to convey this information. Also, the core snap is not relevant for arbitrary brands. The system we end up with should allow the base repair to be provided for the model brand too.
Yes, a new revision seems saner. One thing we need to design for is this: we're now saying repair assertions may run multiple times, and we're also saying that they may be updated. That means a single assertion may end up running multiple times, under different revisions. We need to enable people to see the content and logs of each and every revision that was ever run, because we don't want an ill-meaning entity to hide the fact that an older revision of the assertion did something nasty and then got updated again, hiding the initial behavior.
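One way to picture that requirement is an append-only history in which each run stores the revision it ran under, the script content, and its log, so a later revision never overwrites an earlier one. This is only a sketch under that assumption; the Run and History types are hypothetical.

```go
package main

import "fmt"

// Run captures what was executed and what it logged, for one execution.
type Run struct {
	Revision int
	Script   string
	Log      string
}

// History is append-only: recording a run never replaces an earlier one.
type History struct {
	runs []Run
}

// Record appends a run for the given revision.
func (h *History) Record(rev int, script, log string) {
	h.runs = append(h.runs, Run{Revision: rev, Script: script, Log: log})
}

// ForRevision returns every run that happened under rev, oldest first.
func (h *History) ForRevision(rev int) []Run {
	var out []Run
	for _, r := range h.runs {
		if r.Revision == rev {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	var h History
	h.Record(1, "#!/bin/sh\necho one", "first attempt")
	h.Record(1, "#!/bin/sh\necho one", "second attempt")
	h.Record(2, "#!/bin/sh\necho two", "after update")
	fmt.Println(len(h.ForRevision(1)), len(h.ForRevision(2)))
}
```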
Needs to address comment above.
Overall plan is going in a great direction, thanks!