So if you use the scheduling function to delay to the maximum that @niemeyer described above but get a notification about upcoming changes the day they show up in the store (including a date info when the actual update will be) would that help ?
@brendan.carroll Indeed. We’ll definitely be implementing the proposal above soon, and it will address that. Our current plan is to enable scheduling on one or more days of the week within the month (e.g. 2nd Tuesday, between 9am and 10am).
As an aside, note that even for large snaps the downloads are actually deltas on top of the local revision
@cratliff Of course. There are at least three different features that cover the detailed scheduling:
- One or more precise time windows may be defined for the whole system to update under. Today that window is constrained to a daily basis, and we’re changing that so that it may be scheduled to one or more days of the week within the month, per the example above.
- Soon we’ll also enable the administrator to explicitly defer a scheduled update so that it doesn’t happen in the next so many hours or days. There will be a limit to the postponing, but it’ll be non-trivial (probably over a month).
- Then, snaps will also have a saying on whether it is a good time or not for an update to happen. There are many cases which can easily benefit from scheduled updates, but have strict windows in which they cannot happen. For example, Spotify is being used right now to listen to music, so don’t update it.
As for health checks, the snap will be able to define a hook that verifies whether the snap is working well after the update. This may hold arbitrary logic. If the health check fails, the snap will be automatically reverted. This same mechanism will also be used to implement canary rollouts, in which a controlled number of systems is updated and if a relevant number of them report a failure, the upgrade is stopped and the status is reported back to the snap publisher.
Yes, there will certainly be management platforms for controlling site-wide deployments, but that falls slightly outside of the scope of the core snapd project itself.
@jacobgkau There are a number of valid concerns about auto-updates that we’re trying to address, but I don’t see how this particular problem is made any worse with auto-updates or how we might even do much about at all, in the sense that there will always be an element of trust on the publisher of the snap no matter what we do. In other words, whether you manually update or automatically update a snap, for using the snap at all you have to trust that the developer is not simply shipping your private conversations to arbitrary third-parties for example, in the case of a chat service.
With or without auto-update the publisher might add such a backdoor on the next update, and that backdoor will remain there for as long as that update is installed on your system, which means at least 3 months to review the backdoor assuming the auto-update scheme above since we keep three revisions around, and you may always just copy it out onto a separate space to review in the future.
So the situation is really not that different from a manual update.
@mwinter Yeah, some of the features above are aimed at such scenarios.
Right, that’s indeed the purpose of tracks. There are more details in this topic.
As for examples, I suggest having a look at the requests in the store category.
@ogra Sure, but think about this. Here’s how things used to work:
- Developer publishes an update
- I update when I’m comfortable updating
Here’s how you’re proposing I work instead:
- My computer will install all updates automatically by default
- I set a “schedule” delaying updates for the arbitrary maximum amount of time
- Developer publishes an update
- I have a time limit for how long I have to get comfortable updating
- When I am comfortable updating, I update manually, taking away the point of the auto-updater
This is making things so much more complicated for me when my current non-snap system does exactly what I want it to do, exactly when I want it to do it, by default. What you’re suggesting is a workaround to a problem that Snaps have gone out of their way to create.
@niemeyer The difference is, with auto-update the update is being installed on my system automatically, and without auto-update I am choosing to install that update on my system. You’re taking away my element of choice.
If I install a bad update on my system, that’s on me. If my computer auto-installs a bad update, that’s on the auto-update system (or on me for using the auto-update system.) Is the end result the same? Yes. But I was given the chance to avoid it in one case, and I wasn’t given any options in the other.
Again, if I’m using a “scheduling” feature to delay auto-updates to their maximum length, but I’m obviously going to want to update sooner than that in practice, the entire auto-update feature is adding work for me while not affecting my final update schedule at all… so why doesn’t the system just allow me to turn off auto-updates?
(Answer: because developers don’t trust normal users to update their computers if they have the option not to, even if auto-update is on by default. Where’s the trust there?)
Original response folded for not contributing much to the topic.
I was specifically responding to the actual point you made regarding trust on the snap being published.
I’m glad to hear the automatic update won’t in fact be a problem for you in practice. This is the key thing we want to ensure.
It’s going to be a problem for me in practice because in practice, I’m not going to use it.
Did you miss the part where I said, “the entire auto-update feature is adding work for me?” You’re talking circles around me at this point, and I can see I’m not going to get any further here. I hope you become less stubborn in the future, but for now, thank you for taking the time to talk to me as long as you have.
I’ve posted the proposed syntax for within-the-month refresh windows as an answer to this existing topic:
Not trying to be confrontational but technically yes that does seem to be what snappy is doing at the moment…
Pretty sure the above quote proves it snappy won’t allow disabling updates because snappy worries that people just won’t update - it doesn’t trust its users. Phrasing it as Jacob has is confrontational but correct.
I can see the benefit of taking this decision in the medium-term though, it provides an incentive to produce features that will encourage developers not to use the kill switch if it is eventually introduced (similar to Niemeyer’s decision to block improving snapd-xdg-open packages so as to produce an incentive to get that integrated into snapd itself). Sorry for the notifications spam BTW, but you write interesting stuff!
This view is way too narrow … what snap tries to do is to take away any need for having to care about updates by improving the package environment in a way that software can take care of it itself, can do self tests, can automatically roll back to the last working version.
The only way to achieve security of software (to protect it from mirai attacks or from infection by encryption trojans) is to keep it with the least amount of vulnerabilities by keeping it updated in the timeliest possible manner. The only factor that breaks this principle is human intervention.
Let’s take a look at a typical sysadmin today. When an upgrade comes in he will hold it back and run a bunch of tests … if these tests succeed he will check for the best time to apply the update to all users and roll it out … or perhaps he is doing staged updates and gives it to a small subset of his users first …
Now imagine the package management actually offers to have these tests included in the package by upstream, so admins can submit their use case tests to be shipped and run automatically (including auto-rollback) by the package management… it also has a scheduling feature and rollout control …
What snappy tries to do is not take away trust from anyone, but improve and encourage automation by providing an easy environment for it. If there is any trust shifted around then it is actually shifted towards developers and their ability to ship good tests.
The final target is completely self-maintained machines through automation, the developers working on this (including the ones from the community i met) are way to excited about the technical aspects and possibilities of this to actually think about trust or dis-trust or any other political topics
Snapcrafters GitHub repo release process
In the case of an IoT device, would an application snap be able to use that same mechanism to schedule updates of the core or kernel snap if those updates would interrupt operation? I haven’t read as much about the update of those snaps or how their updates interacts with the system. It would be good to have more information on.
Weird udev_enumerate error
Indeed they would. Both the scheduling and the deferring would work for all snaps, including kernel, gadget, core, and any other application pre-seeded into the device as well.
It’s certainly very exciting stuff!
I wanted to echo the concern that other power users have about needing control over the update process that goes far beyond just deferring updates for N hours. As a real-world example, I manage some devices that run Ubuntu Core and are deployed throughout a city. Thankfully, we disabled the automatic refresh timer before deploying them because just today we learned that some revision of the pc-kernel snap between 45 and 68 introduced a change that breaks our device’s functionality. This time, it seems to be an obscure issue with running hostapd on a particular ath10k device. These kinds of stability issues are not all that uncommon in the realm of wireless hardware. We had another case where an automatic update to the core snap changed how confinement works and broke our application. Let us forget about the specifics and think about the implications for systems in a production environment. I think it is expecting too much if we think snap authors are going to write tests that anticipate all possible failure cases, especially when snaps from different organizations interact with each other. Have you considered including hooks such that after any snap (e.g. pc-kernel) is updated, any other snap (e.g. a user-installed snap) can run tests and potentially block the update?
We are looking to deploy our devices in cities all around the U.S., and I am faced with the painful decision of either disabling automatic updates entirely or waiting nervously for the day that an update to core or pc-kernel breaks them all. What would you do?
Quoting @niemeyer from above (nobody ever talked about “hours” in this thread)
No update of any of the official packages ever goes into stable without a testing period in the beta and/or candidate channels (there is a whole QA team working on that). The simple solution is to have a device (or a few) that you monitor via software, that are on the beta channel and that notify you when something breaks … You could go as far as having your monitoring tool automatically delay the upgrades of your stable devices if your automated function-tests fail.
Additionally report a bug about the found regression so that the release into stable will be held back … (this is indeed automatable as well … )
Alternatively to that you can surely have Canonicals QA team do the above checks on your hardware directly as part of a release test as a paid-for commercial option (at least: if we dont offer such a service yet, it is about time we do )
You are absolutely right. I think I saw another thread that used the wording “N hours” and confused them in my mind. However, I do think think some use cases require mechanisms other than deferring, whether you cap it at 24 hours or two months.
Thank you for the great suggestion, and we will definitely try that out going forward. As for our current situation, it would really be ideal to have a mechanism to lock a snap (pc-kernel) on a known working version until we can verify for ourselves that the regression has been resolved and unlock it. Actually, we do have such a mechanism, but it feels very much like we are trying to work around snappy. On our deployed systems, we disable automatic updates globally and use the snapd API to refresh snaps individually to known working versions. What can I do? It is my job to make sure our deployed systems stay in working condition. I cannot expect Canonical’s QA team to defer releasing essential software updates (core, pc-kernel, etc.) to the world indefinitely just because some funny guy on the Internet (me) says there was a regression on his particular hardware platform.
By the way, I do think the private snap store / brand store is a viable solution for our problem. I think it is a relatively new offering because I cannot find much information about it, including pricing.
You definitely can, the Canonical QA team is exactly interested in avoiding any kind of regressions and will happily hold back an update to stable (unless it is a serious security fix, but then they will consult the security team and the reporter of the issue about it). If we offer a kernel snap that is supposed to support your HW we definitely do never ever want it to release with regressions …
OTOH there is only a limited set of hardware to test on and feedback from funny guys like you who use some hardware setup not included in the current test process is essential … i’m sure @fgimenez (as an important person in our QA and release process) agrees with that.
Doubling down on what @ogra said, we did hold back updates before for this exact reason. This was part of why 2.25 never made it into stable, for example. So if you have serious breakages, by all means please report them and we’ll hold the update back.
I’d like to reiterate on that - this functionality is absolutely necessary. When decision-making is forcefully taken away and there is no ability to roll back on your own, people will eventually either do this or stop using the package format - if their job depends on it (they have SLAs with their clients) or if somebody’s life depends on some software to be up when it is told to stay up there won’t be much room for discussion.
There are classes of software that do not work well with automatic upgrades.
- One cannot just auto-upgrade a virtual machine process (QEMU) - sometimes security updates come out but you cannot kill them because they don’t exit on timeout;
- One cannot auto-upgrade an almost dead storage cluster or a database: what if my cluster is in a state where it takes one service failure to completely ruin/corrupt the whole cluster? You cannot solve certain issues before “11 PM”;
- Sometimes even patch versions are not considered safe to apply right away: https://www.rabbitmq.com/clustering.html#upgrading “This will generally not be the case when upgrading from one patch version to another (i.e. from 3.0.x to 3.0.y), except when indicated otherwise in the release notes; these versions can be mixed in a cluster. Therefore, it is strongly recommended to consult release notes before upgrading.”. What if vendor tests and QA miss that requirement from a third-party dependency? Who’s responsible in this case if your whole production cluster is down or corrupted in such a way that the issue fires in a month?
- RTOS kernels and software running on them. Cars have hypervisors nowadays. They also have mobile internet connection. Certain types of software may come up (like visual assistance, cruise control etc.) which will require certain safety guarantees. I would very much like a car to ask me if I’d like to upgrade before it does it by itself.
I am not against “upgrade by default” but in certain cases you have to offload the final decision to some other system. It may be a human, an AI, an automatic or automated system which will calculate whether it is safe to proceed based upon certain criteria. I hope examples above illustrate it good enough.
If anybody here is familiar with Control Theory, not having a hook to block upgrades is the same as removing the feedback loop from this diagram: https://en.wikipedia.org/wiki/Feedback#/media/File:Set-point_control.png
- how do I do Blue/Green or Red/Black deployments for patch versions if my software vendor does not provide tracks for patch versions?
- what if I don’t trust my software vendor by default and run everything through a staging environment as an enterprise?
- what if I am a casual user and I got a snap from some author that I don’t really trust? What if I later learn that he got hacked and attackers have silently pushed a backdoor to all systems where that snap is installed? What if I know about this right away but I have no access to my system to mitigate the issue before it’s too late?
To summarize: if this mechanism is to be generic for all kinds of software, it needs to provide more options to control snap distribution and upgrades at the snapd and snap package levels, otherwise it is not generic enough.
My personal advice is to reconsider adding such functionality at the snapd level. Brand store will be needed even if a snapd-level switch is introduced - I don’t think people generally invest time in such infrastructure unless they have unlimited resources and it’s a good idea to have that offered.
Let me add another scenario for you to be aware of and another possibility of resolution. About a month ago i added OSSEC (a HIDS) to a restricted environment I support. Shortly thereafter I got a notification that snapd couldn’t contact its server. The reason is that the standards for this particular environment require that outbound access be restricted to only business-required processes. Snapd was being blocked by the firewall because i didn’t even know it “called home” (I didn’t even know what snapd was until about two months prior to that). Environments like this (HIPPA, Finra?, PCI, Sarbanes-Oxley?, this European requirement, etc) require control. Any software update in the environment is a significant event. However, other requirements state that software must be kept current from a security perspective so it’s a double bind. Pragmatically speaking, even if i opened the firewall for snapd, the overhead of reviewing all the HIDS alerts I could get would be prohibitive. “Control” is the key operative word.
Another alternative to consider for addressing this would be a rating system for upgrades: “Do it or die” (critical security flaw) down to “It would be nice when/if you get around to it”. Software providers give the update a rating, consumers get to set the allowed level for automatic updates.
With all the above, I will say that you may have to come to a point where you say “Your needs and our solution are not sufficiently compatible, you need to forego using it”. As long as snapd doesn’t become the next systemd and users have a choice, that may be the fallback resolution of last resort.
Is that ‘some point’ soon or can you still introduce features to mitigate everyone’s concerns above?
You know, it would not be really hard for somebody to fork Snap and develop a patch for Snap which disables auto-updates. You would have to clone Snap’s code, apply patch, and re-build, but theoretically possible. Not advised though.