Disabling automatic refresh for snap from store

Just to add my $0.02 here because it can be difficult to hold the line in situations like this… Even though it’s uncomfortable for us to have auto-updates enabled without an off switch, I believe it’s the right way to go, for servers and embedded devices in particular, or anything headless - it’s the only way we’ll keep them secure over time.

One additional concern that might not have been covered explicitly above is data usage on cellular connections. Many connected devices are on cellular plans with a 100 MB/month data allocation.

Having some way to opt out of a very frequent release cadence (especially for large snaps) would be important for these devices.
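
To make the data concern concrete, here is a rough back-of-the-envelope sketch. Only the 100 MB plan size comes from the post above; the application traffic, the delta size and the release cadence are hypothetical figures chosen purely for illustration, and `update_budget` is a made-up helper name.

```python
def update_budget(plan_mb, app_traffic_mb, delta_mb, releases_per_month):
    """Return (MB spent on updates, MB left on the plan; negative means over plan)."""
    update_mb = delta_mb * releases_per_month
    remaining_mb = plan_mb - app_traffic_mb - update_mb
    return update_mb, remaining_mb


# 100 MB plan as mentioned in the post; every other number is hypothetical.
weekly = update_budget(plan_mb=100, app_traffic_mb=60, delta_mb=15, releases_per_month=4)
monthly = update_budget(plan_mb=100, app_traffic_mb=60, delta_mb=15, releases_per_month=1)

print(weekly)   # (60, -20): weekly releases would push the device over its plan
print(monthly)  # (15, 25): a monthly cadence would fit
```

With numbers in this range, the release cadence rather than the size of any single update is what decides whether the device stays within its plan.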

Maybe a new ‘critical’ track that developers would only release to if a serious issue was identified? e.g. to prevent DDoS

A lot of this sounds great. If possible could we get some more details on the health checks and carefully scheduled updates?

To add another case, which might fall under carefully scheduled updates: is it possible to ensure that a customer has all the same versions at a site? This might look like all devices signaling that they are ready for an update before a simultaneous update. I could see a snap attempting to trigger them all through the core API for a refresh at the same time, but that wouldn’t ensure one did not update earlier.

Thank you for the serious response. I did edit my post several times when I realized how confrontational it sounded, and I apologize that it still sounded that way in its final form.

In order to help you understand where I’m coming from, let me explain the first thoughts I had when I found out that snaps auto-update. I use RocketChat for my small group of video content creators. It provides a well-organized communication system, and because it’s both open-source and self-hosted, we can be sure it’s completely private (as far as our security practices will allow.) I used to have RocketChat installed manually, and I would skim through the changelogs before I applied an update.

Here’s a very practical concern I had about switching RocketChat from manual to Snaps: if the snap will automatically pull in any update the developer publishes, then the developer could publish an update that adds a backdoor and allows outside access to my private chats. Of course, the chats on my server don’t store very sensitive information, but once again, I support privacy out of principle, not because I’m keeping secrets.

I understand that the RocketChat developers would be hard-pressed to find a reason to publish such an update, and that doing so would cause all sorts of other problems I’m not addressing. Additionally, I trust the RocketChat developers not to publish an update that adds a backdoor, but with free & open software, I’m not supposed to have to trust the developers, I’m supposed to be able to see things for myself, before I install it on my server/computer.

I know this is only a hypothetical scenario, but all security flaws are hypothetical before they’re exploited. With how overreaching the US government has become in recent years, it’s conceivable to me that they might, say, attempt to force the RocketChat developers to publish an updated Snap with a backdoor to infiltrate the RocketChat server of somebody under investigation. There’s also the possibility of desktop app Snaps being updated to access a computer’s camera, microphone, or filesystem, although from what I understand there may be some security features in Snaps that could prevent that from happening.

Ultimately, I switched my RocketChat server to using Snaps because I know they’re coming down the pipeline whether I want them to or not. (A few weeks later, I had a failed update take my server offline and manual intervention was required to get it working again.) But I’m not comfortable using Snaps on my personal desktop computer as long as it’s set up in a way where the developers give me a time limit to review their code before it’s installed to my system without my knowledge or consent. Does that make sense?

I don’t see this “exception” as any different from other exceptions for critical systems (e.g. snaps in a car or any kind of critical machinery, or, in my case, the network breaking for a few seconds to minutes because of snap updates). In some places an update might carry a high cost (data charges or outage damage) or cause other serious harm.
I think some of these places are just not (yet) ready for snap packages, or maybe just not for snap packages from the store, since manual installs do not trigger any updates.

As a side note, I’m trying to understand the concept of tracks, so that I can potentially distinguish between critical new snap releases and just minor (e.g. help text) fixes.
Does anyone know a snap that is a good example and has this implemented?

So if you use the scheduling function to delay to the maximum that @niemeyer described above, but get a notification about upcoming changes the day they show up in the store (including the date when the actual update will happen), would that help?

@brendan.carroll Indeed. We’ll definitely be implementing the proposal above soon, and it will address that. Our current plan is to enable scheduling on one or more days of the week within the month (e.g. 2nd Tuesday, between 9am and 10am).

As an aside, note that even for large snaps the downloads are actually deltas on top of the local revision.

@cratliff Of course. There are at least three different features that cover the detailed scheduling:

  • One or more precise time windows may be defined for the whole system to update under. Today that window is constrained to a daily basis, and we’re changing that so that it may be scheduled to one or more days of the week within the month, per the example above.
  • Soon we’ll also enable the administrator to explicitly defer a scheduled update so that it doesn’t happen in the next so many hours or days. There will be a limit to the postponing, but it’ll be non-trivial (probably over a month).
  • Then, snaps will also have a say in whether it is a good time or not for an update to happen. There are many cases which can easily benefit from scheduled updates, but have strict windows in which they cannot happen. For example, Spotify is being used right now to listen to music, so don’t update it.
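
To illustrate the within-the-month window from the first bullet (e.g. 2nd Tuesday, between 9am and 10am), here is a minimal sketch of how such a check could be expressed. This is not snapd's implementation or its configuration syntax; `in_refresh_window` and `nth_weekday_of_month` are hypothetical names, and the dates are arbitrary examples.

```python
from datetime import datetime, time

TUESDAY = 1  # datetime.weekday() counts Monday as 0


def nth_weekday_of_month(moment: datetime) -> int:
    """Return which occurrence of its weekday this date is within its month (1-based)."""
    return (moment.day - 1) // 7 + 1


def in_refresh_window(moment: datetime, weekday: int, nth: int,
                      start: time, end: time) -> bool:
    """True if `moment` is on the nth `weekday` of its month, with start <= time < end."""
    return (
        moment.weekday() == weekday
        and nth_weekday_of_month(moment) == nth
        and start <= moment.time() < end
    )


# 13 June 2017 is the second Tuesday of that month.
print(in_refresh_window(datetime(2017, 6, 13, 9, 30), TUESDAY, 2, time(9), time(10)))  # True
# 6 June 2017 is a Tuesday too, but the first one.
print(in_refresh_window(datetime(2017, 6, 6, 9, 30), TUESDAY, 2, time(9), time(10)))   # False
```

The end of the window is treated as exclusive here, which is a design choice of the sketch rather than anything stated in the thread.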

As for health checks, the snap will be able to define a hook that verifies whether the snap is working well after the update. This may hold arbitrary logic. If the health check fails, the snap will be automatically reverted. This same mechanism will also be used to implement canary rollouts, in which a controlled number of systems is updated and if a relevant number of them report a failure, the upgrade is stopped and the status is reported back to the snap publisher.
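
As a rough illustration of the canary idea described above, the following sketch updates a small batch first and stops the rollout if too many of those devices fail their health check. It is a simplified model, not how snapd or the store actually implement it: `canary_rollout`, the device names and the thresholds are all hypothetical, and the "revert" is only recorded in a list.

```python
from typing import Callable, Dict, Iterable, List


def canary_rollout(devices: Iterable[str],
                   health_check: Callable[[str], bool],
                   canary_count: int,
                   max_failures: int) -> Dict[str, object]:
    """Update `canary_count` devices first; halt if `max_failures` or more fail."""
    devices = list(devices)
    canaries, rest = devices[:canary_count], devices[canary_count:]
    updated: List[str] = []
    reverted: List[str] = []

    # Each device runs its health check after updating; a failure means it reverts.
    for device in canaries:
        (updated if health_check(device) else reverted).append(device)

    if len(reverted) >= max_failures:
        # Too many canaries failed: stop here and leave the rest untouched.
        return {"status": "halted", "updated": updated,
                "reverted": reverted, "pending": rest}

    for device in rest:
        (updated if health_check(device) else reverted).append(device)
    return {"status": "completed", "updated": updated,
            "reverted": reverted, "pending": []}


broken = {"b"}  # hypothetical device on which the new revision fails
result = canary_rollout("abcde", lambda d: d not in broken,
                        canary_count=2, max_failures=1)
print(result["status"], result["pending"])  # halted ['c', 'd', 'e']
```

In a real rollout the failure report would go back to the publisher, as the post says; here the returned dictionary stands in for that report.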

Yes, there will certainly be management platforms for controlling site-wide deployments, but that falls slightly outside of the scope of the core snapd project itself.

@jacobgkau There are a number of valid concerns about auto-updates that we’re trying to address, but I don’t see how this particular problem is made any worse with auto-updates, or how we might even do much about it at all, in the sense that there will always be an element of trust in the publisher of the snap no matter what we do. In other words, whether you manually update or automatically update a snap, for using the snap at all you have to trust that the developer is not simply shipping your private conversations to arbitrary third parties, for example, in the case of a chat service.

With or without auto-update, the publisher might add such a backdoor in the next update, and that backdoor will remain there for as long as that update is installed on your system. Assuming the auto-update scheme above, that means at least 3 months to review the backdoor, since we keep three revisions around, and you may always just copy it out to a separate location to review in the future.

So the situation is really not that different from a manual update.

@mwinter Yeah, some of the features above are aimed at such scenarios.

Right, that’s indeed the purpose of tracks. There are more details in this topic.

As for examples, I suggest having a look at the requests in the store category.

@ogra Sure, but think about this. Here’s how things used to work:

  • Developer publishes an update
  • I update when I’m comfortable updating

Here’s how you’re proposing I work instead:

  • My computer will install all updates automatically by default
  • I set a “schedule” delaying updates for the arbitrary maximum amount of time
  • Developer publishes an update
  • I have a time limit for how long I have to get comfortable updating
  • When I am comfortable updating, I update manually, taking away the point of the auto-updater

This is making things so much more complicated for me when my current non-snap system does exactly what I want it to do, exactly when I want it to do it, by default. What you’re suggesting is a workaround to a problem that Snaps have gone out of their way to create.

@niemeyer The difference is, with auto-update the update is being installed on my system automatically, and without auto-update I am choosing to install that update on my system. You’re taking away my element of choice.

If I install a bad update on my system, that’s on me. If my computer auto-installs a bad update, that’s on the auto-update system (or on me for using the auto-update system.) Is the end result the same? Yes. But I was given the chance to avoid it in one case, and I wasn’t given any options in the other.

Again, if I’m using a “scheduling” feature to delay auto-updates to their maximum length, but I’m obviously going to want to update sooner than that in practice, the entire auto-update feature is adding work for me while not affecting my final update schedule at all… so why doesn’t the system just allow me to turn off auto-updates?

(Answer: because developers don’t trust normal users to update their computers if they have the option not to, even if auto-update is on by default. Where’s the trust there?)

Original response folded for not contributing much to the topic.

I was specifically responding to the actual point you made regarding trust on the snap being published.

I’m glad to hear the automatic update won’t in fact be a problem for you in practice. This is the key thing we want to ensure.

It’s going to be a problem for me in practice because in practice, I’m not going to use it.

Did you miss the part where I said, “the entire auto-update feature is adding work for me?” You’re talking circles around me at this point, and I can see I’m not going to get any further here. I hope you become less stubborn in the future, but for now, thank you for taking the time to talk to me as long as you have.

I’ve posted the proposed syntax for within-the-month refresh windows as an answer to this existing topic:

Not trying to be confrontational but technically yes that does seem to be what snappy is doing at the moment…

Pretty sure the above quote proves it :stuck_out_tongue: snappy won’t allow disabling updates because snappy worries that people just won’t update - it doesn’t trust its users. Phrasing it as Jacob has is confrontational but correct.

I can see the benefit of taking this decision in the medium-term though, it provides an incentive to produce features that will encourage developers not to use the kill switch if it is eventually introduced (similar to Niemeyer’s decision to block improving snapd-xdg-open packages so as to produce an incentive to get that integrated into snapd itself). Sorry for the notifications spam BTW, but you write interesting stuff!

This view is way too narrow … what snap tries to do is to take away any need for having to care about updates by improving the package environment in a way that software can take care of it itself, can do self tests, can automatically roll back to the last working version.

The only way to achieve security of software (to protect it from Mirai attacks or from infection by encryption trojans) is to keep its vulnerabilities to a minimum by updating it as promptly as possible. The only factor that breaks this principle is human intervention.

Let’s take a look at a typical sysadmin today. When an upgrade comes in he will hold it back and run a bunch of tests … if these tests succeed he will check for the best time to apply the update to all users and roll it out … or perhaps he is doing staged updates and gives it to a small subset of his users first …

Now imagine the package management actually offers to have these tests included in the package by upstream, so admins can submit their use case tests to be shipped and run automatically (including auto-rollback) by the package management… it also has a scheduling feature and rollout control …

What snappy tries to do is not take away trust from anyone, but improve and encourage automation by providing an easy environment for it. If there is any trust shifted around then it is actually shifted towards developers and their ability to ship good tests.

The final target is completely self-maintained machines through automation. The developers working on this (including the ones from the community I met) are way too excited about the technical aspects and possibilities of this to actually think about trust or distrust or any other political topics :wink:

In the case of an IoT device, would an application snap be able to use that same mechanism to schedule updates of the core or kernel snap if those updates would interrupt operation? I haven’t read as much about the update of those snaps or how their updates interact with the system. It would be good to have more information on that.

Indeed they would. Both the scheduling and the deferring would work for all snaps, including kernel, gadget, core, and any other application pre-seeded into the device as well.

It’s certainly very exciting stuff! :smiley:

I wanted to echo the concern that other power users have about needing control over the update process that goes far beyond just deferring updates for N hours. As a real-world example, I manage some devices that run Ubuntu Core and are deployed throughout a city. Thankfully, we disabled the automatic refresh timer before deploying them because just today we learned that some revision of the pc-kernel snap between 45 and 68 introduced a change that breaks our device’s functionality. This time, it seems to be an obscure issue with running hostapd on a particular ath10k device. These kinds of stability issues are not all that uncommon in the realm of wireless hardware. We had another case where an automatic update to the core snap changed how confinement works and broke our application. Let us forget about the specifics and think about the implications for systems in a production environment. I think it is expecting too much if we think snap authors are going to write tests that anticipate all possible failure cases, especially when snaps from different organizations interact with each other. Have you considered including hooks such that after any snap (e.g. pc-kernel) is updated, any other snap (e.g. a user-installed snap) can run tests and potentially block the update?

We are looking to deploy our devices in cities all around the U.S., and I am faced with the painful decision of either disabling automatic updates entirely or waiting nervously for the day that an update to core or pc-kernel breaks them all. What would you do?

Quoting @niemeyer from above (nobody ever talked about “hours” in this thread)

No update of any of the official packages ever goes into stable without a testing period in the beta and/or candidate channels (there is a whole QA team working on that). The simple solution is to have a device (or a few) that you monitor via software, that are on the beta channel and that notify you when something breaks … You could go as far as having your monitoring tool automatically delay the upgrades of your stable devices if your automated function-tests fail.

Additionally report a bug about the found regression so that the release into stable will be held back … (this is indeed automatable as well … )
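
The monitoring idea above could look roughly like the following sketch: a few devices track the beta channel, run your own function tests, and the result decides whether the stable fleet keeps its scheduled refresh or defers it. Everything here is hypothetical (the `plan_stable_refresh` name, the device names, the 14-day deferral and the 30-day cap); the cap only stands in for the postponing limit mentioned earlier in the thread, whose real value had not been fixed.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def plan_stable_refresh(scheduled: datetime,
                        beta_results: Dict[str, bool],
                        defer_by: timedelta,
                        max_deferral: timedelta) -> Tuple[datetime, List[str]]:
    """Return (refresh time for stable devices, names of failing beta devices).

    beta_results maps a beta-channel device name to True if its tests passed.
    """
    failing = sorted(name for name, ok in beta_results.items() if not ok)
    if not failing:
        return scheduled, failing
    # Never postpone by more than the allowed maximum.
    return scheduled + min(defer_by, max_deferral), failing


when, failing = plan_stable_refresh(
    scheduled=datetime(2017, 6, 13, 9, 0),
    beta_results={"beta-1": True, "beta-2": False},
    defer_by=timedelta(days=14),
    max_deferral=timedelta(days=30),
)
print(when, failing)  # 2017-06-27 09:00:00 ['beta-2']
```

The list of failing devices is returned alongside the new date so that it can be attached to the regression report.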

Alternatively, you can surely have Canonical’s QA team do the above checks on your hardware directly as part of a release test, as a paid-for commercial option (at least: if we don’t offer such a service yet, it is about time we do :wink: )

You are absolutely right. I think I saw another thread that used the wording “N hours” and confused them in my mind. However, I do think some use cases require mechanisms other than deferring, whether you cap it at 24 hours or two months.

Thank you for the great suggestion, and we will definitely try that out going forward. As for our current situation, it would really be ideal to have a mechanism to lock a snap (pc-kernel) on a known working version until we can verify for ourselves that the regression has been resolved and unlock it. Actually, we do have such a mechanism, but it feels very much like we are trying to work around snappy. On our deployed systems, we disable automatic updates globally and use the snapd API to refresh snaps individually to known working versions. What can I do? It is my job to make sure our deployed systems stay in working condition. I cannot expect Canonical’s QA team to defer releasing essential software updates (core, pc-kernel, etc.) to the world indefinitely just because some funny guy on the Internet (me) says there was a regression on his particular hardware platform.
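
For readers wondering what "refresh snaps individually to known working versions" might look like, here is a minimal sketch that only builds the request and does not send it. It assumes the snapd REST API accepts a `POST` to `/v2/snaps/<name>` with an `action` of `refresh` and a `revision` field, which is my understanding of the API and should be checked against the snapd documentation. The `KNOWN_GOOD` table and the helper name are hypothetical, and treating pc-kernel revision 45 as the known-good one is an assumption based on the earlier post.

```python
import json
from typing import Dict, Tuple

# Hypothetical pin list: snap name -> revision we have verified ourselves.
KNOWN_GOOD: Dict[str, str] = {"pc-kernel": "45"}


def build_refresh_request(snap: str, revision: str) -> Tuple[str, str, str]:
    """Build (method, path, JSON body) for refreshing `snap` to `revision`.

    Assumed endpoint and fields; verify against the snapd REST API docs.
    """
    path = f"/v2/snaps/{snap}"
    body = json.dumps({"action": "refresh", "revision": revision})
    return "POST", path, body


for name, rev in KNOWN_GOOD.items():
    method, path, body = build_refresh_request(name, rev)
    print(method, path, body)
```

Sending the request would additionally require talking to snapd's local Unix socket with sufficient privileges, which is left out here.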

By the way, I do think the private snap store / brand store is a viable solution for our problem. I think it is a relatively new offering because I cannot find much information about it, including pricing.

You definitely can. The Canonical QA team is very much interested in avoiding any kind of regression and will happily hold back an update to stable (unless it is a serious security fix, but then they will consult the security team and the reporter of the issue about it). If we offer a kernel snap that is supposed to support your HW, we definitely never ever want it released with regressions …

OTOH there is only a limited set of hardware to test on, and feedback from funny guys like you :slight_smile: who use some hardware setup not included in the current test process is essential … I’m sure @fgimenez (as an important person in our QA and release process) agrees with that.

Doubling down on what @ogra said, we did hold back updates before for this exact reason. This was part of why 2.25 never made it into stable, for example. So if you have serious breakages, by all means please report them and we’ll hold the update back.
