Disabling automatic refresh for snap from store

backlog

#189

That’s exactly what we’ve been doing for years now. The first releases of snapd could not block refreshes at all, period. Nowadays there are several mechanisms that allow postponing them in specific circumstances, from metered connections, to explicit holding, to delayed results at boot, with health checks coming, etc. We take this seriously and have been demonstrating that with actual development time.

On the other hand, I hope you can also meet us halfway, and realize that some of your assumptions might not be entirely valid. For more than a decade we’ve been responsible for a system that depends on people updating their software manually. We have a reasonable understanding of many of the issues involved, including the fact that once such a switch exists, the dynamic of the whole ecosystem changes, and it’s hard or impossible to go back to automated updates.

As for the topic you raised, these hooks already exist today, in your local snapd. They are called pre-refresh and post-refresh, and that’s documented.

I do apologize for not having read and responded to your message in the forum in a timely manner, though. For reasons which are both personal and professional I’ve got a backlog in the forum that I still need to go through, and it was unfortunate that nobody else replied to your topic in time.

If you want to know more about the many additional upcoming features, some of them related to automatic refreshes, we just finished a sprint yesterday, and you can read the full notes here in the forum.


#190

i bet there are … but i also bet there are an order of magnitude more attacks that are using well known and unfixed vulnerabilities …

typically a security update closes something that is more or less widely known and while a new feature that comes with an update will surely introduce new unknown bugs, they are exactly that … unknown and will hopefully be fixed with the next security update after they moved into the known state. automatic updates can not protect you from newly introduced bugs, but they can keep the window of being vulnerable to known security issues very small.


#191

@niemeyer Thanks for the response. I responded in the thread that I started and linked above (Hook to run scripts before and after refresh). To summarize for those reading here, I am asking for something different.

Here, I want to say thanks for the sprint info. It looks good. I want to strongly encourage the “Prevent refreshes while applications are in use” item. For long running jobs, think research and scientific computing, an errant automatic update could cause the loss of days of compute. I also mentioned this in one of my earlier posts (months ago).


#192

I agree, and that makes sense in the traditional package management ecosystem. @tony also brought up Debian and other distribution maintainers backporting security fixes, which is a hugely valuable service. However, it seems to me that snapd is forcing automatic updates not just for critical security fixes but for features and (sometimes) breaking changes, and that is surprising users as is evidenced by the posts here. What channel should users subscribe to if they want critical security fixes but do not need the latest features? Stable is not currently providing that functionality despite what the name may imply.

I think tracks may have been intended to fill that role, but I do not think many projects are using them consistently right now. Perhaps tracks just need a little more support in order to be widely adopted, e.g. allowing snap maintainers to create and manage tracks through snapcraft or snapcraft.io.

Do we know if the update that crashed @Syco’s containers was a security patch, or was it just an update for update’s sake? I think the answer itself is not so important, but the possibility of non-critical patches causing service disruption is problematic in my opinion.


#193

This problem stems from the fact that there is very high mental overhead in implementing snap’s custom solutions instead of enabling/disabling updates per a custom process that fits some uncommon user’s needs. (I am assuming here that a common user will be well served by the existing defaults in place.) Why would I, as a snap user, need to spend time to learn a new update paradigm in order to use snaps?

Such a switch can be implemented by the user, as has already been mentioned here. To speak for myself, I am still posting here because I appreciate the work being done by the snap team. I want a solution that comes with the platform instead of fighting the platform in order to have my computer fulfill my needs. As a rule of thumb, whenever you have to fight the platform in order to make it work for you, you should be looking for an alternative.

People not upgrading their software is a social problem, not a technical one; you cannot solve a social problem with technical solutions.


#194

If changes break the previous data format (a particular type of breaking change that may make it impossible to use older files with the snap, which is quite a serious issue!) then, when epochs are implemented, the application won’t be automatically upgraded to it. At least, that’s how I understand epochs work? @niemeyer can confirm. I don’t see why ‘breaking’ changes in terms of API or UI changes need to be held back, though they can be disruptive if someone is doing a mission-critical task, they launch the app, and it takes them longer to navigate the UI/API. Does the snappy team recognise this as a potential issue? Maybe the application author is allowed to specify a new epoch for this kind of breaking change too?

This is risky because snap maintainers may not be using them correctly and people expect a consistent experience when using tracks… I think this is an element of control which snappy may be justifiably reluctant to give to developers, can you elaborate on why handing over control of these is justified? I mean, it should be possible perhaps for a track command in snapcraft to automatically create a new thread on the forum requesting a new track? You’re right that tracks help to resolve this problem, if a user specifically requests to avoid breaking changes, then assuming the tracks are used correctly and the project uses Semantic Versioning, tracking e.g. the 2.x track (probably just called ‘2’) will not automatically introduce breaking changes and you’d be able to manually choose when you want to switch to a more recent track.

Yes, the snappy team say that devs should be running automated tests etc. to ensure that updates don’t cause these problems, but in case they do occur, those using snappy in mission-critical situations should be using the refresh timer to ensure that they can update at a time that they expect. Updates can be delayed for up to a month (I think?) by using that method. Sysadmins are effectively not permitted by snappy to delay their updates for longer, because snappy reckons that to do so is to expose their users to possible security vulnerabilities etc. (you can say that ‘well they should only get automatic security updates’ but often security/bugfix updates are not applied to old releases and a minor or major update is the only way to get the security update), and snappy believes that it has a role to protect its users, even against sysadmins’ wishes. I’d say that snappy forced updates are much better than Windows updates, from a user’s perspective, because, like most other GNU/Linux updates, they mostly don’t have to be applied when restarting the system, so you can actually use your computer whilst the updates are being installed!

Why would I, as a GNOME (albeit the Ubuntu modified session) user, need to spend time to learn a (somewhat) new desktop paradigm in order to use my desktop? Well, I do, because the devs chose to change the desktop paradigm and I’m fine with that! If I absolutely despise it, I can get on GNOME’s GitLab and IRC and fight for change (as you can do for snappy in this thread), if I want more leverage I could contribute to the project elsewhere and hope that, by meritocracy, I would get more of a say on this issue, or if I absolutely despise the paradigm change, I can use a different desktop environment, or package manager.

The snappy team reckons that a paradigm change is needed when it comes to updating software, can you prove to them that it is not? And since this (presumably, like Ubuntu) is a meritocracy, not a democracy, they’re not obliged to listen, though I think the team have done a sterling job of at least replying to criticism in a good-natured way, despite their workload :smiley:

Yes, if you absolutely despise snappy’s forced refreshes you should switch to Flatpak, AppImage, Nix, or traditional packaging :stuck_out_tongue:

That’s a neat quote, but is it true? Can you give an argument in support of this statement, or are you intending it to be a tautology (because it doesn’t seem to be, I’m not convinced those two things are mutually exclusive)? I guess it’s on the snappy team (primarily @niemeyer since this strategy is his idea), and supporters of the current approach of the team, to find an example where a technical solution has indeed solved a social problem. Perhaps undesirable work can be considered a social problem which automation could potentially resolve? So one can see how technology can in fact solve social problems, and your statement is not true? We only have to find one example to prove that your statement is false, and I think my example works!

Also, what will create change here, as ever, is actual use cases (like the LXD one - preferably with logs) that show why the current solution isn’t working, and the minimal possible changes to fix the use cases, short of introducing an off switch, if possible. If an off switch is the minimal solution, then it needs to be demonstrated why that is the minimal solution, why other apparent minimal solutions don’t work.


"Large container deployments" usecase - where are we at?
#195

And what if the next revision has the same bug? And what if that bug causes a kernel panic, so that reverting is not as simple as SSH-ing into the host? This is exactly the case for me with LXD currently. Revision 8774 used to work, but later ones end up with a kernel panic once lxd is started. I reverted to 8774, but I am afraid the next revision will end up with the same pain.


#196

And an hour ago an automatic update again crashed my host with a kernel panic :frowning:


#197

Can you open a topic and provide the output of snap version, what distro you use, and anything specific about your LXD setup? If it’s going down with a kernel panic then it’d be great to debug that further.


#198

#199

The snap team is smart enough to understand how valid that quote is and how valid it is not. The snap team is in control of the technical solutions and that’s why they are using them. Social solutions would be much harder for them to implement in order to achieve the same result. One could argue that the reason it would be much harder to force users, through social means, to upgrade so forcefully is that it is not the correct approach.

And I am arguing that their paradigm change is misplaced. They can speak with UX designers about the problems of having two separate methods of updating software within one installation. I’d be surprised if a professional UX designer would argue that installing Skype via deb from Microsoft’s repo or via snap from the snap store should matter in how Skype updates for the end user. I’m pretty sure Canonical employs people with professional UX experience. If I were to guess the reason they think, as you say, that a paradigm change is needed, it is because the UX has been designed with IoT deployment use cases in mind instead of Linux desktop user use cases.

What makes you think that I haven’t?

Just like you suggested that I get involved in Gitlab and IRC if I don’t like something about GNOME, I am involved here because I don’t like something about snaps but I do like the overall technology and appreciate the effort put into it. I hope that as the platform matures, the developers will care more about allowing desktop users to have as much control over snaps and their updates, as they have over which kernel they run and when they update it.


#200

On the initial question of a developer’s own snaps, the Chrome store implements a setting “max deploy percentage”, which if set to zero, means that no extension gets updated automatically.

Providing an option like this for snap publishers could help. They could set it to 100% for critical security updates and quite low for risky feature updates, in order to prevent all devices from breaking at once. For manual refreshes, an option (enabled by default?) could be provided to either update anyway to the latest release in the channel or stick to the normal rules.

(Kernel and Core snaps causing unexpected reboots are a different issue. I do know that validation assertions are used for some devices to control which version gets rolled out… (It will actually downgrade if you manually install a newer version))


#201

Actually that’s an idea: phased updates like what Ubuntu does (rolling out first to 20% or so of users, then 40%, then 60%, 80%, 100%, and reverting to 0% if automated tests fail) should probably be supported.
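
A minimal sketch of how such a phased rollout could be decided per device, in Python. The function names, the use of a device identifier, and the SHA-256 bucketing are assumptions made for illustration; this is not how the snap store or Ubuntu's phased updates are actually implemented.

```python
import hashlib


def rollout_bucket(device_id: str, revision: int) -> int:
    """Map a device to a stable bucket in [0, 100) for a given revision."""
    digest = hashlib.sha256(f"{device_id}:{revision}".encode()).hexdigest()
    return int(digest, 16) % 100


def should_refresh(device_id: str, revision: int, phase_percent: int) -> bool:
    """A device refreshes once the published percentage exceeds its bucket.

    phase_percent is the hypothetical publisher-controlled setting:
    0 means nobody is refreshed automatically, 100 means everybody is.
    """
    return rollout_bucket(device_id, revision) < phase_percent
```

Because the bucket is stable for a given revision, raising the percentage from 20 to 40 only adds devices, and setting it back to 0 stops any further automatic refreshes to that revision.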


#202

I had an idea for addressing the auto refresh issues to satisfy both the intent of snap for developers and the palpable risks that forced updates present to the install base. This post is about using the capabilities snap and containers like LXD provide us to pragmatically solve this real issue and hopefully satisfy all parties.

First a little background on our recent experience.

This past weekend LXD pushed 3.6 to stable. To put it simply, it was not stable. We recently made the change to utilize snap, as it is recommended as the preferred package management method, for a large implementation of hosts with clean standard Ubuntu 18.04. A cluster of 30+ hosts providing several hundred containers, most of which are mission critical business systems for a customer user base of 20K, was crippled by this failed stable release. To the credit of LXD and Snap, the running containers stayed up despite the bug which caused LXD to fail to restart after the refresh; however all functionality to snapshot, deploy new containers, and restart machines was down. In cloud environments where those are near or even absolute core functions needed for daily operation, that is a significant issue.

Again, to the credit of LXD developers, the fix was deployed to stable within about 20 hours of the initial bug report. To ignore the risks of snap auto-refresh in use cases such as LXD where the underlying service infrastructure can be crippled, is to ignore decades of best practices for development, operations and quality management. As a developer of enterprise critical systems, I fully understand and appreciate the Snapcraft “disruptor” approach to solve some longstanding development life-cycle challenges; but if being a “disruptor” actually disrupts the functionality of major infrastructure in the market enough times, the viability of snap will face significant challenges. In this first day, we are already having to respond to many customers who want us to exit from our recent commitment to utilize snap. These include major brands, hospitals, and Fortune 500s. It is not hard to imagine how quickly this could spiral.

But my post is actually more about how to address this, as a believer in the benefits of snap despite being a “victim” of the inherent risks. The current option to delay updates does not address the issue, at least in this case from our experience and others posting regarding Snap and LXD, particularly due to the impacts on clustering. LXD itself though could be a viable tool for managing the challenge.

THE IDEA:
(If not a global solution, this or something like it absolutely needs to be implemented for hypervisor-type packages which have far reaching impacts on entire cloud platforms when there is a refresh failure.)

  1. Implement automatic “previous version” Tracks. Allow the user to choose to follow this “Penultimate Track”. Each Stable release triggers creation of the Penultimate Track (previous version), which can remain for say 30 days or no more than 2 previous version Tracks. Perhaps this is the intent or best practice for Candidates and Tracks, but since the Candidate often seems to become “Stable” with little notice (I believe 1 day in this instance), it isn’t suitable; and Tracks are up to the developer and sometimes few and far between.

  2. Global option to turn off Snap without disabling the packages deployed. This sets the bar higher than allowing refresh to be disabled on individual snaps would. A user would have to have a significant enough issue with a particular snap to take the step of disabling Snap altogether. By allowing this without the need to redeploy or in many cases build the app from source, long-term adoption of snap is not significantly impacted. If we end up having to move off of snap due to this latest issue, I can guarantee neither we nor our customers will support moving back to snap; but if we have a failsafe to protect the environment while issues of snap or the maintainer of a package are addressed, we wouldn’t have an issue re-enabling it.

  3. LXD based validation testing- We are already working on defining a script and process for using the latest LXD ability to convert a host into a Container. Our initial approach is to build a tool that creates an LXD container of the current host state, spins it up, runs snap refresh, and validates that it does not fail. We manually did it today and at least in this case it would have shown that there was an issue prior to killing our production environment. Long term, something like this would be incredible as part of snap, whereby a container was created with all the same settings/configs as the machine being refreshed and tested before forcing the refresh on the actual system… This cool tool would only be useful, though, if snap provides the required mechanisms to hold off on the forced refresh if validation fails.

  4. Rating system- Kind of a separate idea, just putting out there for discussion. A rating system based on bug reports or user feedback of failures from Snap refreshes would not only be informative to the users about the reliability of applications, but could also be used as a mechanism for enforcing rules requiring maintenance of prior release tracks as well as the timeline a developer is permitted to force auto refresh. For example if the maintainer has refresh failures indicated by ratings in the last 12 months, they must maintain the Prior Version Track for 90 days.
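
Item 3 above could be sketched roughly as follows, in Python. Everything here is hypothetical: the four callables (`clone_host`, `refresh`, `validate`, `hold`) stand in for whatever tooling would clone the host into a container, run the refresh, check the result, and hold the refresh on the real machine. None of them are real snapd or LXD APIs.

```python
from typing import Callable

HOST = "host"  # hypothetical label for the real machine


def gated_refresh(
    clone_host: Callable[[], str],
    refresh: Callable[[str], None],
    validate: Callable[[str], bool],
    hold: Callable[[], None],
) -> str:
    """Refresh a clone first; touch the real host only if the clone survives."""
    clone = clone_host()
    try:
        refresh(clone)
        ok = validate(clone)
    except Exception:
        # A crash while refreshing or validating the clone counts as a failure.
        ok = False
    if ok:
        refresh(HOST)
        return "refreshed"
    # Validation failed: keep the host on its current revision.
    hold()
    return "held"
```

As the post notes, the `hold` step is the part that depends on snap offering a way to postpone the forced refresh when validation fails.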

Perhaps some of this has been discussed or addressed before. As I mentioned, we are recent snappers. I appreciate the discussion and any comments, and hope this is a useful contribution to the discourse. Ultimately we, like many others, will have to make a decision, sooner than we’d like, to mitigate the risks of auto refresh or be forced by the customers/users to abandon snap.


#203

yes, this is something we will be looking into


#204

I think there is another concern here too. Take software like Eclipse, Blender, Musescore, FreeCAD etc. It should be obvious that you would never want to update to another major version right in the middle of a project you are working on. But such a “project” can last many months or sometimes even years. So being able to merely postpone an upgrade doesn’t work. Saying that it wouldn’t be a problem because the snaps are presumed vetted etc. doesn’t work either. The user needs a switch to completely disable updates for a given package until further notice.

Another approach that may work would be to have specific channels that offer say only security fixes or known bugfixes for a given release series, but no major upgrades, new features or incompatible changes. LXD for example does something along those lines, but it would need to be a systematic policy in the snap store.

Either way this is a problem that needs addressing IMHO.


#205

I suppose at the moment you should report bugs against major software that you think needs tracks but doesn’t have them yet. It’s up to them to get those tracks registered and start using them. Using tracks you won’t get major version updates.

Epochs would automatically prevent automatic refreshes to application versions that change the file format incompatibly, so once the feature is implemented and snaps make sure they give a breaking snap a new epoch then your updates will indeed be held back until you manually refresh them, as I understand it?


#206

That sounds like the kind of solution I was thinking of. I thought however that Epochs were already implemented in snap (even if nobody is using them)?

PS: by the same token this also kind of vindicates the idea that there should be an override switch anyway. Suppose that you rely on an application whose packagers don’t use this, or don’t do it correctly, or where a potential breakage results from an upstream bug that the packagers didn’t know of, etc. The user should be able to make sure that the software won’t change unpredictably under his/her feet and there should be an option to install updates at the user’s discretion only.

After all being able to implicitly trust that your computer obeys you and only you is a huge part of what FOSS is about, or is it not?


#207

According to both the topic and the roadmap it’s still ‘upcoming’.

Yes, the snappy devs seem to accept that there is a small probability that potential breakage occurs. They seem to be holding back on implementing an off switch to provide motivation for developers to implement proper testing procedures, so that there is as little breakage as possible and so that no-one ever needs to use the off switch (if/when it eventually is implemented).

So, they want to solve the problem where users don’t upgrade their software because it’s hassle and they end up getting hit by security vulnerabilities. I suppose the snappy developers are betting that users are going to underestimate the risk of security vulnerabilities damaging them and underestimate the benefits of updates and so they override the users’ desires on this.

You need more, preferably actual, use cases to prove to the snappy developers that this is necessary. At the moment they think that their mitigations cover most use cases and that they can grow features to accommodate others and that users should switch to alternative solutions (presumably Flatpak, AppImage, or traditional packaging) if snappy can’t grow the features to satisfy their use-case.

‘a huge part of what FOSS is about’ kind of, but for software to be free it merely needs to satisfy the four freedoms:

The freedom to run the program as you wish, for any purpose (freedom 0).
The freedom to study how the program works, and change it so it does your computing as you wish (freedom 1). Access to the source code is a precondition for this.
The freedom to redistribute copies so you can help others (freedom 2).
The freedom to distribute copies of your modified versions to others (freedom 3). By doing this you can give the whole community a chance to benefit from your changes. Access to the source code is a precondition for this.

You could argue that not having an off switch for updates violates freedom 0, to run the program as you wish, for any purpose, but one can’t say that software not having all the options that you could possibly wish for means that it violates freedom 0, so it’s a bit of a tenuous argument. Since the code is open, in any case, in principle you can go into the code and create the feature you want, which is not always possible with closed-source software.

I suspect snappy devs are more interested in the definition of open-source software than free software, however. The OSS definition does not include freedom 0, it seems.

Also, obviously, the store code is closed, which Niemeyer has addressed here.


#208

Another snap mechanism available immediately to address this would be tracks. With tracks, you install your software on, say, track 1 (which semantically only allows you to use version 1.X), and when the developers of that software release version 2.X you don’t get automatically upgraded. Instead you will only get versions 1.1 and 1.2, etc. that the developers release on the 1 track. See a full explanation of how tracks work here.

The major and important difference between epochs and tracks is that epochs would allow automatic upgrades across epochs when they are compatible (i.e. when version 2.5 comes out and it can understand/use version 1.X data), whereas with appropriate usage of tracks this would never happen and you would be stuck on 1 track and using versions 1.X indefinitely until you as a user manually decide to switch from the 1.X track to the 2.X track, or to the stable track, etc.
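
The difference can be illustrated with a small Python sketch. This is a deliberately simplified model based only on the description above: the `Release` type, its `reads` field, and both selection functions are made up for illustration and do not reflect how snapd or the store actually represent tracks or epochs.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Release:
    version: Tuple[int, int]   # (major, minor)
    epoch: int                 # data-format generation written by this release
    reads: FrozenSet[int]      # epochs whose data this release can still read


def latest_on_track(releases: Iterable[Release], track_major: int) -> Optional[Release]:
    """Tracks: only releases with the same major version are ever offered."""
    candidates = [r for r in releases if r.version[0] == track_major]
    return max(candidates, key=lambda r: r.version, default=None)


def latest_by_epoch(releases: Iterable[Release], installed_epoch: int) -> Optional[Release]:
    """Epochs: any release that can read the installed data may be offered."""
    candidates = [r for r in releases if installed_epoch in r.reads]
    return max(candidates, key=lambda r: r.version, default=None)


# Hypothetical release history: 2.0 breaks the data format, 2.5 can read old data.
RELEASES = [
    Release((1, 1), epoch=0, reads=frozenset({0})),
    Release((1, 2), epoch=0, reads=frozenset({0})),
    Release((2, 0), epoch=1, reads=frozenset({1})),
    Release((2, 5), epoch=1, reads=frozenset({0, 1})),
]
```

In this model a user on track 1 is only ever offered 1.2, while an epoch-based refresh skips the incompatible 2.0 but moves on to 2.5 once it appears, because 2.5 can read the older data.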

See the PyCharm snap or the NextCloud snap as good examples of snaps using tracks. Using tracks is not required however, so if there is a particular application you care about that you want to track specific versions, then you should reach out to that developer and request they use tracks.

Epochs, while not fully implemented, are in progress on both the store and snapd side. For example see these PRs to snapd: 6142, 6172, 6179, and 6192.