Disabling automatic refresh for snap from store

I didn’t mean to attack you, really :slight_smile:

I guess it is we who are doing a really bad job of advertising how to properly deal with snaps in such use cases (else this thread would probably not exist at all).

IMHO (and note that I was not involved in the decisions) the current behaviour is a good compromise that gives you enough control while still making sure your install cannot become harmful, because even though you are:

…the rest of us are probably not that happy if your lxd cluster becomes part of the next botnet that DoSes our webservers, spreads the encryption trojan that makes us lose all our data if we don’t pay some blackmailer, etc. :wink:

The internet is the biggest community project of mankind, each of us has a responsibility, most of us do not care though …

The behaviour of snaps is a little like the friendly policeman who regularly nudges you about that wide open weapon cabinet next to your wide open garage door, so that your neighbor doesn’t get shot with one of your guns by a thief who dropped by your street.

It is annoying, no question, but you have control and it should be our job to teach people about how to exercise this control so it does not catch you by surprise and you can actually plan with it …

I am genuinely curious if someone more familiar with security research than I am could comment on this. It seems to be taken as a foregone conclusion that automatic updates result in better security. Of course, most of us have seen widely publicized stories of servers being attacked using known vulnerabilities that should have been patched months prior but for human error. Are there no cases of servers being attacked using vulnerabilities that were introduced through automatic updates? How do we know the latter case is less probable, or is that a hypothesis that snap aims to test? The pessimist in me assumes that software changes bring new bugs. I appreciate the contributions on all sides of this dialog.

1 Like

@lance. Yes, there is a trade-off. The bleeding-edge introduces bugs, which is the advantage to the Debian, etc. approach of back-porting security fixes into older, better-vetted versions. However, as some will point out, Debian, RHEL, etc. usually only back-port significant security fixes and minor ones will remain un-patched. Put another way, the answer is complex, but you are correct.

To some degree automatic updates offer a false sense of security. It is true that automatic updates increase security for users who never update. However, it is also true that automatic updates may decrease security for those who are more security minded. This whole thread raises some use cases. My problem is that I posted such a use case and it was completely ignored. I decided to give it another shot and started a new thread with a reasonable solution: again no interest. (See Hook to run scripts before and after refresh) Had someone, anyone, shown an interest, I might believe that automatic updates in snaps are totally about security. Given the total lack of interest in cases where automatic updates harm security, I don’t buy the story. Sorry devs, but you need to listen a bit better or, at least, fake some interest.

2 Likes

@niemeyer since, with the above quote, you effectively promised to address cases where the status quo is not working well, could you have a look at tony’s suggestions? If the snappy team is not able to address a lot of use-cases with the status quo (either because the status quo can’t address them or because the snappy team doesn’t have enough time) then I guess it’s time to introduce the global off switch? :slight_smile:

@Ads20000 Thanks for pushing.

In my view, the devs have things backwards. Some day, snapd (and deps) will be heavily vetted. The chance of a breakout will be slim, signature handling, etc. will be solid. Tampering detection will be trustworthy. The snap store will be trustworthy. When that happens, the need for a global off switch will be much, much, much less. However, that day is not today. A global off switch is needed in the beginning, when snapd (etc.) can’t/shouldn’t be trusted and should be monitored.

Listen devs, if I didn’t think snaps have potential, I (and others) wouldn’t waste my (our) time posting. We are trying to help. Meet us halfway.

3 Likes

That’s exactly what we’ve been doing for years now. The first releases of snapd could not block refreshes at all, period. Nowadays there are several mechanisms that allow postponing them in specific circumstances, from metered connections, to explicit holding, to delayed results at boot, health checks are coming, etc. We take this seriously and have been demonstrating that with actual development time.
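As an illustration of the "explicit holding" mechanism mentioned above, here is a minimal Python sketch that builds the `snap set system refresh.hold=…` command for a hold ending a given number of days from now. It assumes the `refresh.hold` system option accepts an RFC 3339 timestamp, as the snapd documentation describes; the helper name `refresh_hold_command` is made up for this example, and the sketch only constructs the command rather than running it. Note that snapd limits how far ahead a hold can be set, so this postpones refreshes rather than switching them off.

```python
from datetime import datetime, timedelta, timezone


def refresh_hold_command(days, now=None):
    """Build the argv for `snap set system refresh.hold=<RFC 3339 timestamp>`.

    `refresh_hold_command` is a hypothetical helper for illustration; it
    does not run anything, it only returns the command as a list.
    """
    if days <= 0:
        raise ValueError("days must be positive")
    # Use an aware UTC timestamp so the offset is part of the output.
    now = now or datetime.now(timezone.utc)
    until = (now + timedelta(days=days)).replace(microsecond=0)
    return ["snap", "set", "system", f"refresh.hold={until.isoformat()}"]


if __name__ == "__main__":
    # Print the command instead of executing it; run it with sudo if desired.
    print(" ".join(refresh_hold_command(14)))
```

Passing the resulting list to `subprocess.run` (with root privileges) would apply the hold; printing it first makes it easy to check the timestamp before committing to it.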

On the other hand, I hope you can also meet us halfway, and realize that some of your assumptions might not be entirely valid. For more than a decade we’ve been responsible for a system that depends on people updating their software manually. We have a reasonable understanding of many of the involved issues, including the fact that once such a switch exists, the dynamic of the whole ecosystem changes, and it’s hard or impossible to go back to automated updates.

As for the topic you raised, these hooks already exist today, in your local snapd. They are called pre-refresh and post-refresh, and that’s documented.

I do apologize for not having read and responded to your message in the forum in a timely manner, though. For reasons which are both personal and professional I’ve got a backlog in the forum that I still need to go through, and it was unfortunate that nobody else replied to your topic in time either.

If you want to know more about many of the additional upcoming features, some of them related to automatic refreshes, we’ve just finished a sprint yesterday, and you can read the full notes here in the forum.

2 Likes

I bet there are … but I also bet there are an order of magnitude more attacks that use well-known and unfixed vulnerabilities …

Typically a security update closes something that is more or less widely known, and while a new feature that comes with an update will surely introduce new unknown bugs, they are exactly that … unknown, and will hopefully be fixed with the next security update after they have moved into the known state. Automatic updates cannot protect you from newly introduced bugs, but they can keep the window of being vulnerable to known security issues very small.

1 Like

@niemeyer Thanks for the response. In the thread that I started and linked above (Hook to run scripts before and after refresh), I responded. To summarize for those reading here: I am asking for something different.

Here, I want to say thanks for the sprint info. It looks good. I want to strongly encourage the “Prevent refreshes while applications are in use” item. For long running jobs, think research and scientific computing, an errant automatic update could cause the loss of days of compute. I also mentioned this in one of my earlier posts (months ago).

1 Like

I agree, and that makes sense in the traditional package management ecosystem. @tony also brought up Debian and other distribution maintainers backporting security fixes, which is a hugely valuable service. However, it seems to me that snapd is forcing automatic updates not just for critical security fixes but for features and (sometimes) breaking changes, and that is surprising users as is evidenced by the posts here. What channel should users subscribe to if they want critical security fixes but do not need the latest features? Stable is not currently providing that functionality despite what the name may imply.

I think tracks may have been intended to fill that role, but I do not think many projects are using them consistently right now. Perhaps tracks just need a little more support in order to be widely adopted, e.g. allowing snap maintainers to create and manage tracks through snapcraft or snapcraft.io.

Do we know if the update that crashed @Syco’s containers was a security patch, or was it just an update for update’s sake? I think the answer itself is not so important, but the possibility of non-critical patches causing service disruption is problematic in my opinion.

1 Like

This problem stems from the fact that there is a very high mental overhead in implementing snap-specific custom solutions instead of simply enabling/disabling updates according to a custom process that fits some uncommon user’s needs. (I am assuming here that a common user will be well served by the existing defaults.) Why would I, as a snap user, need to spend time learning a new update paradigm in order to use snaps?

Such a switch can be implemented by the user, as has already been mentioned here. Speaking for myself, I am still posting here because I appreciate the work being done by the snap team. I want a solution that comes with the platform instead of fighting the platform in order to have my computer meet my needs. As a rule of thumb, whenever you have to fight the platform to make it work for you, you should be looking for an alternative.

People not upgrading their software is a social problem, not a technical one; you cannot solve a social problem with technical solutions.

5 Likes

If changes break the previous data format (a particular type of breaking change that may make it impossible to use older files with the snap - quite a serious issue!) then, when epochs are implemented, the application won’t be automatically upgraded to it, at least, that’s how I understand epochs work? @niemeyer can confirm. I don’t see why ‘breaking’ changes in terms of API or UI changes need to be held back (though they can be disruptive if someone is doing a mission critical task, they launch the app, and it takes them longer to navigate the UI/API, does the snappy team recognise this as a potential issue? Maybe the application author is allowed to specify a new epoch for this kind of breaking change too?)

This is risky because snap maintainers may not be using them correctly and people expect a consistent experience when using tracks… I think this is an element of control which snappy may be justifiably reluctant to give to developers, can you elaborate on why handing over control of these is justified? I mean, it should be possible perhaps for a track command in snapcraft to automatically create a new thread on the forum requesting a new track? You’re right that tracks help to resolve this problem, if a user specifically requests to avoid breaking changes, then assuming the tracks are used correctly and the project uses Semantic Versioning, tracking e.g. the 2.x track (probably just called ‘2’) will not automatically introduce breaking changes and you’d be able to manually choose when you want to switch to a more recent track.
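To make the tracks idea concrete, here is a small Python sketch of the rule described above: if a project follows Semantic Versioning and names its tracks after the major version, a refresh only introduces breaking changes when it crosses from one track to another. The function names and the "track equals major version" convention are assumptions for illustration only; this is not how the store assigns tracks.

```python
def track_for(version):
    """Return the assumed track name (the major version) for a SemVer string."""
    major = version.split(".", 1)[0]
    if not major.isdigit():
        raise ValueError(f"not a semantic version: {version!r}")
    return major


def crosses_track(installed, candidate):
    """True if moving from `installed` to `candidate` changes the major version,
    i.e. may introduce breaking changes under Semantic Versioning."""
    return track_for(installed) != track_for(candidate)


if __name__ == "__main__":
    for old, new in [("2.4.1", "2.9.0"), ("2.9.0", "3.0.0")]:
        verdict = "manual switch needed" if crosses_track(old, new) else "safe to auto-refresh"
        print(f"{old} -> {new}: {verdict}")
```

Under this convention a user following the ‘2’ track would keep receiving 2.x updates automatically and would choose for themselves when to move to ‘3’.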

Yes, the snappy team say that devs should be running automated tests etc. to ensure that updates don’t cause these problems, but in case they do occur, those using snappy in mission-critical situations should be using the refresh timer to ensure that updates happen at a time they expect. Updates can be delayed for up to a month (I think?) by using that method. Sysadmins are effectively not permitted by snappy to delay their updates for longer, because snappy reckons that to do so is to expose their users to possible security vulnerabilities etc. (you can say ‘well, they should only get automatic security updates’, but often security/bugfix updates are not applied to old releases and a minor or major update is the only way to get the security update), and snappy believes that it has a role in protecting its users, even against sysadmins’ wishes. I’d say that snappy’s forced updates are much better than Windows updates from a user’s perspective because, like most other GNU/Linux updates, they most often don’t have to be applied when restarting the system, so you can actually use your computer whilst the updates are being installed!

Why would I, as a GNOME (albeit the Ubuntu modified session) user, need to spend time to learn a (somewhat) new desktop paradigm in order to use my desktop? Well, I do, because the devs chose to change the desktop paradigm and I’m fine with that! If I absolutely despise it, I can get on GNOME’s GitLab and IRC and fight for change (as you can do for snappy in this thread), if I want more leverage I could contribute to the project elsewhere and hope that, by meritocracy, I would get more of a say on this issue, or if I absolutely despise the paradigm change, I can use a different desktop environment, or package manager.

The snappy team reckons that a paradigm change is needed when it comes to updating software, can you prove to them that it is not? And since this (presumably, like Ubuntu) is a meritocracy, not a democracy, they’re not obliged to listen, though I think the team have done a sterling job of at least replying to criticism in a good-natured way, despite their workload :smiley:

Yes, if you absolutely despise snappy’s forced refreshes you should switch to Flatpak, AppImage, Nix, or traditional packaging :stuck_out_tongue:

That’s a neat quote, but is it true? Can you give an argument in support of this statement, or are you intending it to be a tautology (because it doesn’t seem to be, I’m not convinced those two things are mutually exclusive)? I guess it’s on the snappy team (primarily @niemeyer since this strategy is his idea), and supporters of the current approach of the team, to find an example where a technical solution has indeed solved a social problem. Perhaps undesirable work can be considered a social problem which automation could potentially resolve? So one can see how technology can in fact solve social problems, and your statement is not true? We only have to find one example to prove that your statement is false, and I think my example works!

Also, what will create change here, as ever, is actual use cases (like the LXD one - preferably with logs) that show why the current solution isn’t working, and the minimal possible changes to fix the use cases, short of introducing an off switch, if possible. If an off switch is the minimal solution, then it needs to be demonstrated why that is the minimal solution, why other apparent minimal solutions don’t work.

1 Like

And what if the next revision has the same bug? And what if that bug causes a kernel panic, so that reverting is not as simple as SSHing into the host? This is exactly the case for me with LXD currently: revision 8774 used to work, but later ones end up with a kernel panic once lxd is started. I reverted to 8774, but I am afraid the next revision will end up with the same pain.

3 Likes

And an hour ago an automatic update crashed my host with a kernel panic again :frowning:

1 Like

Can you open a topic and provide the output of snap version, what distro you use, and anything specific about your LXD setup? If it’s going down with a kernel panic then it’d be great to debug that further.

2 Likes

The snap team is smart enough to understand how valid that quote is and how valid it is not. The snap team is in control of the technical solutions, and that’s why they are using them. Social solutions would be much harder for them to implement in order to achieve the same result. One could argue that the reason it would be much harder to force users, through social means, to upgrade so forcefully is that it is not the correct approach.

And I am arguing that their paradigm change is misplaced. They can speak with UX designers about the problems of having two separate methods of updating software within one installation. I’d be surprised if a professional UX designer would argue that whether skype was installed via deb from Microsoft’s repo or via snap from the snap store should matter in how skype updates for the end user. I’m pretty sure Canonical employs people with professional UX experience. If I were to guess the reason they think, as you say, that a paradigm change is needed, it is that the UX has been designed with IoT deployment use cases in mind instead of Linux desktop use cases.

What makes you think that I haven’t?

Just like you suggested that I get involved in Gitlab and IRC if I don’t like something about GNOME, I am involved here because I don’t like something about snaps but I do like the overall technology and appreciate the effort put into it. I hope that as the platform matures, the developers will care more about allowing desktop users to have as much control over snaps and their updates, as they have over which kernel they run and when they update it.

1 Like

On the initial question of a developer’s own snaps, the Chrome store implements a “max deploy percentage” setting which, if set to zero, means that no extension gets updated automatically.

Providing an option like this for snap publishers could help: they could set it to 100% for critical security updates and quite low for risky feature updates, in order to prevent all devices from breaking at once. For manual refreshes, an option (enabled by default?) could be provided to either update to the latest release in the channel anyway or stick to the normal rules.

(Kernel and Core snaps causing unexpected reboots are a different issue. I do know that validation assertions are used for some devices to control which version gets rolled out… (It will actually downgrade if you manually install a newer version))

Actually that’s an idea, phased updates like what Ubuntu does (rolling out first to 20% or so users, then 40%, then 60%, 80%, 100%, reverting to 0% if automated tests fail) should probably be supported.
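For anyone wondering how a percentage-based rollout like this could be decided per device, here is a minimal Python sketch. It hashes a device identifier together with the revision into a stable bucket from 0 to 99 and refreshes only devices whose bucket falls below the published percentage. The names `rollout_bucket` and `should_refresh`, and the choice of SHA-256, are assumptions for illustration; this is not a description of how the snap store, Ubuntu’s phased updates, or the Chrome store actually implement it.

```python
import hashlib


def rollout_bucket(device_id, revision):
    """Map a (device, revision) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{device_id}:{revision}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def should_refresh(device_id, revision, percentage):
    """True if this device falls inside the current rollout percentage.

    0 means nobody refreshes automatically, 100 means everybody does.
    """
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    return rollout_bucket(device_id, revision) < percentage


if __name__ == "__main__":
    devices = [f"device-{i}" for i in range(1000)]
    for pct in (0, 20, 40, 100):
        count = sum(should_refresh(d, 8774, pct) for d in devices)
        print(f"{pct:3d}% rollout -> {count} of {len(devices)} devices refresh")
```

Because the bucket is fixed per device and revision, raising the percentage from 20 to 40 only ever adds devices, and dropping it back to 0 stops the rollout for everyone who has not refreshed yet.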

I had an idea for addressing the auto refresh issues to satisfy both the intent of snap for developers and the palpable risks that forced updates present to the install base. This post is about using the capabilities snap and containers like LXD provide us to pragmatically solve this real issue and hopefully satisfy all parties.

First a little background on our recent experience.

This past weekend LXD pushed 3.6 to stable. To put it simply, it was not stable. We recently switched to snap, since it is the recommended package management method, for a large deployment of hosts running clean, standard Ubuntu 18.04. A cluster of 30+ hosts providing several hundred containers, most of which are mission-critical business systems for a customer user base of 20K, was crippled by this failed stable release. To the credit of LXD and Snap, the running containers stayed up despite the bug which caused LXD to fail to restart after the refresh; however, all functionality to snapshot, deploy new containers, and restart machines was down. In cloud environments, where those are near or even absolute core functions needed for daily operation, that is a significant issue.

Again, to the credit of the LXD developers, the fix was deployed to stable within about 20 hours of the initial bug report. To ignore the risks of snap auto-refresh in use cases such as LXD, where the underlying service infrastructure can be crippled, is to ignore decades of best practices for development, operations and quality management. As a developer of enterprise-critical systems, I fully understand and appreciate the Snapcraft “disruptor” approach to solving some longstanding development life-cycle challenges; but if being a “disruptor” actually disrupts the functionality of major infrastructure in the market enough times, the viability of snap will face significant challenges. On this first day, we are already having to respond to many customers who want us to back out of our recent commitment to snap. These include major brands, hospitals, and Fortune 500s. It is not hard to imagine how quickly this could spiral.

But my post is actually more about how to address this, as a believer in the benefits of snap despite being a “victim” of the inherent risks. The current option to delay updates does not address the issue, at least in this case from our experience and others posting regarding Snap and LXD, particularly due to the impacts on clustering. LXD itself though could be a viable tool for managing the challenge.

THE IDEA:
(If not a global solution, this or something like it absolutely needs to be implemented for hypervisor-type packages, which have far-reaching impacts on entire cloud platforms when there is a refresh failure.)

  1. Implement automatic “previous version” Tracks. Allow the user to choose to follow this “Penultimate Track”. Each new Stable release triggers creation of the Penultimate Track (previous version), which can remain for, say, 30 days, with no more than 2 previous-version Tracks kept. Perhaps this is the intent or best practice for Candidates and Tracks, but since the Candidate often seems to become “Stable” with little notice (I believe 1 day in this instance) it isn’t suitable; and Tracks are up to the developer and sometimes few and far between.

  2. Global option to turn off Snap without disabling the packages deployed. This sets the bar higher than allowing refresh to be disabled on individual snaps would. A user would have to have a significant enough issue with a particular snap to take the step of disabling it altogether. By allowing this without the need to redeploy or, in many cases, build the app from source, long-term adoption of snap is not significantly impacted. If we end up having to move off snap due to this latest issue, I can guarantee neither we nor our customers will support moving back to snap; but if we have a failsafe to protect the environment while issues with snap or the maintainer of a package are addressed, we wouldn’t have an issue re-enabling it.

  3. LXD-based validation testing - We are already working on defining a script and process for using the latest LXD ability to convert a host into a Container. Our initial approach is to create a tool that builds an LXD container of the current host state, spins it up, runs snap refresh, and validates that it does not fail. We did it manually today, and at least in this case it would have shown that there was an issue before it killed our production environment. Long term, something like this would be incredible as part of snap, whereby a container is created with all the same settings/configs as the machine being refreshed and tested before forcing the refresh on the actual system… This cool tool would only be useful, though, if snap provides the required mechanisms to hold off on the forced refresh if validation fails.

  4. Rating system - Kind of a separate idea, just putting it out there for discussion. A rating system based on bug reports or user feedback about failures from Snap refreshes would not only inform users about the reliability of applications, but could also be used as a mechanism for enforcing rules that require maintenance of prior release tracks, as well as the timeline on which a developer is permitted to force auto refresh. For example, if the maintainer has refresh failures indicated by ratings in the last 12 months, they must maintain the Prior Version Track for 90 days.
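The validation step in item 3 can be sketched as a simple gate: refresh a clone first, and only refresh the real host if the clone stays healthy. The Python below is a minimal sketch of that control flow only. Every callable passed in (`clone_host`, `refresh`, `is_healthy`, `hold`) is a hypothetical placeholder for whatever tooling does the real work (for example the host-to-container conversion and a hold on refreshes); none of them is an existing snapd or LXD API.

```python
def gated_refresh(clone_host, refresh, is_healthy, hold, host="host"):
    """Refresh `host` only if refreshing a clone of it succeeds.

    All four callables are hypothetical hooks supplied by the caller:
      clone_host(host) -> name of a disposable clone of the host
      refresh(target)  -> run the refresh on the clone or on the host
      is_healthy(clone)-> True if the refreshed clone still works
      hold(host)       -> postpone the refresh on the real host
    Returns True if the real host was refreshed, False if it was held.
    """
    clone = clone_host(host)
    try:
        refresh(clone)
        ok = bool(is_healthy(clone))
    except Exception:
        # Any failure while refreshing or checking the clone counts as a failed validation.
        ok = False
    if ok:
        refresh(host)
    else:
        hold(host)
    return ok


if __name__ == "__main__":
    log = []
    refreshed = gated_refresh(
        clone_host=lambda h: h + "-clone",
        refresh=lambda target: log.append(("refresh", target)),
        is_healthy=lambda clone: False,  # simulate a refresh that breaks the clone
        hold=lambda h: log.append(("hold", h)),
    )
    print(refreshed, log)
```

As the post says, a gate like this is only useful if the `hold` step has something real to call, i.e. if snap offers a way to hold off the forced refresh when validation fails.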

Perhaps some of this has been discussed or addressed before. As I mentioned, we are recent snappers. I appreciate the discussion and any comments, and hope this is a useful contribution to the discourse. Ultimately we, like many others, will have to make a decision, sooner than we’d like, to mitigate the risks of auto refresh or be forced by our customers/users to abandon snap.

3 Likes

yes, this is something we will be looking into

2 Likes