Disabling automatic refresh for snap from store

teleflexnet · October 16, 2018, 5:36am

I had an idea for addressing the auto refresh issues to satisfy both the intent of snap for developers and the palpable risks that forced updates present to the install base. This post is about using the capabilities snap and containers like LXD provide us to pragmatically solve this real issue and hopefully satisfy all parties.

First a little background on our recent experience.

This past weekend LXD pushed 3.6 to stable. To put it simply, it was not stable. We recently made the change to snap, as it is recommended as the preferred package management, for a large deployment of hosts running clean, standard Ubuntu 18.04. A cluster of 30+ hosts providing several hundred containers, most of which are mission-critical business systems for a customer user base of 20K, was crippled by this failed stable release. To the credit of LXD and Snap, the running containers stayed up despite the bug, which caused LXD to fail to restart after the refresh; however, all functionality to snapshot, deploy new containers, and restart machines was down. In cloud environments where those are near or even absolute core functions needed for daily operation, that is a significant issue.

Again, to the credit of LXD developers, the fix was deployed to stable within about 20 hours of the initial bug report. But to ignore the risks of snap auto-refresh in use cases such as LXD, where the underlying service infrastructure can be crippled, is to ignore decades of best practices for development, operations and quality management. As a developer of enterprise-critical systems, I fully understand and appreciate the Snapcraft “disruptor” approach to solving some longstanding development life-cycle challenges; but if being a “disruptor” actually disrupts the functionality of major infrastructure in the market enough times, the viability of snap will face significant challenges. On this first day, we are already having to respond to many customers who want us to exit from our recent commitment to snap. These include major brands, hospitals, and Fortune 500s. It is not hard to imagine how quickly this could spiral.

But my post is actually more about how to address this, as a believer in the benefits of snap despite being a “victim” of the inherent risks. The current option to delay updates does not address the issue, at least in this case from our experience and others posting regarding Snap and LXD, particularly due to the impacts on clustering. LXD itself though could be a viable tool for managing the challenge.

THE IDEA:
(If not a global solution, this or something like it absolutely needs to be implemented for hypervisor-type packages, which have far-reaching impacts on entire cloud platforms when there is a refresh failure.)

Implement automatic “previous version” Tracks. Allow the user to choose to follow this “Penultimate Track”. Each new Stable release triggers creation of a Penultimate Track (the previous version), which can remain for, say, 30 days, or be limited to no more than 2 previous-version Tracks. Perhaps this is the intent or best practice for Candidates and Tracks, but since the Candidate often seems to become “Stable” with little notice (I believe 1 day in this instance), it isn’t suitable; and Tracks are up to the developer and sometimes few and far between.
Global option to turn off Snap without disabling the packages deployed. This sets the bar higher than allowing refresh to be disabled on individual snaps would. A user would have to have a significant enough issue with a particular snap to take the step of disabling it altogether. By allowing this without the need to redeploy or, in many cases, build the app from source, long-term adoption of snap is not significantly impacted. If we end up having to move off of snap due to this latest issue, I can guarantee neither we nor our customers will support moving back to snap; but if we have a failsafe to protect the environment while issues of snap or the maintainer of a package are addressed, we wouldn’t have an issue re-enabling it.
LXD-based validation testing- We are already working on defining a script and process for using the latest LXD ability to convert a host into a Container. Our initial approach is to create a tool that creates an LXD container of the current host state, spins it up, runs snap refresh, and validates that it does not fail. We did it manually today, and at least in this case it would have shown that there was an issue prior to killing our production environment. Long term, something like this would be incredible as part of snap, whereby a container was created with all the same settings/configs as the machine being refreshed and tested before forcing the refresh on the actual system. This cool tool would only be useful, though, if snap provides the required mechanisms to hold off on the forced refresh if validation fails.
Rating system- Kind of a separate idea, just putting it out there for discussion. A rating system based on bug reports or user feedback of failures from Snap refreshes would not only be informative to the users about the reliability of applications, but could also be used as a mechanism for enforcing rules requiring maintenance of prior release tracks as well as the timeline a developer is permitted to force auto refresh. For example, if the maintainer has refresh failures indicated by ratings in the last 12 months, they must maintain the Prior Version Track for 90 days.
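
The retention rule in the first idea can be sketched roughly as below. This is only a hypothetical model of the proposal: `Release`, `penultimate_tracks`, the parameter names, and the sample dates are all invented for illustration, nothing here is part of snapd or the Snap Store, and it assumes the 30-day limit and the 2-track limit apply together.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Release:
    version: str
    published: date  # date this version reached stable


def penultimate_tracks(releases, today, max_age_days=30, max_tracks=2):
    """Return superseded versions that would still be offered as tracks.

    Hypothetical policy: a version stays available for `max_age_days`
    after a newer stable release superseded it, and at most `max_tracks`
    previous versions are kept at any time.
    """
    ordered = sorted(releases, key=lambda r: r.published)
    kept = []
    # Pair each release with its successor; the newest release has none.
    for old, new in zip(ordered, ordered[1:]):
        superseded_on = new.published
        if today - superseded_on <= timedelta(days=max_age_days):
            kept.append(old.version)
    # Keep only the most recent `max_tracks` previous versions.
    return kept[-max_tracks:] if max_tracks > 0 else []


# Made-up release history, not actual LXD release dates.
history = [
    Release("1.0", date(2018, 8, 1)),
    Release("1.1", date(2018, 9, 20)),
    Release("1.2", date(2018, 10, 13)),
]
print(penultimate_tracks(history, today=date(2018, 10, 16)))
```

A host following the newest entry in that list would stay one release behind stable, which is the failsafe the idea is after.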

Perhaps some of this has been discussed or addressed before. As I mentioned, we are recent snappers. I appreciate the discussion and any comments, and hope this is a useful contribution to the discourse. Ultimately we, like many others, will have to make a decision, sooner than we’d like, to mitigate the risks of auto refresh or be forced by the customers/users to abandon snap.

pedronis · November 16, 2018, 9:44am

yes, this is something we will be looking into

jzimm · November 23, 2018, 3:30am

I think there is another concern here too. Take software like Eclipse, Blender, MuseScore, FreeCAD etc. It should be obvious that you would never want to update to another major version right in the middle of a project you are working on. But such a “project” can last many months or sometimes even years. So being able to merely postpone an upgrade doesn’t work. Saying that it wouldn’t be a problem because the snaps are presumed vetted etc. doesn’t work either. The user needs a switch to completely disable updates for a given package until further notice.

Another approach that may work would be to have specific channels that offer say only security fixes or known bugfixes for a given release series, but no major upgrades, new features or incompatible changes. LXD for example does something along those lines, but it would need to be a systematic policy in the snap store.

Either way this is a problem that needs addressing IMHO.

Ads20000 · November 25, 2018, 12:48pm

I suppose at the moment you should report bugs against major software that you think needs tracks but doesn’t have them yet. It’s up to them to get those tracks registered and start using them. Using tracks you won’t get major version updates.

Epochs would automatically prevent automatic refreshes to application versions that change the file format incompatibly, so once the feature is implemented and snaps make sure they give a breaking snap a new epoch then your updates will indeed be held back until you manually refresh them, as I understand it?

jzimm · November 26, 2018, 1:58am

That sounds like the kind of solution I was thinking of. I thought however that Epochs were already implemented in snap (even if nobody is using them)?

PS: by the same token this also kind of vindicates the idea that there should be an override switch anyway. Suppose that you rely on an application whose packagers don’t use this, or don’t do it correctly, or where a potential breakage results from an upstream bug that the packagers didn’t know of, etc. The user should be able to make sure that the software won’t change unpredictably under his/her feet and there should be an option to install updates at the user’s discretion only.

After all being able to implicitly trust that your computer obeys you and only you is a huge part of what FOSS is about, or is it not?

Ads20000 · November 26, 2018, 3:52pm

According to both the topic and the roadmap it’s still ‘upcoming’.

Yes, the snappy devs seem to accept that there is a small probability that potential breakage occurs, they seem to be holding back implementing an off switch to provide motivation for developers to implement proper testing procedures so that there is as little breakage as possible and so that no-one ever needs to use the off switch (if/when it eventually is implemented).

So, they want to solve the problem where users don’t upgrade their software because it’s hassle and they end up getting hit by security vulnerabilities. I suppose the snappy developers are betting that users are going to underestimate the risk of security vulnerabilities damaging them and underestimate the benefits of updates and so they override the users’ desires on this.

You need more, preferably actual, use cases to prove to the snappy developers that this is necessary. At the moment they think that their mitigations cover most use cases and that they can grow features to accommodate others and that users should switch to alternative solutions (presumably Flatpak, AppImage, or traditional packaging) if snappy can’t grow the features to satisfy their use-case.

‘a huge part of what FOSS is about’ kind of, but for software to be free it merely needs to satisfy the four freedoms:

The freedom to run the program as you wish, for any purpose (freedom 0).
The freedom to study how the program works, and change it so it does your computing as you wish (freedom 1). Access to the source code is a precondition for this.
The freedom to redistribute copies so you can help others (freedom 2).
The freedom to distribute copies of your modified versions to others (freedom 3). By doing this you can give the whole community a chance to benefit from your changes. Access to the source code is a precondition for this.

You could argue that not having an off switch for updates violates freedom 0, to run the program as you wish, for any purpose, but one can’t say that software not having all the options that you could possibly wish for means that it violates freedom 0, so it’s a bit of a tenuous argument. Since the code is open, in any case, in principle you can go into the code and create the feature you want, which is not always possible with closed-source software.

I suspect snappy devs are more interested in the definition of open-source software than free software, however. The OSS definition does not include freedom 0, it seems.

Also, obviously, the store code is closed, which Niemeyer has addressed here.

ijohnson · November 26, 2018, 4:31pm

Another snap mechanism available immediately to address this would be tracks. With tracks, you install your software on, say, track 1 (which semantically only allows you to use version 1.X), and when the developers of that software release version 2.X you don’t get automatically upgraded. Instead you will only get versions 1.1 and 1.2, etc. that the developers release on the 1 track. See a full explanation of how tracks work here.

The major and important difference between epochs and tracks is that epochs would allow automatic upgrades across epochs when they are compatible (i.e. when version 2.5 comes out and it can understand/use version 1.X data), whereas with appropriate usage of tracks this would never happen and you would be stuck on the 1 track and using versions 1.X indefinitely until you as a user manually decide to switch from the 1.X track to the 2.X track, or to the stable track, etc.
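
To make that difference concrete, here is a deliberately simplified sketch. It is not how snapd or the store implement either feature: `Revision`, `refresh_by_track`, `refresh_by_epoch`, and the sample revisions are hypothetical, and real epochs carry more detail than a single "can read" list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    version: str
    track: str              # e.g. "1" or "2"
    epoch: int              # data-format generation this revision writes
    readable_epochs: tuple  # epochs whose data this revision can read


def refresh_by_track(installed_track, published):
    """Newest revision on the followed track; other tracks are ignored."""
    candidates = [r for r in published if r.track == installed_track]
    return candidates[-1] if candidates else None


def refresh_by_epoch(current_epoch, published):
    """Newest revision that can still read data written at `current_epoch`.

    Tracks are ignored here, as if everything were published to one track.
    """
    candidates = [r for r in published if current_epoch in r.readable_epochs]
    return candidates[-1] if candidates else None


# Hypothetical revisions, listed in release order.
published = [
    Revision("1.1", "1", 1, (1,)),
    Revision("1.2", "1", 1, (1,)),
    Revision("2.0", "2", 2, (2,)),    # cannot read 1.X data
    Revision("2.5", "2", 2, (1, 2)),  # can migrate 1.X data
]

print(refresh_by_track("1", published).version)
print(refresh_by_epoch(1, published).version)
print(refresh_by_epoch(1, published[:3]).version)
```

With tracks, the user on track 1 never leaves 1.X. With epochs, the same user is held at 1.2 while only 2.0 exists and moves to 2.5 automatically once a revision that understands the old data is published.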

See the PyCharm snap or the NextCloud snap as good examples of snaps using tracks. Using tracks is not required however, so if there is a particular application you care about that you want to track specific versions, then you should reach out to that developer and request they use tracks.

Epochs while not fully implemented are in progress on both the store and snapd side. For example see PR’s to snapd: 6142, 6172, 6179, and 6192.

Squall · January 7, 2019, 12:24pm

Being a relatively inexperienced user (I just learned about snaps), I figured I would post my view after spending quite some time reading this.
I started off because I want to use an app and be sure that it does not break. Hence I wanted to deactivate automatic updates for it.
Finding out that I can control the frequency down to once a month changed my mind in my non-critical use-case. I will spend the time to support the app development by updating once a month and reporting bugs if any occur - and get the latest features. For my scenario it is only important that I can go back/ignore a specific update in case it breaks. As far as I understand this is possible by using revert and I will test it as soon as the app breaks for the first time.

I just wanted to thank you for integrating the option to update once a month and the metered connection deferral. We have a critical day each Monday, so I set the update to be every last Tuesday of the month.
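
For anyone planning around a schedule like this, the date it lands on can be checked with a few lines. The helper name below is made up. As far as I recall, the snapd timer syntax expresses "last Tuesday of the month" as `tue5` in `refresh.timer`, but treat that string as an assumption and confirm it against the system options documentation.

```python
import calendar
from datetime import date


def last_weekday_of_month(year, month, weekday=calendar.TUESDAY):
    """Return the date of the last given weekday in the month."""
    last_day = calendar.monthrange(year, month)[1]
    d = date(year, month, last_day)
    # Walk back from the month's final day to the wanted weekday.
    offset = (d.weekday() - weekday) % 7
    return d.replace(day=last_day - offset)


# Assumed snapd equivalent (verify before relying on it): refresh.timer=tue5
print(last_weekday_of_month(2019, 1))
```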

peterpan · January 11, 2019, 10:02pm

There still is no way to disable automatic snap updates? I’m running Debian and have disabled /lib/systemd/system/snapd.refresh.service and /lib/systemd/system/snapd.refresh.timer, yet snap updates are still installed automatically

river · January 12, 2019, 2:44am

I consider auto-updating direct from the developer to be a security hole. Opening a direct, and fast, channel straight to my machine can be used for both good and bad.

In sensitive situations, or on sensitive machines, I consider it too powerful to be trusted. For example, a developer’s keys can be stolen. Or a developer could realize the power they wield and introduce furtive back doors.

I understand if this was once the realm of privacy-enthusiasts, and not an issue seen in the realm of real-world implications. But consider a machine that processes payments. We are in the era of downloadable dollars. The “testing” period introduced by most distros can prevent more problems than packaging screw-ups.

For these reasons I would not trust snap in a high-security context. I don’t know how much power the Ubuntu package maintainers wield in this sense, but I’d guess it’s similar to the power the snap maintainers wield over the snap world. A necessary evil. The difference is that with snap, I also give absolute trust to any developer who once wrote something I felt like installing. I will install whatever he feels like pushing to me in 6 hours flat.

river · January 12, 2019, 3:39am

To further weigh-in with my two cents, I think the problem becomes when is it more secure to push updates out FAST and when is it more secure to WAIT. I think this is a hard problem.

From a security standpoint, I am concerned with all the access channels that allow code to flow to my system. I want to know that some community verification of changes has occurred before they are automatically applied. If such a verification process is not feasible, then in very many use cases, the correct response (from a security standpoint) is to not apply the update.

Ads20000 · January 12, 2019, 4:14am

Snap developers admit that you need to trust the developers of the snaps you’re using. If you don’t trust a developer then don’t use their snap! To get up-to-date packages on any distribution based on time-based releases and freezes, people often use external repositories: PPAs in the case of Ubuntu and Mint, COPRs in the case of Fedora. (The freezes are for stability. Updating every part of the OS as it becomes ready, as rolling-release distros do, is not as stable as people think it is, because all packages are interconnected and it’s hard to test each individual update one-by-one on the stable base (not in testing, in the case of Debian, for example) on enough hardware and software configurations and from enough previous package versions to ensure stability as good as freeze-based OSes provide. Fedora’s stable releases are much more flexible in terms of application updates, it seems, which maybe Ubuntu should move to, but maybe there isn’t the manpower; the Backports repository exists, but it doesn’t seem that many applications that could be updated via Backports actually are.) These external repositories are even more dangerous than snaps (though it’s more clearly written on Launchpad that they are). Snaps are confined, they only have access to certain parts of your system, so what you’re talking about shouldn’t actually be a problem unless a user is tricked by an application into giving it access to something it shouldn’t have access to. If you can think of ways that snaps could break that confinement (or ways in which they’re not currently confined and they should be) then please file bugs, as these are serious security issues that could be fixed. You can do that here or, if you’re talking about how confinement could be expanded, you could try creating a new topic here.

This sounds like something a company would be doing, generally speaking, and they should talk to Canonical and set up the snap enterprise proxy so they can control updates on that machine. If you can think of machines that would really benefit from the proxy but that lie outside of the corporate sphere then please give use cases (preferably real ones)!

I think that having an option to disable automatic refreshes would be good. @niemeyer (the snapd lead developer) has expressed that they are reluctant to introduce a global off switch, because that would prevent them fixing the problem of people sticking to old versions of software to avoid change (in their view, to increase stability, even though changes in the world mean that old versions can become unstable and insecure without users really doing much about it), but that they are very happy to introduce changes to accommodate users’ needs. This is part of the reason for holding back the off switch: it forces them to make changes to snapd so that people do feel able to update their snaps, rather than turning their updates off. If you can think of ways in which snappy can do this (e.g. extending the refresh timer period; currently you must update at least once per month, and if I recall correctly, setting your Internet connection as metered ensures that you don’t have to update until two or three months past your initial update) then suggest them.

By the way, you might just want to IP block the server snapd uses if you want a hacky off switch if you’re desperate for one. I don’t know what the IP is but it’s a known workaround

If you don’t trust an upstream to test new versions properly, security-wise, then don’t use their snap (unless you think they will have it sorted within one month of the update, in which case set your refresh timer to update once per month and you’re sorted, though if the update came 31+ days after your previous refresh then it might update anyway? I don’t know how the timer works really, @zyga-snapd? I think you may have told me before but it’s still unclear to me… the system options doc should probably be updated to make more clear how the hold process works (e.g. maximum hold time, when the time is counted from) so people know how to fit it to their use case? I can update the doc if someone can remind me how the timer works). If they had an LTS branch that you would consider secure, then use that: see MicroK8s for an example. If you click latest/stable in the top-right, you’ll see that there are different tracks listed, so running snap install microk8s --classic --channel=1.12/stable will keep you on the 1.12 track (until it’s closed, I’m not sure what happens when it’s closed?!) rather than on latest, so you’ll only get updates for 1.12, which I presume are just security updates (though I don’t know if MicroK8s sticks to semantic versioning or not). Note also that MicroK8s is a classic snap, so it has much less confinement than a normal snap and so is much less secure. Classic as a confinement setting is considered transitional; @niemeyer plans to get it removed when strict confinement (the normal confinement, but remember, a huge improvement on Debs/RPMs since Debs/RPMs don’t have any ‘sandboxing’ really) is usable for more apps.

dabeegmon · January 12, 2019, 2:09pm

>By the way, you might just want to IP block the server snapd uses if you want a hacky off switch if you’re desperate for one. I don’t know what the IP is but it’s a known workaround

Would most definitely recommend you do NOT do this. Doing this on my server, previously set to a once a month update, produced a setup that resulted in a machine that shut down once a month (and this happened 4 months in a row). Exactly why this is happening I haven’t been able to pinpoint but then I’m not a sys admin nor any kind of programmer and after wading through a hundred pages or so of log files all I can say with surety is that the system is being shut down. As this behavior did not exist before I installed snapd so that I could use lxd it stands to reason that this, using a firewall rule to block upgrades, is likely the primary cause of my issue. There have been NO reports of anyone doing this ‘hacky off switch’ successfully! YMMV - - - – good luck.

river · January 12, 2019, 6:10pm

How about a joe schmo user who handles bitcoins on his/her desktop? Or any user that only logs in to their email with a password? gnome-calculator from snap is installed on ubuntu desktop by default (IIRC). Now, someone who hacks the dev(s) with the gnome-calculator key has just obtained the ability to push back doors onto presumably millions of people’s desktops in a matter of six hours.

I’m sorry, but this is just an idealistic philosophy that ignores the subtle reality of the situation. Snap is well designed to buffer against the classic model of security vulnerabilities – I might call this the “sshd zero day” class of vulnerabilities. BUT, most software, including gnome-calculator, does not even have this class of vulnerabilities (it does not run external code (like a browser does) nor does it expose ports to the public internet).

I’ve never had to explicitly allow a snap to access the files on my machine. That right there is enough for a snap to edit files containing code (like “.bashrc”) to achieve privilege escalation to non-sandboxed level. If it’s possible for a snap to install a global hotkey or similar, then it may be possible for a snap to implement a key-logger. Hence, I must presume that it is possible for an update to gnome-calculator to scan the filesystem for *.kdbx files, and send those as well as the data from a key-logger to some remote server.

Turning updates down to monthly is not a robust solution to this class of problems. Suggestions to hack the system to block updates (blocking IPs, for example) should just be ignored as non-constructive.

The point remains that allowing devs to push code to my machine should be a careful process with checks and balances and many points at which said code may be prevented from reaching my machine. This must be done for security’s sake, lest we classify the auto-update mechanism as a back door itself. Imagine if every piece of software on my machine was a snap. I’d now be vulnerable to the security practices that dozens (or hundreds) of devs use to protect their own private keys, as each key is effectively a back door key to near-immediate remote execution on thousands or millions of systems. (A dev could also exploit this and then claim they were not responsible, that they were hacked, afterwards.)

At a minimum, snap should allow users to receive updates on a delayed schedule, and only install automatically after X days delay. The setting should be per-app with configurable global defaults, and may also include notifications or emails to the user with notes that the devs would include with each update. There should also be a way for devs to mark updates “critical”, as that is the only situation where fast updating is a security boon. The user may decide to auto-update “critical” marked updates with less delay, or only review and apply “critical” marked updates. Furthermore, if I set my system to have a 90-day delay on non-critical updates, then when I install a new snap, I should get the latest 90-day old version immediately after installing.
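
A rough sketch of how such a delay policy could behave is below. Everything in it is hypothetical (`Update`, `DelayPolicy`, `policy_for`, `eligible_version`, the app name, and the sample dates), and snapd offers nothing like it as far as this thread establishes; it only illustrates the rules proposed above.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Update:
    version: str
    released: date
    critical: bool = False  # marked by the developer


@dataclass(frozen=True)
class DelayPolicy:
    normal_days: int = 90   # wait this long before auto-installing
    critical_days: int = 0  # shorter wait for updates marked critical


GLOBAL_DEFAULT = DelayPolicy()
PER_APP = {"some-app": DelayPolicy(normal_days=7)}  # hypothetical override


def policy_for(app):
    """Per-app setting if present, otherwise the global default."""
    return PER_APP.get(app, GLOBAL_DEFAULT)


def eligible_version(updates, policy, today):
    """Newest version whose waiting period has elapsed, or None.

    The same lookup serves a fresh install, which would receive the
    latest version that is already old enough under the policy.
    """
    eligible = []
    for u in updates:
        wait = policy.critical_days if u.critical else policy.normal_days
        if today - u.released >= timedelta(days=wait):
            eligible.append(u)
    if not eligible:
        return None
    return max(eligible, key=lambda u: u.released).version


# Made-up update history for illustration.
updates = [
    Update("1.0", date(2018, 9, 1)),
    Update("1.1", date(2018, 12, 1)),
    Update("1.1.1", date(2019, 1, 5), critical=True),
]
today = date(2019, 1, 12)
print(eligible_version(updates[:2], policy_for("calculator"), today))
print(eligible_version(updates, policy_for("calculator"), today))
```

One wrinkle the sketch exposes: a critical update built on top of a not-yet-aged release (1.1.1 on 1.1 here) drags the newer code in early, so a real design would need critical fixes to be published against the older release as well.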

Ads20000 · January 12, 2019, 11:57pm

Could you please enable permanent systemd logs and then, next time your computer shuts down, could you please check your log (it’ll be the entries immediately before the time you start up the system again)? You can do this with journalctl -r (or preferably check the log in /var/log/journal so we can see the full lines). Then, could you please post the logs for the shutdown to paste.ubuntu.com? Then the devs here can have a look! Otherwise, for all I know, the shut down could be due to something else that happened when you implemented the IP block, or it could be an unintended consequence of the IP block not due to snappy, or it could be your outdated software (which is outdated as a result of implementing the IP block).

dabeegmon · January 13, 2019, 12:37am

Interesting idea - - - thanks.
Except - - - - I haven’t been running said server because I couldn’t count on it. (Tools that other people control don’t last very long for me - - - - I do remember the slogan from early microcomputing - - - - computing your way - - - implying that the IT center did NOT have control.)
I haven’t had the time required to go through all the steps that are required to purge that system from all aspects of the snapd/lxd combination that I had used - - - - and yes it wasn’t even possible to remove files/directories using rm -r.
So my efforts after I choose to restart the system will likely have a different focus.
When I did mention the issue after the second system take down the comment was interestingly denigrative and quite convinced me that expecting any positive change was likely futile.
I continue to follow this particular thread because I thought that lxd had the possibility to be an excellent product but as said product was permanently tied to snapd - - – well one can only hope.

tony · January 13, 2019, 1:41am

@Ads20000 Before you respond to @dabeegmon and @river, let’s step back and reassess what they and others (including me) have been saying (via my paraphrasing)…

Frequent updates are the teeniest, tiniest, tip of computer security.
The difference between automatic updates and frequent manual updates is indistinguishable.
Any form of updates should NOT reduce security (I provided such a use case at least twice).
The Linux/FOSS way is to put users in control.
Rather than use cron (or friends), the devs reinvented the wheel. These last two points are the sole reason this thread exists.

If security is really the issue, then I would have expected Canonical to:

Put IDS front and center by building in any of OSSEC, AIDE, etc.
Have an installer option for a bastion server (e.g., central log server, OSSEC server, etc.).
Integrate IDS and automatic updates. (Again, that was my suggestion.)
Put substantial effort toward AppArmor profiles.
On and on…

Now, don’t get me wrong, read-only filesystems and atomic updates are nice. However, there are a number of existing projects to which Canonical could contribute. Frankly, just containerizing web browsers, would have a larger security impact.

Also, I want to second @dabeegmon on “the comment was interestingly denigrative and quite convinced me…” I made a simple and constructive suggestion that was twice blown off; otherwise, from my perspective, all I see is push-back. I came to this thread because I wanted to try Ubuntu for some IoT work, for which snaps are a critical part of the Ubuntu solution. Instead, I have been “quite convinced” to stick with Debian and not use snaps. I hate to say it, but this thread “feels” like snaps are headed in the same direction as mir, etc.

Ads20000 · January 13, 2019, 1:50am

This is because the devs have a vision and they actually have no obligation to fold to users on this issue, it’s their software, you don’t have to use it. I’m pushing back because I’m trying to make progress by giving responses that I think knock down suggestions that the devs have already signaled they’re aware of and have rejected, I want to try and encourage both users and the devs to keep this thread open and keep talking in the hope that more refinement of the status quo can be found (or even overturning the status quo if people can give enough hard use cases that are not covered by existing solutions devised by the devs and for which the devs can’t find adequate solutions in the future). I’m sorry if people have found my comments unhelpful but I’m trying to help you write responses which the devs will find useful. In @dabeegmon’s case I really, really think that the devs will need logs (because dabeegmon’s explanation of their situation could be wrong and I don’t see why the devs should fold their position based on an issue reported by just one person and without adequate evidence to back it up) and I’m more than happy to help dabeegmon to find them.

If you’d rather I just shut up and let the devs get annoyed with the responses (if they read them at all), answer angrily (if at all), and close the thread, thus shutting down all criticism and hope of change in the future, then I can do so, if you think that would be more helpful.

dabeegmon · January 13, 2019, 12:22pm

What I’m finding really fascinating in the very long chain is the polarization.
@Ads20000 This is because the devs have a vision and they actually have no obligation to fold to users on this issue, it’s their software, you don’t have to use it.
@tony The Linux/FOSS way is to put users in control.
>Rather than use cron (or friends), the devs reinvented the wheel.

When I read this - - - - I understand one thing - - - - - an incredible polarization.
Now there has been talk of accommodation but if anything even that has largely diminished.

Another interesting point (scrolling back about 10 months in total) is that it would appear that the dev community might read any posting here but that they are no longer responding. This lack of responses further buttresses the understanding that there are some strict orders NOT to engage further. (Interesting that!)

Philosophically there have been quite a number of arguments presented here as to why the ‘vision’ as you have termed it really isn’t getting the traction expected from the dev team.
My thinking is that there is presently a market for this from those that aren’t worried about the issues that have been raised (over and over again) and from corporate embedded system development (a part of IoT) where it will be locked into a ‘do not upgrade’ setup because corporate doesn’t want the headache of dealing with upgrade issues from users.
The likelihood of me actually uploading the many pages of logs needed for anyone to exactly ascertain the cause of my system shutting itself down 4 consecutive months after I added a firewall rule disallowing access to revisions - - - - well it’s quite low. Why, you ask - - - - my ability to trust this process has been very badly bruised - - - - I followed directions and dev team suggestions - - - - - and I got something that I didn’t want - - - - something I can’t even remove (except with a fairly complicated multi-step process) that I haven’t taken the time to work on because I use computers as tools to do other things and I just haven’t had time to battle with something that is going to be a time sink hole.

Ads20000 · January 13, 2019, 11:49pm

You do not need to ‘upload … many pages of logs’, just journalctl for the time that your computer shut down. If you are unable to provide this then the developers cannot have any way of working out what your issue is for themselves.
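
As a sketch of what "just journalctl for the time that your computer shut down" could look like, the snippet below builds a journalctl command for a short window around a shutdown. The helper name, the window sizes, and the example timestamp are made up for illustration; `--since`, `--until`, and `--no-pager` are standard journalctl options. Entries from before the last boot are only available if persistent logging was enabled beforehand.

```python
from datetime import datetime, timedelta
import shlex


def journalctl_window(shutdown_time, minutes_before=10, minutes_after=1):
    """Build a journalctl command covering a short window around a shutdown."""
    fmt = "%Y-%m-%d %H:%M:%S"
    since = shutdown_time - timedelta(minutes=minutes_before)
    until = shutdown_time + timedelta(minutes=minutes_after)
    return [
        "journalctl",
        "--no-pager",
        "--since", since.strftime(fmt),
        "--until", until.strftime(fmt),
    ]


# Hypothetical shutdown time; substitute the real one.
cmd = journalctl_window(datetime(2019, 1, 1, 3, 0, 0))
print(shlex.join(cmd))
```

The printed command can be run on the affected machine and its output pasted to paste.ubuntu.com, which is a few dozen lines rather than many pages.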