Disabling automatic refresh for snap from store

mwinter · May 20, 2017, 1:43am

I really need some way to disable the automatic updates of snaps from the store. As far as I know, this isn’t possible right now.

Background: I’m the maintainer of the FRR snap (see FRRouting.org), a routing daemon implementing all the different routing protocols (RIP, OSPF, ISIS, BGP, PIM, etc.). Snaps are great, especially for installation on whitebox switches, but the automatic update is killing it.

Whenever I push a new snap to the store, all the users have an unexpected outage for a few minutes (i.e. BGP sessions lost, OSPF neighbors lost, etc.). For this application, I need a way to avoid automatic upgrades and let the user decide when and how to upgrade.

At the current stage I had to stop pushing updates to the store because of this.

Suggestions, Ideas etc are welcome…

Regards,
Martin

chipaca · May 22, 2017, 1:09am

I think you (or, rather, your users) should be able to run

snap set core refresh.schedule=23-5

to ask that the refreshes only happen once a day, in a window between 11pm and 5am (as an example).

Give it a try?
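
A window such as "11pm to 5am" wraps past midnight, which is easy to get wrong when reasoning about it. The following is only a minimal sketch of that check, not snapd's actual parsing or scheduling code; the function name and the whole-hour granularity are assumptions made for illustration.

```python
def in_refresh_window(hour: int, start: int = 23, end: int = 5) -> bool:
    """Return True if `hour` (0-23) falls inside the window [start, end).

    Hypothetical helper: models a daily window given in whole hours.
    """
    if start <= end:
        # Ordinary window within a single day, e.g. 9 -> 17.
        return start <= hour < end
    # Window wraps past midnight, e.g. 23 -> 5.
    return hour >= start or hour < end


# 2am is inside the 23 -> 5 window, noon is not.
print(in_refresh_window(2), in_refresh_window(12))
```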

mwinter · May 22, 2017, 2:30am

Even if this works, this isn’t a solution. I can’t think of a single network that is willing to have routers rebooting outside a previously announced maintenance window.

Basically, what the users of my snap need is a way to just get a notification of an update and then make their own decision about when to update (i.e. a few days or weeks later - or, in the case of a minor update, skip it).

Think of it this way: would you want your Linux server to reboot on its own any time a new kernel is released?
My snap is used on core routing and any refresh will bring down the network - even if it’s only for a few minutes.

Martin

niemeyer · May 22, 2017, 12:35pm

The issue that makes us resist the idea of simply disabling updates altogether is that very often that will mean never update rather than update at someone’s discretion, and then we’re getting back to some of the problems that got us here in the first place. That’s why we’ve been resisting introducing that global switch, at least for the time being, and instead working with people to mitigate the bad side effects of having automatic updates enabled.

Today we’re already able to schedule the precise window in which the system should update within the day (potentially multiple windows, all with start/end times), and we’re also able to hold minor versions into independent tracks so that major bumps are not delivered automatically. Soon we’ll also start working on health checks, which will allow updated systems to roll back automatically if the system finds itself in a bad place after the update.

So, we understand this is a bit of a departure, but we’re honestly interested in trying to take this pretty unique opportunity to try to mitigate some of the recurring issues we’ve observed in the last so many years.

Given the understanding above, is there anything else we can do to help mitigate the real issues you’ve observed without simply turning updates off entirely?

lool · May 24, 2017, 8:53am

On top of what was said earlier in the thread, there are also plans to let some daemons survive across snap refreshes. This wouldn’t mitigate the reboots after OS/kernel updates, but could avoid disrupting your daemon if it’s designed to keep working when the on-disk bits are updated.

cratliff · May 24, 2017, 1:52pm

If there isn’t a way to disable automatic updates, how would staged roll-outs work?

I know there are different tracks and channels in a store, but if I only want to update 10% of devices at a time to ensure stability or do some intensive A/B testing, is there a way to do this other than moving the customer from one track/channel to another? It also seems like the publisher isn’t the one who chooses which track/channel the user gets placed in. Is there a way of doing this other than asking the customer to be on a different track/channel/branch?

mwinter · May 24, 2017, 11:01pm

This is a start in the right direction, but I believe it doesn’t go far enough. I assume most users want better control to decide when to upgrade (i.e. test it on some limited deployment first, etc.).

If I understand this change correctly, then a reboot, power failure, etc. would “finish” the upgrade (loading the new version) - and the same goes for a restart of a daemon.
The second one might cause more issues for packages which consist of multiple daemons (like my frr package) and may potentially cause a version mix (if one daemon is (re)started at a later time).

Anyway, better than now, but I would prefer a simple command to disable automatic refresh for a specific snap.

Martin

geekgonecrazy · May 25, 2017, 3:21am

You definitely aren’t alone in wanting this. We’ve had several users of the Rocket.Chat snap ask for this as well. They are simply used to being able to say Tuesday is update day, or to schedule a downtime and alert their users.

I definitely understand where the snap team is coming from though. This is one of the major benefits of snaps: the fact that they will be updated, and that we as snap publishers can push out a critical vulnerability patch to our users and not have them ignore it. Of course the users get that same peace of mind.

I think if we had mechanisms in place to bring up the new version of the snap and do a more seamless hand-off, this worry about interruption / downtime would lessen.

I think there are probably several issues here. First, by default snaps kill and unmount the previous snap version before mounting and then starting the new version. Secondly, for a network service, you couldn’t even start the second instance because you would have a bind conflict.

Anyways… so far I’ve just been telling users who want this level of control to go the manual route, because right now snaps aren’t designed for that scenario.

denis · June 23, 2017, 9:44am

Have you tried blocking Canonical and Ubuntu domains on your router with iptables? You are a network guy, you should try. Write a script on the desktop which connects to your box over ssh and executes something like:

sudo iptables -A OUTPUT -d search.apps.ubuntu.com -j DROP

And you’re free till the next reboot.

mwinter · June 23, 2017, 8:57pm

Denis,

I (personally) could do this. But I rather dislike doing any [additional] filtering on a router, as this could cause issues in the future… It also blocks other snaps and potentially more (not sure what else would be at the same location).
Another (in my opinion simpler) solution is to download the snap and install it locally - not from store.

But what I’m looking for is a solution for the USER of my snap. Something which is kind of automatic or can be set by the snap maintainer to accommodate a different update policy. Something which would probably download a new snap and give a warning/notice of a new snap available, but does NOT install it until a “ok” is given by the user.

We are currently getting close to releasing a new major version, but I haven’t updated the snap in weeks, as I know I’ll cause network outages at various places just by pushing a new snap into the store.
My current (unfortunate) approach is to just NOT push any new versions to the snap store until this is resolved or there is a major security issue I need to address (but not for new major versions).

niemeyer · July 6, 2017, 12:54pm

Yesterday we discussed this topic once more, and after exchanges with several stakeholders there was agreement to increase the allowed window in which refreshes may be scheduled.

The agreed semantics to be implemented are the following:

Refreshes may be scheduled at an arbitrary weekday and time within the month (e.g. second Tuesday between 1pm and 2pm).
Refreshes may be deferred for up to another month so that missed windows and re-scheduling may happen without strange side effects. For example, if it was scheduled for the first day, and then gets scheduled for the end of the month just before it happens, there may effectively be a two-month window without refreshes.
If the system remains out-of-date after the two-month window, the system will start attempting to refresh out of the window.
That maximum window is reset every time the system is refreshed, so out-of-band updates may be performed at a convenient maintenance window.

These changes should greatly improve the behavior when using third-party snaps in servers, while not giving up on the goal of encouraging systems to remain up-to-date and secured.
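
The semantics listed above can be read as a small decision rule. What follows is only a sketch of that rule as described in this post, not snapd's implementation: the function and parameter names are hypothetical, and "two months" is approximated as 60 days purely for illustration.

```python
from datetime import datetime, timedelta

# Assumption: the "two months" limit is approximated as 60 days.
MAX_DEFERRAL = timedelta(days=60)


def should_refresh(now: datetime, last_refresh: datetime,
                   in_scheduled_window: bool, update_pending: bool) -> bool:
    """Decide whether a refresh may start right now (hypothetical model)."""
    if not update_pending:
        return False
    if in_scheduled_window:
        # Normal case: refresh inside the window the admin chose.
        return True
    # Outside the window: only refresh once the system has been
    # out of date for longer than the maximum deferral.
    return now - last_refresh > MAX_DEFERRAL


last = datetime(2017, 7, 1)
print(should_refresh(datetime(2017, 7, 15), last, False, True))  # inside the limit
print(should_refresh(datetime(2017, 9, 15), last, False, True))  # limit exceeded
```

Because `last_refresh` moves forward whenever any refresh succeeds, a manual refresh during a maintenance window resets the limit, as the last point in the list describes.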

Please let us know if you have any comments on the topic or the proposed changes.

mwinter · July 7, 2017, 1:30am

Thanks, this is VERY WELCOME news.

So for my case (FRR snap, which is a routing daemon):
I think the timeframe is OK, though I’m still not sure how network admins will react to automatic refreshes if they miss the window. I understand the desire to get people to update more frequently, but the forced push is painful. Definitely much better, but it may take some time to push network admins to a more constant upgrade cycle.

Related question:

What happens if I miss a bug? I.e., I push a new snap out, someone tries to upgrade a system and it fails for their setup. How would this be solved?

Can they block automatic upgrades to the version which they know has a bug affecting them?
If they report it to me, can I retract the bad version from the store (I may not be able to update it fast enough, so I would prefer to delete it and stop anyone else going to new version until this is fixed)?
If I can do this (I assume I can), what happens to other users who have already upgraded to the new version? Do they get forced to downgrade again? Or would they be able to stay on the removed version? (I assume the bug might not affect all users, so most of them might want to stay.)

mwinter · July 7, 2017, 1:35am

One more comment:

I would prefer to just ignore the DAY part, but keep the time of day. So if someone specified second Tuesday between 1pm and 2pm, and remains out-of-date, then drop the Day, but still try to stick to the hour part.

Martin

Ads20000 · July 9, 2017, 10:17am

I’d still quite like a global switch on the user side. I’m on less than 1 Mbps Internet (yes - megabits, not megabytes; UK Government failed to fulfil their pledge to have minimum 2 Mbps everywhere by Dec 2016), and automatic updates take up bandwidth and make the Internet unusable for my house, so I have to kill the snapd process… I can disable Apt updates from Software & Updates, graphically, but I can’t disable snap updates at all, graphically or via the command line…

This is probably a separate issue, sorry; it has got a thread here…

cratliff · July 10, 2017, 8:56pm

Would allowing the publisher to define an update schedule work? The publisher could specify when creating the app that it is to stay updated automatically, that the user can specify within certain windows, or that the publisher has direct control over each snap’s update.

To give more background: I’m working on a hardware device that will operate in a customer’s environment. My end user doesn’t have a terminal interface to change their update window, and I won’t have physical access to a device in the field. I need to ensure a number of things, like the device isn’t moving or in other forms of operation when it is updated.

I think in the IoT environment something more granular is needed. Ideally, my snap needs to be able to signal when it is in a state in which it can be updated. Being able to access the update window or command for my snap from within the snap would be a good start. Having the ability to download the snap and then trigger when the update occurs would be great.

denis · July 11, 2017, 3:38pm

Great news!

I propose here one more additional option:

Devices download all updates for snaps:
a) every time they become available, or
b) following the new logic described in your post

But the real update gets deferred till the next reboot. This means that upon the next power cycle, when Ubuntu Core boots, it can switch snaps to newer ones. And certainly there shouldn’t be a forced reboot like when the core snap updates nowadays.
Snapd can show a warning upon ssh login that updates can be finalized. You type something like “snap update” and there you go.

This makes it possible to have devices with potentially infinite uptime. But a user can still update the device by unplugging it and then plugging it back in whenever they want. This solves issues with headless devices, devices with no user access to ssh, or cases where a user doesn’t even know that there is Linux inside.

Generally speaking, Canonical should decide what it wants Ubuntu Core to be: a platform to create toys for children or a platform for real stuff (industrial, medical, automotive). I can’t imagine an autopilot saying on a highway at 200km/h: “Hey there! We haven’t updated the system for two months, you deferred it 10 times and I give no more chance! I’m gonna download the core snap now over 4G, update it and reboot your vehicle. You’d better pray”.

ogra · July 11, 2017, 3:54pm

We do have customers using Ubuntu Core with digital signage … that means that a supermarket with 20000 products will have 20000 display devices running on Ubuntu Core showing the price tags on the shelves… would you really want that supermarket to hire someone who logs in once a month to check if there are updates ?

Same goes for whatever sensor and monitoring devices you have running in the woods, miles away from any admin…

Ubuntu Core is designed for completely unattended operation, which means that updates and reboots need to happen fully automatically …

That said, there undoubtedly need to be ways to allow apps to delay an update, just as there need to be ways to declare that updates should never happen over 4G if there is a possibility to reach WLAN when the autopilot sits in the garage, but these are special cases that should be handled via additional options …

What @niemeyer described above should be the out-of-the-box behaviour for bare images to keep them secure, if you build something specific on top that has additional needs (like an app delaying updates even more or some such), extra features can definitely be added.

denis · July 11, 2017, 4:45pm

No, I’m not saying that. That’s why I wrote about a user who doesn’t even know that there is Linux inside.

Actually, you start by defining a use case for a device. And based on that you configure an image to follow logic 1 or 2 or 3. For a supermarket it’s obvious that the option with forced updates should be chosen.

I described a device that requires infinite uptime. It’s just another class of things in the world. Do you want to cover them by adding option 3? Or ignore them? That’s the question. If it comes down to creating a dummy snap that defers updates in an infinite loop forever, well, that seems ugly. There should be a better way.

As an example you can take any network device, a router or a switch. It has no right to reboot whenever it wants. Sometimes it’s required to have 99.999% availability, which is roughly 315 seconds (about five minutes) of downtime a year. This may only be sufficient to swap devices when hardware fails.
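
The five-nines downtime budget mentioned above can be checked with a little arithmetic. This is just a worked calculation assuming a 365-day year; the function name is made up for the example.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds in a 365-day year


def allowed_downtime_seconds(availability_percent: float) -> float:
    """Yearly downtime budget, in seconds, for a given availability target."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


# Five nines leaves a little over five minutes of downtime per year.
print(round(allowed_downtime_seconds(99.999), 2))
```

A single snap refresh that drops routing sessions "for a few minutes" would therefore consume most or all of that yearly budget on its own.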

So for those cases I suppose a simple power cycle is a solution. Updates are downloaded every time a new version comes out. And a much newer version overwrites previous one if it wasn’t updated. And then these updates are applied on the next boot. That’s all. Completely unattended as you said. There is absolutely no need to maintain those devices.

niemeyer · July 11, 2017, 4:54pm

@mwinter Thanks for the feedback. It’s good to hear these ideas make some sense out of our echo chamber as well.

Responding to the questions:

That happens transparently when one does snap revert. The revision reverted out of is blacklisted and will not be reinstalled. When a new revision shows up inside the scheduled window, then that is installed and the problematic revision is “jumped over”.

Yes, you don’t even need to take it out: just publish another revision (new or old) and that will be the new tip. Soon we’ll also introduce live feedback over how well the roll out is taking place, and even block it automatically if there are strong hints that the new revision is disrupting clients.

They’ll refresh into the new tip. Whatever you release into a channel becomes the current target for every client.

The scheduling logic allows selecting multiple slots, so you can put in every slot that makes sense and the system will strictly respect those over the two-month time frame. If after that none of the slots managed to be respected and there are updates pending, the system will disregard the slots as they’re not being exercised for some reason. There might be clock issues, for example, or perhaps the scheduling window falls overnight on a machine that is never turned on during that time.

That said, phasing out the requested window with less strict windows before going totally open is an interesting idea.
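
To make the revert and "new tip" behaviour described in this reply concrete, here is a toy model of it. It is a sketch of the behaviour as described here, not snapd code; the function names and the plain integer revisions are assumptions for illustration.

```python
from typing import Optional, Set


def revert(current: int, previous: int, blacklisted: Set[int]) -> int:
    """Model of reverting: the revision reverted out of is blacklisted."""
    blacklisted.add(current)
    return previous


def next_revision(current: int, channel_tip: int,
                  blacklisted: Set[int]) -> Optional[int]:
    """Return the revision to refresh to, or None to stay on `current`."""
    if channel_tip == current or channel_tip in blacklisted:
        return None
    # Whatever the publisher released into the channel is the target,
    # even if it is an older revision than the one installed.
    return channel_tip


blacklist: Set[int] = set()
installed = revert(current=12, previous=11, blacklisted=blacklist)
print(next_revision(installed, 12, blacklist))  # bad revision is not reinstalled
print(next_revision(installed, 13, blacklist))  # a newer revision jumps over it
```

In this model, a publisher who re-releases revision 11 after a bad revision 12 causes clients on 12 to move back to 11, since 11 is simply the new tip.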

niemeyer · July 11, 2017, 5:04pm

@denis There’s a rich spectrum of real applications and each of them with different requirements. Many industrial devices can easily be updated and rebooted at predictable times, and some toys will in fact require strictly predictable refresh routines. In some appliances updating at reboot can make sense, in others it’s better to force a reboot and ensure safety sooner.

We’re gradually taking steps towards supporting every one of those cases.