[WIP] Refresh App Awareness

This thread summarizes an upcoming snapd feature that many application developers have been requesting for a long while. Internally we call it refresh-app-awareness because it means snapd is aware of running applications and will inhibit refreshes of snaps with apps running. The feature is smart enough to filter out services that are restarted during refresh.

Current Status

The feature is currently under development and is coming to snapd 2.39 behind a feature flag. That is, it will require opt-in from the user, like some of the other recently introduced features, until it is deemed stable for general use.

The feature can be tracked in GitHub using https://github.com/snapcore/snapd/projects/3

There’s an older thread that has a lot of the back story: “Bug? Saves are blocked to $SNAP_USER_DATA if snap updates when it is already running”

Preconditions

To enable the feature you must be on snapd 2.39 or newer. On Ubuntu and Debian you can simply refresh the core snap to the edge channel to do so. You can always refresh back to stable if you don’t want to test unstable software that might interfere with your system.

To use the feature now run:

snap set core experimental.refresh-app-awareness=true

NOTE: If this doesn’t work the version of snapd you are using is not recent enough. Just try again on the next refresh from edge. It should be available soon.
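
The version precondition can be checked before trying to set the flag. The following is a minimal sketch, not part of snapd: the helper names snapd_version_from_output and version_at_least are made up for illustration, and the comparison assumes a sort that supports -V (GNU coreutils).

```shell
# Extract the snapd version from `snap version`-style output read on stdin.
# (Hypothetical helper; it only looks for the line whose first field is "snapd".)
snapd_version_from_output() {
    awk '$1 == "snapd" { print $2; exit }'
}

# Succeed if version $1 is at least version $2.
# Assumption: `sort -V` (version sort) is available.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

For example, `version_at_least "$(snap version | snapd_version_from_output)" 2.39 && echo ok` should print ok on a new enough system. Versions with suffixes such as +git are not handled specially here.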

Ideally you should restart all the running snap applications (e.g. reboot if you want to) as process accounting that is used by the feature internally will only start from this point onward. If you, for whatever reason, want to disable the feature simply run:

snap set core experimental.refresh-app-awareness=false

Usage

With this option set, you can try to manually refresh a snap to another channel, or simply refresh to a new revision arriving on edge. For as long as the snap is busy, the refresh will be inhibited. A snap is busy if it has running non-service applications or hooks.

Future work

The feature will eventually support termination notification and will be able to refresh a snap the moment the last app in use is closed. There are some early plans for allowing applications to be notified of a pending refresh but that is further down the line as it has some complex dependencies that are not ready yet.
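
To illustrate what “refresh the moment the last app in use is closed” could mean mechanically, here is a rough polling sketch. It is not how snapd implements it. It assumes that the PIDs of a snap’s non-service processes can be read from a cgroup.procs-style file (one PID per line) whose path is passed in by the caller, and the function names are hypothetical.

```shell
# Print the number of PIDs listed in a cgroup.procs-style file.
# An unreadable or missing file counts as zero processes.
# The content is read rather than the size checked, since virtual
# files commonly report a size of zero.
count_pids() {
    if [ -r "$1" ]; then
        grep -c '^[0-9][0-9]*$' "$1" || true
    else
        echo 0
    fi
}

# Poll until the file lists no PIDs, or give up after $2 attempts.
# $3 is the delay between attempts in seconds (default 1).
wait_until_idle() {
    procs_file=$1
    attempts=$2
    interval=${3:-1}
    while [ "$attempts" -gt 0 ]; do
        if [ "$(count_pids "$procs_file")" -eq 0 ]; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep "$interval"
    done
    return 1
}
```

A caller would run the refresh only after wait_until_idle succeeds. Polling is the simplest possible approach; a notification-based mechanism would avoid the delay between the last process exiting and the next poll.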

It would be really useful if while you’re designing this aspect you could take into account snaps on Ubuntu Core being notified of a kernel or core(18) snap refresh (which will cause an impending reboot). We have had some cases where an app is running in a snap (not a service/daemon but an actual app) and a core/kernel refresh happens and upon reboot the app was unexpectedly killed, leaving external hardware in a bad state until the device rebooted and the operator was able to repair it. It would be enough to support some hook mechanism like you’re describing to simply tell the snap that the system is about to shut down so that it could cancel/kill any running jobs on the hardware and put the hardware in a safe state for rebooting.

That’s interesting feedback, thank you!

We don’t have the hook designed to that detail but this is certainly a useful way to shape it as we eventually get to that spot. For now the problem here is that we don’t have hooks that would be able to interact with the snap in the user session.

The soft check has landed in edge. The hard check is proposed in https://github.com/snapcore/snapd/pull/6751 and will likely land next week after Easter break.

As mentioned elsewhere, it would be nice if once the machinery is available to kill all processes from a snap before doing a refresh, we also employ that on snap removal, so that after removing the snap, all processes are killed and then we unload the apparmor profiles that were loaded into the kernel (and also perhaps any seccomp bpf programs, udev rules, any device cgroups setup for snaps, etc.).

I have been told by some of the Android Studio snap users that their IDE gets restarted automatically while they are doing real work (since updates get installed automatically). Is this change supposed to “fix” that scenario as well?

Yes, that’s exactly the use case for this feature.

We have that machinery but killing applications would be excessive. I don’t know what conditions would be acceptable for that operation. Do you have any recommendations?

It seems intuitive to me that removing a snap would kill the applications associated with it for a few reasons.

  1. I think there’s a general user expectation that removing an application whether via gnome software or snap remove will also remove most of the state for that application, i.e. both files (in the form of $SNAP and maybe $SNAP_DATA, but perhaps not $SNAP_USER_DATA since that is in the user’s $HOME directory) and running processes.

  2. Additionally, a common first try at debugging why something isn’t working (especially for non-technical people) is to remove the application and re-install it. If we don’t kill the processes associated with a program then this won’t necessarily work as well as it could because there still might be previous instances of the snap applications running.

  3. If we are removing some files associated with the snap, i.e. $SNAP, $SNAP_DATA, etc. then the processes that are continuing to run may operate in odd ways since when the processes started they had access to those files but now (some) files have been removed. This is distinct from a refresh where those files still exist but underneath a revision path which the running application should know about. I suppose you could say that $SNAP_DATA, since it’s for root, should only be accessed by daemons which will be killed properly by systemd, but still if the files in $SNAP are removed then the application could operate very oddly, and I don’t think it’s a reasonable expectation for the application to gracefully handle the case where all its installation files are removed while it’s running.

As an aside, I think it’s safe to do this only on removal, since removing a snap is never (or at least should never be) an automatic operation.

I would put a different spin on this: removing a snap should fail if there are apps running. We may offer a --kill-running-apps option, or something of similar spirit, to enforce that. This is better in my eyes because it is less surprising than realising you were in fact running a snap application without knowing it.

I think I agree with you about failing the removal if there are apps running, but to be clear, my order of preference would be:

  1. Fail snap removal if there are running apps
  2. Kill running apps during snap removal
  3. Don’t kill running apps during snap removal

I hadn’t realized that 1 was an option, hence I was arguing for 2 instead of 3 which is the worst option to me (and also the current situation).

I had a look at visual studio code, hoping to see what is the method of detection of background snap refreshes. This is what I found https://github.com/microsoft/vscode/blob/597d8da84a8f5c7263aa9fbe90984b35807a1b27/src/vs/platform/update/electron-main/updateService.snap.ts#L201

In short, every now and then they look at the target of the current symlink. As such, with the current mechanism Visual Studio Code won’t be able to detect updates because no update would happen while it is running. I’m wondering whether this warrants a discussion about the grander role of the current symlink and reliability. We might be able to stop using the current symlink for anything ourselves and still change it, so that applications that choose to look at it will see the “change”.
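
The polling approach described above can be sketched in a few lines of shell. This is an illustration of the idea only, not the Visual Studio Code implementation. The function names are hypothetical and the path to the current symlink is passed in by the caller.

```shell
# Print the revision that a snap's `current` symlink points at.
current_revision() {
    readlink "$1"
}

# Succeed if the symlink target differs from the revision in $2,
# i.e. the snap was refreshed since that revision was recorded.
revision_changed() {
    [ "$(current_revision "$1")" != "$2" ]
}
```

An application would record current_revision once at startup and then call revision_changed every now and then. With refresh-app-awareness enabled the symlink is not supposed to move while the app is running, so this check would never fire, which is exactly the problem raised above.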

At the same time I’d much rather introduce a snapctl API call to allow apps to ask as well as a notification mechanism (either new hook or a simpler method that apps can easily integrate with filesystem notification services).

We discussed where to take this feature from here and here are some quick notes:

New changes for v2 (from top to bottom as they arrive)

  • add snapctl refresh-available that instantly (without talking to the store) tells snaps that a refresh is pending [1w]
  • add new lock that inhibits application startup during refresh process [2w]
    • the lock needs to be safe from unrelated errors - bound to process, bound to ephemeral file system object
  • add cgroup-based app termination mechanics to snapd (20/80 approach, simple polling until cgroups v2 make it easy) [2w]
  • add new UX for command line and GUI apps that displays refresh progress while that lock is held or while current is gone, and we are attempting to start the app
    • cli just shows the changes via snap socket [2-3w]
    • snap run sends signal to the session agent to display the UI and waits in the back for the refresh of the app to change [2+w]
  • session agent UI response for the signal [2-3w] (1w with zenity)
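
The startup-inhibiting lock in the list above could look roughly like this. It is a sketch under assumptions, not the snapd design: it uses the flock utility from util-linux, the lock file path is supplied by the caller (something under /run would make it an ephemeral file system object), and the function names are invented for illustration. Because the lock is tied to an open file descriptor, it goes away when the holding process exits, which covers the “bound to process” requirement.

```shell
# Run a command while holding an exclusive lock on the given lock file.
# The lock lives on file descriptor 9 of a subshell and is released
# automatically when that subshell exits, even if the command fails.
with_refresh_lock() {
    lock_file=$1
    shift
    (
        flock -x 9 || exit 1
        "$@"
    ) 9>>"$lock_file"
}

# Succeed if nobody currently holds the lock (non-blocking probe).
# A launcher could call this before starting the app.
lock_is_free() {
    ( flock -n 9 ) 9>>"$1"
}
```

In this picture the refresh side wraps its work in with_refresh_lock, while the launcher checks lock_is_free (or blocks on the lock) before starting the application.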

Interesting things but not for v2

  • add snapctl refresh-and-rerun that apps can use to tell snapd to actively refresh them after the app terminates

We need to think a bit more about this, we do want a variant at some point that also talks to the store. That means two commands or a careful use of options.

We discussed that we do need a way for the app to ask snapd to take the lock before the app terminates otherwise it might be restarted before we get a chance to take the lock/start the refresh.

Current thinking is to have snapctl refresh-available --offline. To be true to that, when we have a pending update because of running apps we should download enough to be able to proceed even if “offline”. That means logic similar to what we have done now for remodeling.

I have had experimental.refresh-app-awareness set to true for a few weeks now, and it works as intended, preventing snaps from updating while the applications are running.

However I have more than once observed an issue which I believe is related: since chromium is running pretty much all the time, it never gets a chance to refresh itself, except when I reboot my machine. When I do and there’s a new revision, it gets refreshed, but if after rebooting I run chromium too soon, I find that the profile directory that’s stored under $SNAP_USER_DATA is incomplete, as if launching the app had interrupted an ongoing copy operation. I can tell that because my profile directory weighs more than 1GB, and when that happens the current profile is much smaller (usually a few hundred MB).

The workaround is to close the app, delete the new profile directory, copy the old one with the new revision number, and launch the application again. This is not user-friendly though.

Is this a known issue?

This is a known issue. As a part of the new design, though this is not implemented yet, the application will not be allowed to start during the refresh operation.

It looks like experimental.refresh-app-awareness isn’t working for me. I set it some time ago (and have rebooted several times since). But the chromium snap still updates while it is running, causing loss of changes made until I restart the snap.

Option is set:
% sudo snap get core experimental.refresh-app-awareness
true

% ls -la ~/snap/chromium/current
lrwxrwxrwx 1 baz baz 3 Nov 22 13:11 /home/baz/snap/chromium/current -> 949

The “current” link was relinked at 13:11, while I kept using the old version of the snap until 15:32:

% ls -la ~/snap/chromium/937/.config/chromium/Default/Login\ Data
-rw------- 1 baz baz 589824 Nov 22 15:32 '/home/baz/snap/chromium/937/.config/chromium/Default/Login Data'
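
The manual check above (comparing when current was relinked with when files in the old revision directory were last written) can be automated. This is a minimal sketch, assuming find compares against the symlink’s own modification time, which is its default behaviour when symlinks are not followed. The function name is hypothetical.

```shell
# List regular files under an old revision directory that were modified
# after the `current` symlink itself was last relinked.
#   $1: path to the `current` symlink
#   $2: path to the old revision directory
files_written_after_relink() {
    current_link=$1
    old_rev_dir=$2
    find "$old_rev_dir" -type f -newer "$current_link"
}
```

With the paths from the listing above this would be called as `files_written_after_relink ~/snap/chromium/current ~/snap/chromium/937`. Any output means the old revision was still being written to after the refresh switched the link.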

What version of snapd are you on? Can you please paste the output of snap version?

% snap version
snap 2.42.1
snapd 2.42.1
series 16
ubuntu 19.10
kernel 5.3.0-19-generic