[WIP] Refresh App Awareness

We have that machinery but killing applications would be excessive. I don’t know what conditions would be acceptable for that operation. Do you have any recommendations?

It seems intuitive to me that removing a snap would kill the applications associated with it for a few reasons.

  1. I think there’s a general user expectation that removing an application, whether via GNOME Software or snap remove, will also remove most of the state for that application, i.e. both files (in the form of $SNAP and maybe $SNAP_DATA, but perhaps not $SNAP_USER_DATA since that is in the user’s $HOME directory) and running processes.

  2. Additionally, a common first try at debugging why something isn’t working (especially for non-technical people) is to remove the application and re-install it. If we don’t kill the processes associated with a program then this won’t necessarily work as well as it could because there still might be previous instances of the snap applications running.

  3. If we are removing some files associated with the snap, i.e. $SNAP, $SNAP_DATA, etc., then the processes that continue to run may operate in odd ways, since when they started they had access to those files but now (some) files have been removed. This is distinct from a refresh, where those files still exist but underneath a revision path which the running application should know about. I suppose you could say that $SNAP_DATA, since it’s for root, should only be accessed by daemons, which will be killed properly by systemd. Still, if the files in $SNAP are removed then the application could operate very oddly, and I don’t think it’s a reasonable expectation for the application to gracefully handle the case where all its installation files are removed while it’s running.

As an aside, I think it’s safe to do this only on removal, since removing a snap is never (or at least should never be) an automatic operation.

I would put a different spin on this: removing a snap should fail if there are apps running. We may offer a --kill-running-apps option, or something of similar spirit, to enforce that. This is better in my eyes because it is less surprising than realising you were in fact running a snap application without knowing it.

I think I agree with you about failing the removal if there are apps running, but to be clear, my order of preference would be:

  1. Fail snap removal if there are running apps
  2. Kill running apps during snap removal
  3. Don’t kill running apps during snap removal

I hadn’t realized that 1 was an option, hence I was arguing for 2 instead of 3 which is the worst option to me (and also the current situation).
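
To make the first two options concrete, here is a minimal Python sketch. It is not snapd code. It assumes the PIDs of a snap's processes can be read from a cgroup.procs-style file (one PID per line); the path in the comment and the names running_pids, check_removal_allowed and AppsRunningError are hypothetical. The kill_running_apps flag mirrors the --kill-running-apps option proposed above, which does not exist yet.

```python
from pathlib import Path
from typing import List


class AppsRunningError(Exception):
    """Raised when a snap still has running processes and killing was not requested."""


def running_pids(procs_file: Path) -> List[int]:
    """Return the PIDs listed in a cgroup.procs-style file (one PID per line).

    Assumption: something like /sys/fs/cgroup/freezer/snap.<name>/cgroup.procs
    tracks the snap's processes; the exact location is not confirmed here.
    """
    try:
        text = procs_file.read_text()
    except FileNotFoundError:
        # No such file: treat it as "nothing from this snap is running".
        return []
    return [int(token) for token in text.split()]


def check_removal_allowed(procs_file: Path, kill_running_apps: bool = False) -> List[int]:
    """Option 1: fail if apps are running. Option 2: return the PIDs to terminate."""
    pids = running_pids(procs_file)
    if pids and not kill_running_apps:
        raise AppsRunningError("snap has running apps: %s" % pids)
    return pids
```

With kill_running_apps left at False this behaves like option 1; with True, the caller gets the list of PIDs it would have to terminate before removing files (option 2).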

I had a look at visual studio code, hoping to see what is the method of detection of background snap refreshes. This is what I found https://github.com/microsoft/vscode/blob/597d8da84a8f5c7263aa9fbe90984b35807a1b27/src/vs/platform/update/electron-main/updateService.snap.ts#L201

In short, every now and then they look at the target of the current symlink. As such, with the current mechanism Visual Studio Code won’t be able to detect updates, because none will happen while it is running. I’m wondering whether this warrants a discussion about the grander role of the current symlink and its reliability. We might be able to not use the current symlink for anything at all and still change it, so that applications that choose to look at it will see the “change”.
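
For readers who have not opened the link, the symlink-polling idea can be sketched in Python as follows. This is an illustration of the technique only, not a port of the Visual Studio Code source; the names current_revision and RefreshDetector are made up for this example.

```python
import os
from typing import Optional


def current_revision(current_link: str) -> Optional[str]:
    """Return the revision the 'current' symlink points at, or None if unreadable."""
    try:
        target = os.readlink(current_link)
    except OSError:
        # Link is missing (for example mid-refresh) or is not a symlink.
        return None
    return os.path.basename(target.rstrip("/"))


class RefreshDetector:
    """Remember the revision seen at startup and report when the link moves."""

    def __init__(self, current_link: str) -> None:
        self.current_link = current_link
        self.started_with = current_revision(current_link)

    def refreshed(self) -> bool:
        # Meant to be called periodically, e.g. from a timer in the app.
        now = current_revision(self.current_link)
        return now is not None and now != self.started_with
```

An application would construct the detector once at startup and call refreshed() every now and then. As noted above, this only reports something if the symlink actually changes while the application is running.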

At the same time I’d much rather introduce a snapctl API call to allow apps to ask as well as a notification mechanism (either new hook or a simpler method that apps can easily integrate with filesystem notification services).

We discussed where to take this feature from here and here are some quick notes:

New changes for v2 (from top to bottom as they arrive)

  • add snapctl refresh-available that instantly (without talking to the store) tells snaps that a refresh is pending [1w]
  • add new lock that inhibits application startup during refresh process [2w]
    • the lock needs to be safe from unrelated errors - bound to process, bound to ephemeral file system object
  • add cgroup-based app termination mechanics to snapd (20/80 approach, simple polling until cgroups v2 make it easy) [2w]
  • add new UX for command line and GUI apps that displays refresh progress while that lock is held or while current is gone, and we are attempting to start the app
    • cli just shows the changes via snap socket [2-3w]
    • snap run sends signal to the session agent to display the UI and waits in the back for the refresh of the app to change [2+w]
  • session agent UI response for the signal [2-3w] (1w with zenity)
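
As an illustration of the "bound to process, bound to ephemeral file system object" requirement for the new lock, here is a minimal Python sketch built on flock(2). It is an assumption about one possible approach, not the snapd design: the class name RefreshLock and the idea that launchers take the lock in shared mode while the refresh takes it exclusively are hypothetical.

```python
import fcntl
import os
from typing import Optional


class RefreshLock:
    """Advisory lock on a file: exclusive for a refresh, shared for app launchers."""

    def __init__(self, path: str) -> None:
        # Assumption: path lives on a tmpfs (e.g. under /run) so it vanishes on reboot.
        self.path = path
        self.fd: Optional[int] = None

    def acquire(self, exclusive: bool, blocking: bool = True) -> bool:
        fd = os.open(self.path, os.O_RDWR | os.O_CREAT, 0o600)
        op = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
        if not blocking:
            op |= fcntl.LOCK_NB
        try:
            fcntl.flock(fd, op)
        except BlockingIOError:
            # Someone else holds a conflicting lock; report instead of waiting.
            os.close(fd)
            return False
        self.fd = fd
        return True

    def release(self) -> None:
        if self.fd is not None:
            fcntl.flock(self.fd, fcntl.LOCK_UN)
            os.close(self.fd)
            self.fd = None
```

A flock lock belongs to the open file description, so the kernel drops it when the holding process exits or crashes. That is what keeps the lock safe from unrelated errors leaving it stuck. A launcher that fails a non-blocking shared acquire would know a refresh is in progress and could show progress instead of starting the app.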

Interesting things but not for v2

  • add snapctl refresh-and-rerun that apps can use to tell snapd to actively refresh them after the app terminates

We need to think a bit more about this. We do want a variant at some point that also talks to the store, which means either two commands or a careful use of options.

We discussed that we do need a way for the app to ask snapd to take the lock before the app terminates; otherwise it might be restarted before we get a chance to take the lock and start the refresh.

Current thinking is to have snapctl refresh-available --offline. To be true to that name, when an update is pending because of running apps we should download enough to be able to proceed even if “offline”, which means logic similar to what we have done for remodeling.

I have had experimental.refresh-app-awareness set to true for a few weeks now, and it works as intended, preventing snaps from updating while the applications are running.

However I have more than once observed an issue which I believe is related: since chromium is running pretty much all the time, it never gets a chance to refresh itself, except when I reboot my machine. When I do and there’s a new revision, it gets refreshed, but if after rebooting I run chromium too soon, I find that the profile directory that’s stored under $SNAP_USER_DATA is incomplete, as if launching the app had interrupted an ongoing copy operation. I can tell that because my profile directory weighs more than 1GB, and when that happens the current profile is much smaller (usually a few hundred MB).

The workaround is to close the app, delete the new profile directory, copy the old one with the new revision number, and launch the application again. This is not user-friendly though.
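
For anyone who needs to repeat the workaround, here is a minimal Python sketch of the steps. It assumes the application is already closed (the sketch does not check this) and that per-revision data lives in sibling directories named after the revision, as in ~/snap/chromium/<revision>. The function name restore_profile is hypothetical, and unlike the description above it moves the incomplete directory aside rather than deleting it, to be on the safe side.

```python
import shutil
from pathlib import Path


def restore_profile(snap_user_dir: Path, old_rev: str, new_rev: str) -> Path:
    """Replace the incomplete data of new_rev with a full copy of old_rev."""
    old_dir = snap_user_dir / old_rev
    new_dir = snap_user_dir / new_rev
    if not old_dir.is_dir():
        raise FileNotFoundError("no data directory for revision %s" % old_rev)
    if new_dir.exists():
        # Keep the incomplete copy under another name instead of deleting it.
        backup = snap_user_dir / (new_rev + ".incomplete")
        if backup.exists():
            raise FileExistsError("backup already exists: %s" % backup)
        new_dir.rename(backup)
    # Copy the complete old data under the new revision number.
    shutil.copytree(str(old_dir), str(new_dir), symlinks=True)
    return new_dir
```

Called as restore_profile(Path.home() / "snap" / "chromium", "<old revision>", "<new revision>"), with the revision numbers filled in from the directory listing.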

Is this a known issue?

This is a known issue. As a part of the new design, though this is not implemented yet, the application will not be allowed to start during the refresh operation.

It looks like experimental.refresh-app-awareness isn’t working for me. I set it some time ago (and have rebooted several times since). But the chromium snap still updates while it is running, causing the loss of any changes made until I restart the snap.

The option is set:
% sudo snap get core experimental.refresh-app-awareness
true

% ls -la ~/snap/chromium/current
lrwxrwxrwx 1 baz baz 3 Nov 22 13:11 /home/baz/snap/chromium/current -> 949

The “current” link was relinked at 13:11, while I kept using the old version of the snap until 15:32:

% ls -la ~/snap/chromium/937/.config/chromium/Default/Login\ Data
-rw------- 1 baz baz 589824 Nov 22 15:32 '/home/baz/snap/chromium/937/.config/chromium/Default/Login Data'

What version of snapd are you on? Can you please paste the output of snap version?

% snap version
snap 2.42.1
snapd 2.42.1
series 16
ubuntu 19.10
kernel 5.3.0-19-generic

How long was chrome open for? We built a safety system into the refresh postponing logic so that at some time the application will still refresh.

Chromium was running for about 20-30 days

This is expected then. The current logic forces a refresh after 7 days.

Those numbers are not final. We are also more likely to gracefully signal this to the application and users alike as the feature progresses.
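
The postponing behaviour described in the last few replies can be summarised with a small Python sketch. It is a reading of the thread, not snapd's actual logic: the function name may_refresh_now and its parameters are hypothetical, and the 7-day limit is the figure quoted above, which is explicitly not final.

```python
from datetime import datetime, timedelta
from typing import Optional

# Figure quoted in the thread; described there as not final.
MAX_INHIBITION = timedelta(days=7)


def may_refresh_now(
    apps_running: bool,
    inhibited_since: Optional[datetime],
    now: datetime,
    max_inhibition: timedelta = MAX_INHIBITION,
) -> bool:
    """Decide whether a pending refresh may proceed right now."""
    if not apps_running:
        # Nothing to disturb, so refresh immediately.
        return True
    if inhibited_since is None:
        # First postponement; the caller is expected to record `now`.
        return False
    # Apps are still running: postpone until the safety limit is reached.
    return now - inhibited_since >= max_inhibition
```

Under this reading, an application left open for 20 to 30 days is refreshed while running once the limit has passed, which matches the report above.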

Is this the same behavior for daemons/services as well? It seems my daemon gets refreshed while running even with the “endure” keyword used. Should I opt in to this experimental feature?

Daemons and services are stopped for refreshes and restarted afterwards. If you want to keep a service running despite a refresh then you must use refresh-mode: endure, as documented at https://snapcraft.io/docs/snap-format. I also noticed it is not documented at https://snapcraft.io/docs/snapcraft-app-and-service-metadata; paging @degville for a suggested edit.

Is there an ETA for marking the feature stable and making it the default behaviour?

I keep getting bug reports from users bitten by the chromium snap being refreshed while running, leading to profile corruption (all marked as duplicates of bug #1616650). I suggest that users enable refresh-app-awareness, but it would be nice if this was on by default.

I’m doing my best but at the moment part of the feature is under review and I cannot proceed.
