This is a quick dump of my thought process:
Design
Snapd gains the ability to postpone the critical data copy that happens during a refresh until all processes associated with the snap being refreshed have terminated. Snaps gain the ability to either allow apps to start while a refresh is pending or block new apps from being started. Reliable process enumeration is done using the freezer cgroup available in cgroup v1. Reliable detection of an empty cgroup is done using the cgroup v1 release-agent protocol. We are not considering cgroup v2 yet, since the freezer controller is not yet available there, although v2 offers simpler and more reliable equivalents of this logic.
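For illustration, enumerating the processes of a snap through the freezer cgroup amounts to parsing the cgroup's cgroup.procs file, which lists one PID per line. A minimal C sketch, assuming the per-snap cgroup path /sys/fs/cgroup/freezer/snap.$SNAP_INSTANCE_NAME used elsewhere in this spec; read_cgroup_pids is a hypothetical helper, not an existing snapd function:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Read the PIDs listed in a cgroup v1 "cgroup.procs" file, for example
 * /sys/fs/cgroup/freezer/snap.NAME/cgroup.procs (path assumed from this spec).
 * At most max_pids entries are stored in pids. Returns the total number of
 * PIDs in the file (which may exceed max_pids), or -1 if the file cannot be
 * opened, e.g. because the cgroup does not exist. */
static int read_cgroup_pids(const char *procs_path, pid_t *pids, int max_pids)
{
    assert(procs_path != NULL && (pids != NULL || max_pids == 0));
    FILE *f = fopen(procs_path, "r");
    if (f == NULL)
        return -1;

    int n = 0;
    long pid;
    /* cgroup.procs holds one decimal PID per line. */
    while (fscanf(f, "%ld", &pid) == 1) {
        if (n < max_pids)
            pids[n] = (pid_t)pid;
        n++;
    }
    fclose(f);
    return n;
}
```

A return value of 0 means the cgroup exists but is empty; the release agent described below exists so that snapd does not have to poll this file to notice that.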
EDIT: the point of the .running file is to pair it with inotify, so that we are notified as soon as a refresh can happen. Without it we would need to retry often to catch a moment when we can refresh.
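A minimal C sketch of the inotify side, assuming the /run/snapd/runctl layout from this spec; wait_for_unlink is a hypothetical helper. It adds the watch on the directory before checking for the file, so an unlink that happens in between is not missed:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <poll.h>
#include <stdio.h>
#include <sys/inotify.h>
#include <sys/stat.h>
#include <unistd.h>

/* Wait until the file `name` disappears from directory `dir`.
 * Returns 1 once the file is gone, 0 on timeout, -1 on error.
 * timeout_ms is passed to poll(2) (negative means wait forever) and is
 * restarted after every batch of events, so it is not an overall deadline. */
static int wait_for_unlink(const char *dir, const char *name, int timeout_ms)
{
    char path[PATH_MAX];
    char buf[4096]; /* events are drained, not parsed */
    struct stat st;
    int result = -1;

    assert(dir != NULL && name != NULL);
    int n = snprintf(path, sizeof path, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;

    int fd = inotify_init1(IN_CLOEXEC);
    if (fd < 0)
        return -1;
    /* Watch the directory before the first check; otherwise an unlink that
     * happens between the check and adding the watch would be missed. */
    if (inotify_add_watch(fd, dir, IN_DELETE | IN_MOVED_FROM) < 0) {
        close(fd);
        return -1;
    }
    for (;;) {
        if (stat(path, &st) < 0) {
            result = (errno == ENOENT) ? 1 : -1;
            break;
        }
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int rc = poll(&pfd, 1, timeout_ms);
        if (rc == 0) {
            result = 0; /* timed out, file still present */
            break;
        }
        if (rc < 0 && errno != EINTR) {
            result = -1;
            break;
        }
        /* Some entry in dir was removed (or poll was interrupted): drain the
         * queue and re-check our file at the top of the loop. */
        if (rc > 0 && read(fd, buf, sizeof buf) < 0 && errno != EINTR) {
            result = -1;
            break;
        }
    }
    close(fd);
    return result;
}
```

For a hypothetical snap foo, wait_for_unlink("/run/snapd/runctl", "snap.foo.running", -1) would sleep until the release agent removes the marker, instead of waking up periodically to retry.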
Future work:
- Add a refresh-pending hook. To be useful, it should be able to run in the user session and notify user applications (e.g. a Firefox update notification).
  This is left out because I believe that for desktop applications it would need to be a new type of hook, a user session hook, which we don’t support at this time.
- Perform refreshes during early startup, while the machine boots and before services are started, assuming the new revision of the snap was downloaded beforehand.
Changes per component
Changes to snap-confine
- If /run/snapd/runctl/snap.$SNAP_INSTANCE_NAME.inhibited is present, wait until the file is removed
  - we may print a message if a PTY is present
- After populating the freezer cgroup, create /run/snapd/runctl/snap.$SNAP_INSTANCE_NAME.running
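The two snap-confine steps could look roughly like the C sketch below. It is hypothetical: the helper names, the simple sleep-based polling, the message text and the runctl_dir parameter (standing in for /run/snapd/runctl) are all assumptions, and the real code would join the freezer cgroup before calling mark_running:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Build "<runctl_dir>/snap.<instance>.<suffix>"; false if it does not fit. */
static bool runctl_path(char *buf, size_t size, const char *runctl_dir,
                        const char *instance, const char *suffix)
{
    assert(runctl_dir != NULL && instance != NULL && suffix != NULL);
    int n = snprintf(buf, size, "%s/snap.%s.%s", runctl_dir, instance, suffix);
    return n >= 0 && (size_t)n < size;
}

/* Step 1: block while the ".inhibited" marker exists, re-checking every
 * poll_seconds (must be > 0). A notice is printed once if stderr is a TTY.
 * Returns 0 once the marker is gone, -1 on error. */
static int wait_while_inhibited(const char *runctl_dir, const char *instance,
                                unsigned poll_seconds)
{
    char path[PATH_MAX];
    struct stat st;
    bool told_user = false;

    if (!runctl_path(path, sizeof path, runctl_dir, instance, "inhibited"))
        return -1;
    while (stat(path, &st) == 0) {
        if (!told_user && isatty(STDERR_FILENO)) {
            fprintf(stderr, "snap \"%s\" is being refreshed, waiting...\n",
                    instance);
            told_user = true;
        }
        sleep(poll_seconds);
    }
    /* stat failed: ENOENT means the marker was removed. */
    return errno == ENOENT ? 0 : -1;
}

/* Step 2: called once the process has joined the freezer cgroup. */
static int mark_running(const char *runctl_dir, const char *instance)
{
    char path[PATH_MAX];
    if (!runctl_path(path, sizeof path, runctl_dir, instance, "running"))
        return -1;
    int fd = open(path, O_WRONLY | O_CREAT | O_CLOEXEC, 0644);
    if (fd < 0)
        return -1;
    return close(fd);
}
```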
snap-cgroup-v1-release-agent
- New C binary, optimised for size and speed
- Follows the cgroup v1 release agent protocol:
  - takes a single argument, the cgroup path relative to the mount point of the cgroup hierarchy
  - if the cgroup name matches snap.*, unlinks /run/snapd/runctl/snap.$SNAP_INSTANCE_NAME.running
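A minimal sketch of the agent's core logic in C, assuming the kernel passes a path such as /snap.foo and that only top-level snap.* cgroups should be handled (ignoring nested paths is an assumption of this sketch). handle_release is a hypothetical name:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Handle one release notification. cgroup_path is the single argument the
 * kernel passes to a cgroup v1 release agent: the path of the now-empty
 * cgroup relative to the hierarchy's mount point, e.g. "/snap.demo".
 * For a top-level snap.* cgroup, "<runctl_dir>/snap.<instance>.running"
 * is removed. Returns 0 if handled or ignored, -1 on error. */
static int handle_release(const char *runctl_dir, const char *cgroup_path)
{
    char path[PATH_MAX];
    assert(runctl_dir != NULL && cgroup_path != NULL);

    const char *name = cgroup_path;
    if (*name == '/')
        name++;
    /* Only top-level cgroups named "snap.<something>" are ours; nested or
     * unrelated cgroups are ignored. */
    if (strncmp(name, "snap.", 5) != 0 || name[5] == '\0' ||
        strchr(name, '/') != NULL)
        return 0;

    int n = snprintf(path, sizeof path, "%s/%s.running", runctl_dir, name);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    /* A marker that is already gone is not an error. */
    if (unlink(path) < 0 && errno != ENOENT)
        return -1;
    return 0;
}
```

The binary's main() would only need to check that exactly one argument was given and call handle_release("/run/snapd/runctl", argv[1]). Treating a missing marker as success keeps repeated notifications harmless, and doing no other work keeps the agent small and fast, as required above.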
Additions to snapd snap manager
- On startup, the snap manager registers a cgroup v1 release agent on the freezer cgroup hierarchy. If this fails, the manager stores a flag recording that the release agent is unavailable.
- Offer a function and an event-based check for whether a given snap is running:
  - use inotify on /run/snapd/runctl if the release agent and inotify are available
  - fall back to polling on /run/snapd/runctl if the release agent is available but inotify is not
  - fall back to polling on /sys/fs/cgroup/freezer/snap.$SNAP_INSTANCE_NAME/cgroup.procs if the release agent is not available
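Snapd itself is written in Go, but for consistency with the other examples here is a C sketch of the registration and of one polling step, using the paths from this spec. The function names and the three-way strategy enum are hypothetical. Registration writes the agent path into the release_agent file at the root of the freezer hierarchy. Note that cgroup v1 only invokes the agent for cgroups whose notify_on_release flag is 1 (new cgroups inherit the flag from their parent); this spec does not yet say who sets that flag:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Write `value` into an existing control file such as <root>/release_agent. */
static int write_control(const char *path, const char *value)
{
    int fd = open(path, O_WRONLY | O_TRUNC | O_CLOEXEC);
    if (fd < 0)
        return -1;
    size_t len = strlen(value);
    ssize_t n = write(fd, value, len);
    int rc = close(fd);
    return (n == (ssize_t)len && rc == 0) ? 0 : -1;
}

/* The result becomes the manager's "release agent available" flag. */
static bool register_release_agent(const char *freezer_root, const char *agent)
{
    char path[PATH_MAX];
    int n = snprintf(path, sizeof path, "%s/release_agent", freezer_root);
    return n > 0 && (size_t)n < sizeof path && write_control(path, agent) == 0;
}

enum running_check {
    CHECK_INOTIFY_RUNCTL, /* release agent + inotify: event driven */
    CHECK_POLL_RUNCTL,    /* release agent only: poll the .running marker */
    CHECK_POLL_CGROUP,    /* no release agent: poll cgroup.procs */
};

static enum running_check pick_check(bool have_agent, bool have_inotify)
{
    if (!have_agent)
        return CHECK_POLL_CGROUP;
    return have_inotify ? CHECK_INOTIFY_RUNCTL : CHECK_POLL_RUNCTL;
}

/* One polling step. Returns 1 if the snap is running, 0 if not, -1 on error. */
static int snap_is_running(bool have_agent, const char *runctl_dir,
                           const char *freezer_root, const char *instance)
{
    char path[PATH_MAX];
    struct stat st;
    assert(runctl_dir != NULL && freezer_root != NULL && instance != NULL);

    int n = have_agent
        ? snprintf(path, sizeof path, "%s/snap.%s.running", runctl_dir, instance)
        : snprintf(path, sizeof path, "%s/snap.%s/cgroup.procs",
                   freezer_root, instance);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    if (have_agent) {
        if (stat(path, &st) == 0)
            return 1;
        return errno == ENOENT ? 0 : -1;
    }
    /* cgroupfs reports size 0, so read: an empty file means no processes. */
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return errno == ENOENT ? 0 : -1; /* no cgroup, nothing started */
    int c = fgetc(f);
    fclose(f);
    return c != EOF ? 1 : 0;
}
```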
Changes to snapd snap manager
- Refresh tasks gain a new task, wait-for-apps, that runs before the data copy is done and:
  - if /run/snapd/runctl/snap.$SNAP_INSTANCE_NAME.running is present, postpones the task
  - if an event-based notification of cgroup vacancy is reported, re-schedules the task immediately
  - ensures that new apps cannot be started by writing refresh-pending to /run/snapd/runctl/snap.$SNAP_INSTANCE_NAME.inhibited
    - this may be controlled with the refresh-pending-behavior: block | allow syntax, defaulting to allow; this is distinct from the refresh-pending hook, which is left out of the initial implementation.
- Before restarting services after linking the new revision, unlinks the /run/snapd/runctl/snap.$SNAP_INSTANCE_NAME.inhibited file
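A hedged C sketch of one run of wait-for-apps and of the final unlink. TASK_RETRY marks where snapd would postpone the task and re-schedule it once the release notification arrives; the scheduling itself is not shown, and the names, the enum and the runctl_dir parameter (standing in for /run/snapd/runctl) are hypothetical. The sketch writes the inhibition marker before checking for running apps, which narrows the window in which a new app could start between the check and the inhibition; that ordering is an assumption, not something this spec prescribes:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

enum task_status { TASK_DONE, TASK_RETRY, TASK_ERROR };

/* Build "<runctl_dir>/snap.<instance>.<suffix>"; false if it does not fit. */
static bool marker_path(char *buf, size_t size, const char *runctl_dir,
                        const char *instance, const char *suffix)
{
    int n = snprintf(buf, size, "%s/snap.%s.%s", runctl_dir, instance, suffix);
    return n >= 0 && (size_t)n < size;
}

/* One run of wait-for-apps, before the data copy. block_new_apps mirrors
 * refresh-pending-behavior: true for block, false for allow. */
static enum task_status wait_for_apps(const char *runctl_dir,
                                      const char *instance, bool block_new_apps)
{
    char path[PATH_MAX];
    struct stat st;
    assert(runctl_dir != NULL && instance != NULL);

    if (block_new_apps) {
        static const char msg[] = "refresh-pending";
        if (!marker_path(path, sizeof path, runctl_dir, instance, "inhibited"))
            return TASK_ERROR;
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC, 0644);
        if (fd < 0)
            return TASK_ERROR;
        ssize_t n = write(fd, msg, strlen(msg));
        if (close(fd) != 0 || n != (ssize_t)strlen(msg))
            return TASK_ERROR;
    }
    if (!marker_path(path, sizeof path, runctl_dir, instance, "running"))
        return TASK_ERROR;
    if (stat(path, &st) == 0)
        return TASK_RETRY; /* apps still running: postpone the task */
    return errno == ENOENT ? TASK_DONE : TASK_ERROR;
}

/* Before restarting services of the new revision: let apps start again. */
static int lift_inhibition(const char *runctl_dir, const char *instance)
{
    char path[PATH_MAX];
    if (!marker_path(path, sizeof path, runctl_dir, instance, "inhibited"))
        return -1;
    return (unlink(path) == 0 || errno == ENOENT) ? 0 : -1;
}
```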
Behaviour on revert of snapd
When snapd is reverted to a version that doesn’t implement this spec and the system reboots, the ephemeral state in /run is lost, so it plays no further part. The new wait-for-apps tasks will be ignored by the catch-all task handler, returning to the previous refresh logic. We may need additional handling for the case where the system does not reboot.