Process lifetime ideas

mvo · May 8, 2017, 4:07pm

The current implementation of snapd will restart services that belong to the snap on refresh. This is important to ensure that the service of the new snap is actually running and to also ensure that no processes have pending file references to the previously used snap.

However, for some snaps it is important to keep the existing service running. One example is the lxd snap that consists of two services: lxd itself and the lxd-container service that runs the containers that are supervised by lxd. On a normal refresh there is no need to restart all containers; doing so is actually harmful.

Another example is network-manager. On restart all interfaces will be torn down, so in many cases keeping network-manager running and restarting with the next core/kernel refresh is better.

To move forward with this, the proposal is that we:

1. add an option to the snap.yaml to allow services to declare that they should not be restarted
2. provide a mechanism in snapd to surface this information. It could be as simple as a notice after the refresh that a certain service needs to be manually restarted, and/or an additional snap service needs-restart (or similar) command.

kyrofa · May 8, 2017, 4:31pm

If I have a service that currently has a file open in $SNAP (or $SNAP_DATA, etc.) and an update occurs that doesn’t stop that service, what happens?

In the case of $SNAP_DATA, where files are being copied around, I suspect I’ll start getting denials since the file being accessed is contained in a revision that is now old. However, I’m not sure what happens in the case of $SNAP, where bind mounts are in play. I suspect I/O issues. What is the plan there?

Actually… I guess the old snap isn’t unmounted, is it. So probably denials again. Until it’s been updated three times, at which time the oldest snap is removed, right?

niemeyer · May 8, 2017, 4:47pm

The kernel does not allow unmounting something that is in use. What we do in those cases is unmount with --lazy (MNT_DETACH) which basically means the filesystem is detached but open descriptors remain working, and the complete unmounting only happens once the filesystem is not in use anymore.
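As a rough illustration of the lazy unmount described above, here is a minimal Python sketch that calls the Linux `umount2(2)` syscall with `MNT_DETACH` through `ctypes`. The helper name `lazy_umount` is made up for this example, it is not how snapd itself does it, and it needs root and a real mount point to succeed.

```python
import ctypes
import ctypes.util
import os

# Value of MNT_DETACH from <sys/mount.h> on Linux.
MNT_DETACH = 2

# Load the C library; use_errno=True lets us read errno after a failed call.
_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
_libc.umount2.argtypes = [ctypes.c_char_p, ctypes.c_int]
_libc.umount2.restype = ctypes.c_int


def lazy_umount(target: str) -> None:
    """Detach `target` from the mount tree, like `umount --lazy`.

    Hypothetical helper for illustration. Already-open descriptors keep
    working; the kernel finishes the unmount once nothing uses the
    filesystem anymore.
    """
    if _libc.umount2(os.fsencode(target), MNT_DETACH) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), target)
```

Without `MNT_DETACH` (flags set to 0) the same call fails with `EBUSY` while the filesystem is in use, which is the behaviour mentioned in the first sentence.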

Yes, many problems may happen if the application isn’t prepared to do such live updating. That’s not unique to snapd, though, and we can’t help much without establishing requirements for the application development itself. That’s why it’s an opt-in mechanism.

One detail we’re still thinking about: in many cases that daemon needs to be restarted for it to be actually updated, but some of the child processes must remain running. That’s again not unique to snapd, and sometimes what is done to fix the problem is to move the child process under pid 1. That’s likely not how it will work for snaps. They should probably have a daemon that remains the parent of such workloads, and may be restarted. We need to think about how to make that process very easy to understand and implement.

jdstrand · May 8, 2017, 8:13pm

There are several issues with opt-in-no-restarting that need to be considered:

Up until now, the design of the sandbox has specified that read/write access to the current revision is allowed but only read-only access to other revisions. When the AppArmor policy is reloaded, the policy applies to the running process, so open files in the previous revision will now be read-only and writes will fail. This will happen even if the file was opened read/write. Because the open didn’t fail, applications have a reasonable expectation they will be able to write to the file.
If the new revision has different plugs/slots, the seccomp and device cgroup policy won’t be applied until after the application is restarted as, unlike apparmor, updates to these can’t be applied to a running process.
The SNAP variables in environ are not updated for the running process; as such, the new revision may be basing decisions (eg, file paths) using the wrong environ.
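To make the first issue concrete, here is a toy Python model of the "read/write to the current revision, read-only to other revisions" rule. It is only a sketch: the function name is invented, the path pattern is assumed for the example, and real enforcement is done by the AppArmor profile, not by code like this.

```python
import re

# Assumed layout for this sketch only: /var/snap/<snap>/<revision>/...
_DATA_PATH = re.compile(r"^/var/snap/(?P<snap>[^/]+)/(?P<rev>[^/]+)(/|$)")


def is_allowed(path, snap, current_rev, want_write):
    """Toy model of the per-revision data access rule (not real AppArmor)."""
    m = _DATA_PATH.match(path)
    if m is None or m.group("snap") != snap:
        return False  # outside this snap's data area: not modelled, deny
    if not want_write:
        return True  # reads are allowed for every revision
    if m.group("rev") == "common":
        return True  # the unversioned common directory stays writable
    # Writes are only allowed inside the current revision.
    return m.group("rev") == current_rev


# A daemon started in revision 1 holds a file in its own data directory.
path = "/var/snap/demo/1/state.db"
print(is_allowed(path, "demo", "1", want_write=True))   # before the refresh
print(is_allowed(path, "demo", "2", want_write=True))   # after refresh to 2
print(is_allowed(path, "demo", "2", want_write=False))  # reads still work
```

The second call is the case that surprises applications: the file was opened read/write successfully, but once the policy for revision 2 is loaded, the write is denied.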

We can of course change the apparmor policy to allow read/write to all revisions, but this has the potential to break down the rollback scenario where a new revision continues to write (potentially incompatible) files to the previous revision because of opendir()s that occurred on startup or examining environ. We could defer the reload of the apparmor policy, but this adds complexity as to when to perform the reload, breaks rollbacks and doesn’t work with new interface connections/disconnections.

I don’t think snapd can fix things transparently for everything to work and I worry about rollbacks. I think it may be possible to support in a limited sense when interface connections remain the same between revisions if the application is coded specifically for it and we provide some assistance. For example, if the snap opts into it not being restarted, snapd could provide a mechanism for the snap to read an updated SNAP environment and send a signal to the daemon to reload the SNAP environment, reload files, etc. The snap would likely be coded to use the common data dirs instead of revision specific (rollback integrity fully under the snap’s control then).

The problem with child processes is the same as the parent: they all have the old environment, open files, etc and they are running under the same confinement as the parent, so the reload of policy, the problems with rollbacks, etc apply to them. For snaps that manage AppArmor for their children themselves (eg, libvirt with qemu, lxd containers, etc), moving the processes to another parent should work because snapd isn’t reloading the security policy for those children (only the parent) and the children (likely) aren’t evaluating the SNAP environment variables. In this case it is important that the newly restarted revision of the snap be able to fully interact with the children that have been moved.

niemeyer · May 8, 2017, 8:19pm

Thanks for these insights, Jamie. Indeed there are some edges that will need to be understood and polished, and per notes above the application will need to be friendly to being kept alive while things are moving around it. Getting these details right will certainly take a few iterations, and we can use the well known cases we have to do that.

pstolowski · January 8, 2018, 11:14am

Can we go ahead with @mvo’s proposal with the intent of handling the lxd/qemu case first? (I’m happy to work on it).

kyrofa · January 23, 2018, 8:20pm

Any updates on this?

jdstrand · January 23, 2018, 9:21pm

If a snap implements ‘1’, then the still-running daemon will continue to operate with the old environ that points to all the old paths, which apparmor will block access to. If someone were going to move forward with this, how will these issues be addressed? I guess documentation could state that all writable paths should be in SNAP_COMMON or SNAP_USER_COMMON, but that still leaves issues with garbage collection. Imagine r1 has a daemon that is started. The snap refreshes through r2, r3 and r4 such that r1 is garbage collected, but the daemon in r1 is still running.
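The garbage collection scenario can be walked through with a small Python simulation. This is only a sketch: the `refresh` function is made up, and keeping three revisions is an assumption chosen so that r1 is removed when r4 arrives, as in the example above.

```python
def refresh(installed, new_rev, retain=3):
    """Install `new_rev`; return (kept, removed) revision lists.

    `retain` is an assumed retention count for this sketch.
    """
    if retain < 1:
        raise ValueError("retain must be at least 1")
    revisions = installed + [new_rev]
    # Everything older than the newest `retain` revisions is collected.
    return revisions[-retain:], revisions[:-retain]


installed = ["r1"]
running = "r1"  # daemon started in r1 and never restarted

for rev in ["r2", "r3", "r4"]:
    installed, removed = refresh(installed, rev)
    if running in removed:
        print(f"refresh to {rev}: {running} was garbage collected "
              f"while its daemon is still running")
    else:
        print(f"refresh to {rev}: installed revisions {installed}")
```

After the last refresh the daemon is still alive, but the revision it was started from is no longer installed.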

Put another way: the proposal aims to fix lxd by introducing a generic yaml directive that any snap could use. Most snaps are strictly confined and will be affected by the old environ that points to other paths and the reloaded apparmor policy. lxd is not affected by this due to its use of the lxd-support interface. I wonder if we could test the waters on this feature by making it only available to snaps that plug lxd-support. Once we gain some experience, we could do the same with docker-support or some future libvirt-support, or roll it out to a generalized directive.

Speaking of lxd: libvirt starts guests in a way that libvirtd can be restarted without the guests having to be torn down and restarted, and the restarted libvirtd can reconnect to them. It seems lxd should be able to do something similar. Instead of ‘letting daemons continue to run’ (I’m not sure what that means in the proposal), we could allow snaps to specify some yaml that makes the ExecReload command run on refresh. If the snap doesn’t specify this, then you get the old behavior on refresh (stop/start); if it does specify it, then refresh does a systemctl reload. It would be up to the snap to grab (maybe via snapctl?) the updated environ and successfully re-exec with the new paths under the new apparmor policy. Perhaps this is what @mvo was thinking all along…
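Roughly, the refresh-time decision I have in mind could be sketched like this in Python. The function and the `has_reload_command` flag are hypothetical, standing in for whatever the yaml ends up looking like; only the `systemctl` subcommands are real.

```python
def refresh_commands(unit, has_reload_command):
    """Return the systemctl invocations to run for `unit` on refresh.

    `has_reload_command` is a hypothetical flag meaning the snap
    declared a reload command (ExecReload) for this service.
    """
    if has_reload_command:
        # Opt-in: ask the running daemon to reload itself.
        return [["systemctl", "reload", unit]]
    # Default: the current stop/start behavior.
    return [["systemctl", "stop", unit], ["systemctl", "start", unit]]


# Illustrative unit name following the snap.<snap>.<app>.service pattern.
for cmd in refresh_commands("snap.lxd.daemon.service", True):
    print(" ".join(cmd))
```

The reload path only helps if the daemon then picks up the new environ and re-execs itself; snapd would just be choosing which systemctl verb to use.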

kyrofa · January 23, 2018, 9:24pm

Yeah, also remember one of the big benefits of versioned data is a slick rollback story. Putting everything in SNAP_COMMON or allowing write access to old revisions both defeat that to some degree.