FYI this is a development post about snapd internals and isn’t probably interesting if you don’t work on snapd itself…
Problem statement
Currently, we have at least 3 situations where snapd code will interact with systemd to control services/daemons.
- When installing/removing snaps and we have to standup the systemd units and start services for the first time (and stop them when it comes time to update).
- When a snap user is managing services and runs something like
snap stop
. - When a snap itself (i.e. the snap author writes this) wants to manage it’s own service and runs something like
snapctl stop
.
The big problem is that while the codepath for 2 and 3 is pretty much the same (they both end up back in servicestate overlord and create service-control
changes with exec-command
tasks), the codepath for 3 is completely different and goes straight to the wrappers/systemd package without creating any sub-tasks, etc. This means we have tasks like stop-snap-services
in the snapstate manager which operates in a completely different manner from service-control
in the servicestate manager w/respect to how things are told to stop.
There is a desire to unify the implementations to make them follow mostly the same codepath, at least when it comes to working with systemd using the systemctl
command because of various wrinkles in working with systemd. Exacerbating this situation is the disparity in how the 2 methods are tested, for one we have a systemctl
function which is run in Go and needs to handle the args passed to it via Go function mocking, while for other tests we have to mock systemctl
as a shell function on the system because running that binary is hard-coded inside wrappers/systemd package.
Status Quo
Externally, mechanisms 2 and 3 will create a service-control
change with exec-command
tasks with the systemctl binary + args just passed in literally to the exec-command
task. This is unfortunate because we already have many high level functions in the systemd package which handle this in a more high level and are more consistently mocked, etc. Mechanism 1 however has a few different changes (i.e. install-name
, refresh-name
, etc.) that create stop-snap-services
or start-snap-services
tasks which call into the aforementioned systemd package. These tasks are fine right now AFAICT.
Deep dive into the codepaths
This can be skipped if familiar with the code, but is here for posterity
Specifically for 1, in cmd_snap_op.go, cmdInstall.installOne
will call into client.Install
, which eventually posts to /v2/snaps/<name>
on /run/snapd.socket. From there the daemon picks up in daemon/api.go with postSnap
and eventually will get to snapstate.Install
, which after running around a bit will call stopSnapServices
which itself will go call backend.StopServices, … which snakes down into wrappers.StopServices
and finally builds up the systemctl args using wrappers/systemd.go
Specifically for 2, in cmd_services.go, svcStop.Execute
will call into cli.Stop
, which posts to /v2/apps
on /run/snapd.socket. From there the daemon picks up in daemon/api.go with postApps
and will work next into servicestate.Control
which finally will build up the systemctl args and run them with cmdstate.ExecWithTimeout
.
Specifically for 3, in cmd/snapctl/main.go, run()
will call into cli.Runsnapctl
which posts to /v2/snapctl
on /run/snapd-snap.socket. From there the daemon picks up in daemon/api.go with runSnapctl and will work next into ctlcmdRun
, which then winds back into stopCommand.Execute from ctlcmd package, and finally ends up at the same place with servicestate.Control
and then follows 2 from this point.
Proposals
I propose the following steps to alleviate the situation:
-
service-control
changes no longer createexec-command
tasks and instead createstart-snap-services
and/orstop-snap-services
tasks (possibly with enable a sub-case ofstart-snap-services
and disable a sub-case ofstop-snap-services
) -
start-snap-services
andstop-snap-services
start tracking internally to the task what services are to be started/stopped, etc. (implementation detail: if this isn’t specified in the task, then do the right thing and assume “all” with the current behavior so that we can leave as much as possible in snapstate_test.go well enough alone ) - Refactor relevant tests to use “the one way” to mock systemctl as needed (probably the mock shell script way I think)
Another alternative which would probably be more work would be to:
- Remove the
start-snap-services
andstop-snap-services
tasks altogether, replacing them with a (new) kind of task calledservice-manage
or something specific to managing services - Refactor the
service-control
change to use this newservice-manage
task
There may indeed be more alternatives, but this is what I’ve come up with.