It's a little bit hard to use `daemon-notify` for sd_notify

zyga-snapd · July 12, 2018, 8:37pm

While fixing some tests I realized that the racy nature of a service was caused by the concept of service readiness. Systemd considers a service ready when certain conditions are met. Once a service is ready, commands such as systemctl start foo.service return, services ordered after it may start, and so on.

When a daemon uses simple mode (as in daemon: simple), the service becomes ready as soon as the process is alive. This is usually okay but, well, not quite in practice. If a system service is designed to wait for a certain UNIX signal to arrive, it may be considered ready before it manages to establish signal actions for the set of signals it expects to handle.

This issue leads to racy behavior and has led to the design of the sd_notify(3) mechanism, whereby a service may optionally let systemd know explicitly that it is ready. This is a simple and effective idea, solving the readiness issue in general. The mechanism also has other uses that I won’t go into here.
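For illustration, the readiness message itself is small: the service sends the datagram READY=1 to the socket named by the NOTIFY_SOCKET environment variable. Below is a minimal Python sketch of that, assuming a Linux system. The names sd_notify, handle_reload and start_up are hypothetical, and a real service would more likely call libsystemd's sd_notify(3). The signal handler is installed before the notification is sent, which is the ordering needed to avoid the race described above.

```python
import os
import signal
import socket


def sd_notify(message: str) -> bool:
    """Send one datagram to the socket named by $NOTIFY_SOCKET.

    Returns False when the variable is unset, i.e. when the process was
    not started by a service manager that expects a notification.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading "@" denotes a Linux abstract-namespace socket,
        # which is addressed with a leading NUL byte instead.
        addr = b"\0" + path[1:].encode()
    else:
        addr = path.encode()
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), addr)
    return True


def handle_reload(signum, frame):
    # Hypothetical handler: a real daemon would re-read its configuration.
    pass


def start_up() -> bool:
    # Install signal handlers first, so the service is never reported
    # ready while it is still unprepared for the signals it waits for.
    signal.signal(signal.SIGHUP, handle_reload)
    # Only now report readiness.
    return sd_notify("READY=1")
```

In a strictly confined snap this send is only expected to succeed once the daemon-notify interface discussed below is connected.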

For some time now snapd has had an interface called daemon-notify that, once connected, allows a service to interact with systemd over the said socket. Because of some past security issues, where sending garbage over that socket could bring the whole system down, the interface is not auto-connected today. We need to perform a security audit of systemd or consider the possibility of using snapd as a proxy (if such a proxy is at all possible). The interface can still be connected by the user, if desired, as with several other interfaces that are not blocked but also not auto-connected.

The problem with this arrangement is that it completely prevents the development of snaps using daemon: notify. When systemd is told to expect a notification message it will fail to start the service in absence of said message. This will in turn cause the snap not to install and thus make the manual connection impossible.

In theory it is possible to develop a snap in devmode, upload it to the store in strict mode, and get an assertion granting auto-connection so that the daemon-notify interface connects on installation, but this is perhaps a little bit difficult in practice.

This post is here so that we know about the problem and remember it. There are two possible ways forward mentioned above: a security audit followed by auto-connection by default, or a proxy mechanism. There are also some bigger topics we could explore, like making it possible to indicate that a given interface connection is required to use a snap (e.g. leaving the snap installed but inactive, or in some new similar state that we don’t have a name for yet).

ijohnson · October 4, 2018, 12:43pm

Note that one way this could be handled is to make an install hook which disables the service on install, then connect the interface and start the daemon.

zyga-snapd · October 4, 2018, 12:52pm

Yes, coupled with the PR that fixes snapd not trying to start disabled services.

kyleN · May 12, 2020, 9:17pm

(@ijohnson too.)

I am hitting an issue (or a misunderstanding) with a notify-type daemon that uses the daemon-notify plug, combined with a second daemon that is ordered after it.

There are two daemons (see apps below):

notify: notify type, uses the daemon-notify plug and does send sd_notify("READY=1") after a while (this part works).
notify-follow: oneshot type, declared to be AFTER notify.

Since I don’t have a snap-declaration assertion to auto-connect daemon-notify, I use an install hook to stop both daemons.

snapctl stop --disable ${SNAP_NAME}.notify
snapctl stop --disable ${SNAP_NAME}.notify-follow

First issue: snapctl stop without --disable in the install hook is not enough. The daemon still runs on install unless I add the --disable flag. Is this by design? (So I add the --disable flag.)

I also have a connect-plug-daemon-notify hook that successfully starts the notify daemon on interface connect:

snapctl start --enable ${SNAP_NAME}.notify

After install, I connect the plug:
$ snap connect test-notify:daemon-notify

And the notify daemon runs.

Second problem: The notify-follow daemon does not run. Maybe because it was disabled? (which, as I noted, I had to do or it ran on install even though it is listed as ‘after’ the notify daemon).

apps:

  notify-follow:
    daemon: oneshot
    command: bin/daemon-notify-follow
    after: [ notify ]

  notify:
    daemon: notify
    command: bin/daemon-notify
    plugs: [ daemon-notify ]

Repo: https://github.com/knitzsche/test-notify-issues

ijohnson · May 12, 2020, 11:41pm

yes

what you are saying is that you want a different daemon to be started after your hook manually started some other daemon, and there are two problems with that (well at least 2).

1. you only requested to start one of the daemons, how would snapd know that we should start the other one? or rather, if we did auto-magically start the other one because there’s an after between the two, how would a user specify that they just want to start a single daemon without starting all dependency daemons? basically if what you want is to start multiple daemons then you need to ask snapd to do that
2. due to bugs, we don’t currently support this very well with snapctl, specifically it is racy because systemd is racy here.

For 1, what your hook should really be doing is to use snapctl start --enable $SNAP_NAME.svc1 $SNAP_NAME.svc2 to start both services together, then the after spec comes into play and does the right thing. The after spec as currently implemented/designed only really defines what happens to the order in which the services are started when they are started together, i.e. on snap first install and on reboot, or when you run snap restart, etc.

However, just calling that from snapctl brings us to 2, bugs. We currently have a bug in snapctl that prevents this from working (I don’t think we ever filed a proper LP bug about this, but if you wanted/could do that, much appreciated). It currently doesn’t work because snapctl in effect calls systemctl start <list of services>, which is buggy and doesn’t obey ordering, and apparently this is by design because $SYSTEMD_KNOWS_BEST. However, @stolowski is working on making this work properly by refactoring the snapctl start ... code to work the same way as snap start ..., because currently snap start IIRC will actually build the ordering tree of all the services and call systemctl start <single svc> on each service in the right order.

Even then, I don’t know if this is the right thing for your setup, because IIRC we wait for oneshot daemons to exit before we consider them “started”. If your oneshot daemons take a long time to run or do whatever they are supposed to do, this could time out and fail, specifically in the context of snapctl. We have also designed hooks to be relatively quick things, so if your hook is taking 10 minutes to execute it’s not working the way we intended hooks to work.

All that is to say, the best thing for you is probably just to get auto-connect for your usage of daemon-notify, and meanwhile for development you can 1) run in devmode, 2) leave in the snapctl stop --disable in your install hook and just manually connect the interface and then start the services with snap start <list of svcs> and in both cases leave out the interface hook.

jamesh · May 13, 2020, 1:22am

That’s not quite what systemd’s After= configuration does. Instead, it is “if these two daemons are both to be started, start B after A”. It doesn’t cause the other unit to start like Requires= or Wants= do.
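The ordering rule described in this thread can be modelled in a few lines. The following is only an illustrative Python sketch (Python 3.9+ for graphlib), not snapd's or systemd's actual implementation. The function name start_order and the AFTER mapping are hypothetical, with the mapping mirroring the apps stanza posted earlier. Ordering is applied only between services that were both requested, and nothing is pulled in implicitly.

```python
from graphlib import TopologicalSorter


def start_order(requested, after):
    """Return the requested services in an order that respects `after`.

    `after` maps a service to the services it must start after. An
    ordering edge is honoured only when both ends were requested.
    """
    wanted = set(requested)
    graph = {
        svc: {dep for dep in after.get(svc, ()) if dep in wanted}
        for svc in requested
    }
    # static_order() yields each service after all of its predecessors.
    return list(TopologicalSorter(graph).static_order())


# Mirrors the apps stanza from the thread: notify-follow is after notify.
AFTER = {"notify-follow": ["notify"]}

# Both requested: the ordering applies.
print(start_order(["notify-follow", "notify"], AFTER))  # ['notify', 'notify-follow']

# Only one requested: the other service is not started at all.
print(start_order(["notify"], AFTER))  # ['notify']
```

Under this model, a hook that wants both daemons running has to name both of them, and each one can then be started individually in the computed order.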