Expose a more consistent subset of systemd's service directives

Just to update y'all on where we stand: we're currently struggling with Wants/Requires/PartOf/BindsTo, trying to find a sane way to expose these. Our problem with the names systemd uses for them is that we find it impossible to remember which directive has which exact semantics, and we don't want to pass that same confusion on to the snapd world, if at all possible.

Before and After are fine, and clear, and I think uncontroversial.

Conditions: yes, we should expose all the relevant ones. We'd probably export them as a conditions map inside an app, to keep it neat, and probably implement them using the Assert* directives instead of the Condition* ones, so that failures show up in the logs (let us know if there's a reason not to do things this way).
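
To make the idea concrete, a conditions map might look something like the sketch below (the key names here are placeholders for illustration, not a settled design):

apps:
  my-daemon:
    command: bin/my-daemon
    daemon: simple
    conditions:
      path-exists: $SNAP_DATA/enabled
      on-architecture: amd64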

Environment is already supported; we don't think there's value in additionally supporting EnvironmentFile, unless there's an actual use case where the file lives in the snap's data directory and is modified. That sounds rather far-fetched, so let us know if this is the case. If you're using it to read a file packed inside the snap, snapcraft should be able to turn an environment file into environment stanzas (but this isn't done yet). If you're using it to read a file outside of the snap entirely, we wouldn't want to support that (at least not without a pertinent interface, and that would need further design).
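
For reference, per-app environment entries can already be declared along these lines (the app and variable names below are just placeholders):

apps:
  ros-app:
    command: bin/ros-app
    daemon: simple
    environment:
      VAR1: value1
      VAR2: "other value"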

Conflicts and PropagatesReloadTo fit into the bigger conversation we're having about Requires and the other dependency directives.

@hcochran is PartOf really useful on its own in practice, given that you can’t really define targets? Or would you also need targets for it to be useful? (I don’t think we want to allow targets – but on the other hand if a dev needs them they’ll end up simulating them with an app that does nothing, so maybe we do).

Is the issue just the naming convention? I can understand the desire to simplify wherever possible, but for someone coming from systemd it would likely be more confusing to have to remember the systemd->snap mapping of directive names, and then, when debugging the service, remember the mapping in the opposite direction to work out what to change in the snapcraft.yaml file. For a more complex system a transparent passthrough would likely be much less confusing.

That’s pretty close to the situation we are in. We are adding the targets ourselves after the snap installs. This is currently one of the things preventing us from moving to a confined snap. What is the reason for not wanting to allow targets?

Systemd is extremely well-documented and is a de facto standard in the Linux world (previous controversies notwithstanding). Therefore, I believe the meanings of these things will only become more widely known with time. I strongly agree with cratliff that mapping systemd names to a different set of names would exacerbate the problem, not help it. (This comment applies to the naming of the Condition* directives, although I am not sure how renaming them to Assert* is connected with logging.)

I also think that WantedBy, Wants, RequiredBy, Requires are in very common use with meanings that are fairly intuitive. That leaves BindsTo and PartOf which, while less obvious, are also very useful. In fact, I think BindsTo is basically essential for any product that has hardware which may come and go dynamically and for which you need associated software to start and stop when this happens. It seems to me that this would apply to many embedded, robotics, & IoT-type devices.

The way it works is this:
The gadget snap would install a udev rule like this one:

SUBSYSTEM=="usb", ATTRS{idVendor}=="BEEF", MODE:="0666", SYMLINK+="AwesomeCamera", TAG+="uaccess", TAG+="udev-acl", TAG+="systemd", ENV{SYSTEMD_ALIAS}="/dev/AwesomeCamera"

This causes a dynamically-generated systemd unit called “dev-AwesomeCamera.device” to activate whenever this USB device appears and to deactivate when it goes away.

Then, we have a service (in our case a ROS node) that will start and stop automatically when this device appears or disappears, by adding this to its service file:

[Unit]
BindsTo=dev-AwesomeCamera.device

Without BindsTo=, we would have to use some out-of-band way to notice the removal of the device and stop the corresponding service. This involves some wheel reinvention and may even require polling.
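
(One refinement, if I read the systemd documentation correctly: BindsTo= is normally paired with After= on the same unit, so that the service is only kept active while the device unit itself is active:)

[Unit]
BindsTo=dev-AwesomeCamera.device
After=dev-AwesomeCamera.device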

Does that clarify how needed this may be?

Thanks, very much, for considering our feedback.

This is exactly the case that we have. We have a config file, e.g. $SNAP_DATA/ros.env, that sets some environment variables like this:

VAR1=value1
VAR2="other value"

These configuration values vary from machine to machine. Our snap install hook creates a default version of this file which can be modified later, either by hand editing or by downloading a different configuration from our fleet management software.

Most of our configuration files are .yaml files, also loaded from $SNAP_DATA. But some features of the upstream software we use (ROS) can only be influenced by environment variables.

While I think EnvironmentFile= is important, we can at least work around this using a custom wrapper script for all of our services. While messy, the possibility of a non-confinement-breaking workaround makes this directive less essential, for us, than some of the others under discussion (i.e. all of the dependency & ordering directives, Conflicts, Condition*).
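
Our workaround looks roughly like this: a small wrapper script (paths and names here are just illustrative) that each service uses as its command:

#!/bin/sh
# Load the machine-specific environment before starting the real service.
set -a
. "$SNAP_DATA/ros.env"
set +a
exec "$SNAP/bin/the-real-service" "$@"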

Thanks

We only use PartOf= to specify that a service is part of a target; this is the directive that matters when you want a target to both start and stop its dependent services (rather than only start them, which is the default).
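
In unit-file terms, the service would carry something like this (maintenance.target is just one of our target names, used here as an example):

[Unit]
PartOf=maintenance.target

[Install]
WantedBy=maintenance.target

WantedBy= is what makes the target start the service; PartOf= is what makes stopping or restarting the target propagate to the service as well.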

The concept of targets is very important when a large snap represents basically a “whole system” rather than a single application. That is typically the case when the system is an IoT, embedded, or robotic device rather than a server or desktop computer. For devices like ours, there are major modes that cause components of the system to start or stop together. For us, targets include things like:

  • initial-configuration.target - robot navigation and application-level software disabled; contacting fleet management for configuration
  • maintenance.target - robot navigation and application-level software disabled; ROS nodes for calibration and diagnostics are launched
  • robot-base.target - robot navigation running but application-level software disabled
  • normal.target - Full application stack is running. This one Requires=robot-base.target but would Conflicts=low-power.target and Conflicts=initial-configuration.target, for example (a sketch of what this target unit might look like follows the list)
  • low-power.target - Starts a service which commands certain hardware to power off (such as a USB hub). Anything that uses lots of power Conflicts= with this.
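
A rough sketch of such a target unit, going by the description above (not our literal unit file):

[Unit]
Description=Full application stack
Requires=robot-base.target
After=robot-base.target
Conflicts=low-power.target initial-configuration.target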

OK, hopefully that illustrates a good case for targets. However, it is possible to work around the lack of targets using “empty services” as you mentioned, meaning that we can get what we need without breaking confinement even if snap does not support targets.

However, we cannot get what we need without PartOf= (at least without walking down a wheel-reinvention road where we use scripts to try to do what systemd would have done for us).

Therefore, PartOf= is more “essential” for us than explicit support for targets, even though its main use case is in the context of targets!

Thank you so much for soliciting and considering our feedback. Ubuntu Core + snaps are very useful!

The idea here isn't to be different just for the sake of being different. In many cases we kept the pristine names of systemd; for example, in the socket activation feature we're implementing support for, we'll have listen-stream. It wouldn't be my first choice of terminology, but we preserved it precisely to help with mapping between the two worlds.

But the case here is different, in the sense that systemd has very poorly designed terminology around those ideas. Just consider:

  • Requires
  • Wants
  • BindsTo
  • PartOf
  • PropagatesReloadTo
  • ReloadPropagatedFrom

All of these are essentially about starting and stopping services on certain events. Yet, they mix up the ideas of binding, parts, propagation, soft and hard dependencies, and so on. And this is not only very confusing, but it also burns terminology: once we have a part-of term in the stanza with a given meaning, for example, a proper design cannot reuse that term for anything else, because we don't want multiple conflicting uses of it sitting next to each other.

Those terms are also not self-describing, and I commonly see experienced people digging through the documentation to re-learn what they mean, even for the better cases such as Requires. See for example this question on stackexchange, which has been viewed six thousand times since it was asked 11 months ago. There are many like it.

So… here is a proposal:

Let’s start with a small set of options that covers the common cases we know about in terms of inter-dependency between services. Some good candidates we discussed:

  • starts-with: <other> | [<other>, …]
  • stops-with: <other> | [<other>, …]
  • runs-with: <other> | [<other>, …]

Update: As noted much later below, it's been a while, but I've kept thinking about “starts-with” and “stops-with” for reasons unrelated to snapd, and we've settled on “requires” as the term, as it makes the directionality of the dependency clear and the implied semantics are easier to grasp.

The first two will be mapped to Requires, Wants, or PartOf as appropriate to obtain the intended semantics. The last one, runs-with, leverages BindsTo, which unfortunately is the only option that includes the idea of exiting together when the other service terminates on its own. If it weren't for that, we might also have an independent exits-with option.
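
Purely to illustrate the shape of the proposal (this is not implemented syntax, and per the update above the final term may end up being “requires” instead), it might read something like:

apps:
  camera-driver:
    command: bin/camera-driver
    daemon: simple
  image-pipeline:
    command: bin/image-pipeline
    daemon: simple
    starts-with: [camera-driver]
    stops-with: [camera-driver]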

Then, as a follow-up step, let's look into how to properly map support for targets into snaps in a way that is both safe and convenient. We probably won't need new terminology for that, other than perhaps something to define the targets themselves. We do need to consider the issue of confined interaction with the system, whether we want snaps to operate on a global namespace or an individual one, and if global, how to maintain sanity in that namespace.

In either case, from what I understand of your description, all of those issues are points we want to make work, so thanks for engaging with us and let's push this forward.

Hey all! This thread has been an excellent discussion, and I want to ensure it continues beyond the rally. Have we managed to make any headway here?

I’m slowly breaking it down. I’ve got half of the conditions PR done, but it’s stuck behind some higher priority work for now.

Thanks for the update, @chipaca :slightly_smiling_face:.

Just wanted to second the point made by @morphis earlier in the thread about WatchdogSec= being important. We've been asked about future (i.e. 18.04) support for application-based watchdog timers by one large OEM in particular.

@chipaca will your initial work include the After= and Before= directives? Also, any thoughts as to which version of snapd this will land in?

+1 to service ordering as well. I have a use case where a snap has an mqtt broker service and a number of services that connect to the mqtt broker. It would be great if I could start the dependent services only after the mqtt broker service has started.

Work on this feature is already under way. It will be part of the next release or the one after it in the worst case.

Great, I also +1 the request for watchdog support. Currently if a start rate limit has been reached for a service, the service is put in a permanently failed state and no remedial action is possible. I’d love the option to be able to reboot the system under this scenario. Such a feature is particularly important for unattended consumer IoT applications.
The current workaround of editing the unit files through a configure hook during installation obviously breaks confinement as @hcochran has noted.

Took a quick stab at watchdog support: https://github.com/snapcore/snapd/pull/4504. It seems that due to security concerns this will require some additional work though.

Is there any branch with these systemd improvements (post command, start timeout…)?

Came here to ask the same; is there a rough timeline set for implementing these? I took a look at The snapd roadmap but this doesn't seem to be mentioned there.

We're still working on these improvements, and pieces have been landing frequently. We got before and after in, and @mborzecki is now working on service timers, which I'm really looking forward to having as well.
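
For the mqtt-broker case mentioned above, that should look roughly like this in snapcraft.yaml once before/after support is in your snapd (app names here are just illustrative):

apps:
  mqtt-broker:
    command: bin/broker
    daemon: simple
  sensor-client:
    command: bin/sensor-client
    daemon: simple
    after: [mqtt-broker]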

It would be nice to tackle starts-with and friends next, but if the lack of something else is blocking you, please let us know the details and we might change the priorities.

Just curious on status, i.e. any follow-up to @svet and @ribalkin? Thanks!

FYI, service watchdog support has landed in master. A service can specify the desired watchdog timeout by adding the watchdog-timeout property to its declaration:

name: foo
version: 1.0
apps:
  i-want-watchdog:
    command: bin/app
    daemon: simple
    watchdog-timeout: 1s
    restart-condition: never
    plugs: [daemon-notify]

As the watchdog is actually driven/tracked by systemd, the service needs access to systemd's notification socket. This access is provided by the daemon-notify interface, which needs to be listed in the plugs section. Since there were some reliability-related incidents regarding the notification socket in the past, the interface is not auto-connected and needs to be connected manually.
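
For the example snap above, that would be something like:

snap connect foo:daemon-notify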

@mborzecki, hi!

Is that setting supposed to be used in snapcraft.yaml?

I added the watchdog-timeout setting to an app and snapcraft started to fail with the following error:

Issues while validating None: The 'apps/ping' property does not match the required schema: Additional properties are not allowed ('watchdog-timeout' was unexpected)

snapcraft version is 2.42.1.