Add support for service timers

jhodapp · June 20, 2017, 6:07pm

We have an existing customer with a very real need to schedule the execution of some utilities on a UC16 system. There’s currently no way of doing that, so proper support for systemd timers is needed in snapcraft and probably in snapd as well. I can follow up with more specific customer use cases if necessary. This is something that would be needed in the next month or so.

niemeyer · June 20, 2017, 6:20pm

There’s actually an easy way to do that by simply having a daemon that does whatever it needs at the appropriate time. Should be easy to do that reusing existing tools that perform this task instead of writing something new.

That said, we definitely want to support timer units to be configured directly in snap.yaml and then snapcraft.yaml.

Should be pretty simple to introduce the feature from where we are. We just need to agree on how to specify these settings and then map it into the underlying system. Good topic for the sprint next week.

jhodapp · June 20, 2017, 6:27pm

Thanks @niemeyer for that explanation. Here’s a direct quote of what our customer would require which is pretty standard stuff:

We have a few use-cases, but they’re all either to run something on a schedule (once every min, for example), or on boot.

niemeyer · June 20, 2017, 6:28pm

Thanks for the use case. Yes, that’s a pretty common need and we definitely want to support it as a first class feature.

ribalkin · August 31, 2017, 8:09pm

Is anyone working on this feature?
We need it for syncloud.org

niemeyer · September 1, 2017, 7:17pm

@ribalkin Not yet, but we do need it soon indeed.

niemeyer · November 8, 2017, 11:27am

We’ll start working on this right after we fix the monthly support for refresh scheduling, and we’ll use the same syntax on both. Please have a look at the current syntax proposal if you’re interested in timer services:

mborzecki · January 11, 2018, 2:31pm

Just some ideas I wanted to run by you guys. Basically we can do two things: either run the ‘service’ ourselves or, since we’re already integrated with systemd services, use systemd’s timer facilities. I would prefer going the systemd way, as it delegates the task of tracking and running the thing to the systemd daemon, which is something it is good at. For the purpose of the ‘proposal’, I’ll take the second option.

Given a snap:

apps:
  app1:
    command: bin/app1
  app2:
    command: bin/app1 --some-operation
    timer-recurring:   mon,fri,0:00-24:00/24 # Mondays and Fridays, hourly

We would autogenerate:

# snap.foo.app2.timer
[Unit]
Description=snap app2 timer

[Timer]
OnCalendar=Mon,Fri *-*-* *:00:00

And the service it activates:

# snap.foo.app2.service
[Unit]
Description=snap app2 service

[Service]
Type=oneshot
ExecStart=/var/lib/snapd/snap/bin/app2

The potential issue is in translating our timer syntax to what systemd.time(7) describes. The syntax we have now is sufficiently expressive; hopefully it maps to a subset of what the systemd calendar time specification supports.

niemeyer · January 11, 2018, 7:34pm

Indeed, leveraging systemd is the best path forward, as we don’t want to reinvent cron and deal with all the potential issues that come with it (retries, failures, logging, etc).

I’m pretty sure we can map our syntax into the native syntax of systemd as despite the possibilities of our syntax, systemd’s is still more complex and comprehensive. That said, even if that turns out to not be possible, we can still leverage systemd regardless by just implementing a simple test command that verifies if now is a good time to run a given timer, and then call it from systemd’s timer itself as a condition to run the actual command.

For the syntax, I think we can go simply with “timer” for the field name.

mborzecki · February 1, 2018, 1:10pm

I’ve spent some time trying to figure out how to map our syntax to whatever systemd.timer(5) and systemd.time(7) support. I think we have a conceptual problem, and only a subset of the supported syntax & functionality can be mapped to systemd.

The main problem I see is that our syntax describes either discrete time points (eg. mon,14:00) or time spans (eg. mon-wed,12:00-14:00). The intention is that the action associated with the timer happens just once in the span (sub-span). For instance:

mon,14:00 (actually mon,14:00-14:00) - the event happens once on Mondays at 14:00
mon,12:00-14:00 - once on Mondays between 12:00 and 14:00
mon-wed,12:00-14:00 - once between 12:00 and 14:00 on each of Monday, Tuesday and Wednesday
12:00-14:00/2 (logically 12:00-13:00,13:00-14:00) - once between 12:00 and 13:00, and again between 13:00 and 14:00
wed2 - once on the 2nd Wednesday, at 0:00
23:00-01:00 - daily, once between 23:00 and 01:00 the next day
23:00~01:00 - daily, once at a randomly chosen time between 23:00 and 01:00 the next day
11:00-12:00 - if the event didn’t happen before and it’s 11:23, the event will happen now, next one on the next day at 11:00

For comparison, systemd.time syntax describes discrete time points only (assuming that we use OnCalendar=). The action happens on each event. To illustrate (note I’m trying to use similar times as above):

Mon 14:00 - the event happens on Mondays at 14:00
Mon 12..14:00:00 - the event happens on Mondays at 12:00, 13:00, 14:00
Mon 12..13:*:00 - there’s an event each minute between 12:00 and 13:59 on Mondays
Mon 12..14/2:00:00 - an event at 12:00 and another one at 14:00 on Mondays
Wed *-8..14 - once on the 2nd Wednesday, at 0:00
23..01:00:00 - does not parse
23:00 & RandomizedDelaySec=7200 - at 23:00 plus a randomized delay between 0 and 2h
11..12:00:00 - if the event didn’t happen before and it’s 11:23, the event will happen at 12:00, next one on the next day at 11:00

I looked at the other [Timer] options, such as OnActiveSec and OnUnitActiveSec, but those don’t seem to be usable for the mapping. OnActiveSec specifies a delay relative to the moment the timer unit itself is activated. OnUnitActiveSec is the delay from when the unit activated by the timer was last activated.

My feeling is that we will have to limit the timer services to specify timers that can be mapped to discrete events.

In other words:

support only a single hour:minute (with exceptions, read on)
day spans are supported, though; spans wrapping around a week (actually all day spans) will be mapped to a list of days, eg. fri-mon -> Fri,Sat,Sun,Mon.
randomized time ranges, eg: 12:23~14:00 (I think it’s useful if the action would result in poking some remote machine)

Let me know your thoughts.
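
As an illustration of the day-span mapping described above, here is a minimal Python sketch. The helper name expand_day_span and the DAYS table are hypothetical and are not taken from snapd; the sketch only assumes lowercase three-letter day names as used in the examples in this thread.

```python
# Hypothetical helper for illustration only; this is not snapd code.
DAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]


def expand_day_span(span):
    """Expand a day span such as 'fri-mon' into a systemd weekday list."""
    start, _, end = span.partition("-")
    first = DAYS.index(start)
    # A single day ('mon') is treated as a span of length one.
    last = DAYS.index(end) if end else first
    # Modulo arithmetic lets the span wrap around the end of the week.
    count = (last - first) % 7 + 1
    names = [DAYS[(first + i) % 7].capitalize() for i in range(count)]
    return ",".join(names)


print(expand_day_span("fri-mon"))  # Fri,Sat,Sun,Mon
print(expand_day_span("mon-wed"))  # Mon,Tue,Wed
```

Listing the days explicitly sidesteps the question of whether a wrapping weekday range is accepted by the calendar parser.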

mborzecki · February 7, 2018, 2:45pm

Opened a PR with basic data types and minimal validation:
https://github.com/snapcore/snapd/pull/4633

mborzecki · February 13, 2018, 12:07pm

Current idea to get it working is to use a mix of systemd timers and snapd provided helpers. It would work like this:

snap foo defines a service timer:

# snap foo
apps:
  my-timer:
   command: bin/some-command
   timer: mon,10:00-12:00

Upon installing snap foo, snapd generates snap.foo.my-timer.timer, which looks like this:

[Unit]
Description=snap.foo.my-timer timer
[Timer]
# calling generated service
Unit=snap.foo.my-timer.service
# try every 10 minutes
OnCalendar=Mon 10:00
OnCalendar=Mon 10:10
OnCalendar=Mon 10:20
OnCalendar=Mon 10:30
...

TODO: is 10 minutes good enough granularity for ‘trying’ to run?

snapd generates timer data file under /var/lib/snapd/timer/<snap>.<app>.json, the contents need to include timer spec (end consumer does not need to parse snap info to avoid dependencies and binary bloat)
snapd ships a /usr/lib/snapd/snap-timer
- snap-timer when run by systemd will parse data in /var/lib/snapd/timer/<snap>.<app>.json files, get the current timer spec, find out when the service last ran and start it if needed
- runs snap run <snap>.<app> as a child process
TODO: how about snap run --timer <snap>.<app>, which in turn runs snap run <snap>.<app>? (probably easier to do in the current code).

snapd generates a snap.<snap>.<app>.service file for the timer service, eg:

[Unit]
Description=snap.foo.my-timer
[Service]
Type=simple
ExecStart=$(libexecdir)/snapd/snap-timer %i

if the timer is an interval, snap-timer can stay ‘running’ for as long as it’s inside the ‘active’ time, eg. timer is 10:00-11:00, systemd starts snap-timer at 10:00, it can stay running until 11:00.
if current time is outside of the timer spec, snap-timer will exit immediately
runaway timer policy is to do nothing: we can’t tell how long the service should run, so it’s probably best to avoid killing it; perhaps log a message
reporting is not much different from the usual systemctl status snap.<snap>.<app>, systemctl list-timers will show when the snap.<snap>.<app>.timer (so the actual service that does the gatekeeping) got activated last and when it will be run next

niemeyer · February 14, 2018, 12:17pm

We probably don’t need that. If we use Persistent=true (which is a good idea either way) systemd will run once after a missed window, so we can simply schedule at the earliest time of the range (10:00 in the example) and expect systemd to call it on misses so we can check if we’re still inside the range.

We probably don’t need further data other than the timer unit itself. Consider something like this:

ExecStart=snap run --timer <timer spec> ...

This would only run when the timer specification matches the current time, without any contact with the daemon or any other data file. For randomized windows, we can hardcode the random time inside the systemd timer itself. For example, for 10:00~12:00, when writing down the timer unit find a random time between 10:00 and 12:00 (minus some padding from the end), say, 10:37, and write that down in the systemd timer so it calls it at that time on that machine, until written again with a different time.

No need as well given those ideas, I think.

Given the strategy above, I’m hoping we can simply run the actual snap run command, and it bails silently if it’s not an appropriate time. Or perhaps if we can create good enough rules that wouldn’t fire improperly very often, we can even log something saying it’s being skipped.
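
To make the "bail if it's not an appropriate time" check concrete, here is a rough Python sketch of the kind of test snap run --timer would need to perform for a simple time-of-day window. The function name in_window is hypothetical, and the real check would also have to consider weekdays and the rest of the timer syntax; this only shows the wrap-around-midnight case.

```python
from datetime import time


def in_window(now, start, end):
    """Return True if `now` falls inside the window [start, end]."""
    if start <= end:
        return start <= now <= end
    # The window wraps past midnight, e.g. 23:00-01:00.
    return now >= start or now <= end


# Woken late by Persistent=true at 11:23 for a 10:00-12:00 window: still run.
print(in_window(time(11, 23), time(10, 0), time(12, 0)))  # True
# 00:30 is inside a 23:00-01:00 window.
print(in_window(time(0, 30), time(23, 0), time(1, 0)))    # True
# 09:59 is outside 10:00-12:00, so the command would bail.
print(in_window(time(9, 59), time(10, 0), time(12, 0)))   # False
```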

mborzecki · February 14, 2018, 1:02pm

Right, this will make it a bit easier. Assuming this approach, the time ranges will be mapped as follows:

10:00-12:00 => OnCalendar=10:00
10:00-12:00/2

OnCalendar=10:00
OnCalendar=11:00

10:00~12:00/2

OnCalendar=10:37 # minutes picked randomly when generating the timer
OnCalendar=11:24

mon,10:00-12:00,,fri,23:00-01:00:

OnCalendar=Mon 10:10 
OnCalendar=Fri 23:10

I already have the code for generating *.service and *.timer files along with some tests, so it will get a small update.

I also implemented snap run --timer albeit loading the information about timers from persistent storage. This will be simplified to just parsing the spec passed in command line and checking if we are inside the range (or close enough if the schedule is a single time, eg: 10:00).

Generating OnCalendar is not done yet and I’m using a fixed, every 10 minutes, schedule (*-*-* *:0,10,20,30,40,50:00).

Expect a couple of PRs in the coming days.
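
For readers following along, the time-range mapping listed above could be sketched roughly as follows in Python. This is not the snapd generator; on_calendar, to_minutes and fmt are hypothetical names, and the sketch assumes only plain time spans of the form HH:MM-HH:MM or HH:MM~HH:MM with an optional /N suffix (no weekdays, no single time points).

```python
import random


def to_minutes(hhmm):
    """Convert 'HH:MM' to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def fmt(minutes):
    """Format minutes since midnight as 'HH:MM', wrapping past 24h."""
    minutes %= 24 * 60
    return "%02d:%02d" % (minutes // 60, minutes % 60)


def on_calendar(spec, rng):
    """Map a time span spec to a list of OnCalendar= lines."""
    span, _, count = spec.partition("/")
    parts = int(count) if count else 1
    randomized = "~" in span
    start_s, end_s = span.split("~" if randomized else "-")
    start, end = to_minutes(start_s), to_minutes(end_s)
    if end <= start:
        end += 24 * 60  # the span wraps past midnight
    step = (end - start) // parts
    entries = []
    for i in range(parts):
        begin = start + i * step
        # For '~', a point inside the sub-span is picked once, when the
        # timer unit is generated; otherwise use the start of the sub-span.
        at = begin + rng.randrange(step) if randomized else begin
        entries.append("OnCalendar=" + fmt(at))
    return entries


rng = random.Random()
print(on_calendar("10:00-12:00", rng))    # ['OnCalendar=10:00']
print(on_calendar("10:00-12:00/2", rng))  # ['OnCalendar=10:00', 'OnCalendar=11:00']
print(on_calendar("10:00~12:00/2", rng))  # two entries, one per hour, minutes vary
```

Passing the random source in explicitly keeps the randomized case easy to test with a seeded generator.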

niemeyer · February 14, 2018, 1:58pm

Typo on the minutes?

mborzecki · February 14, 2018, 2:08pm

Yeah. Each time I write <hour>:00 there’s a popup with some friendly/annoying icons, in particular :100:. I have not figured out how to disable the popup yet.

mborzecki · February 15, 2018, 1:40pm

More PRs:
https://github.com/snapcore/snapd/pull/4676
https://github.com/snapcore/snapd/pull/4677
https://github.com/snapcore/snapd/pull/4679
https://github.com/snapcore/snapd/pull/4680

mborzecki · February 16, 2018, 12:27pm

Generator for OnCalendar= entries. One nitpick is that schedules such as mon1-tue2 are not representable, so we’ll have to count on snap run --timer to not run the service if it falls outside of the range.

https://github.com/snapcore/snapd/pull/4695

mborzecki · February 16, 2018, 12:50pm

The last piece:
https://github.com/snapcore/snapd/pull/4696

mborzecki · February 27, 2018, 12:30pm

All of the above is merged now. Obligatory spread test:
https://github.com/snapcore/snapd/pull/4758