How to manage services with sockets/timers

ijohnson · October 12, 2018, 1:50pm

Hi,

This topic is to discuss design decisions for services defined in a snap that have sockets and/or timers defined for them.

Current status

Sockets and timers are hidden from the end user in snap services (note: this will change with https://github.com/snapcore/snapd/pull/5959)
Services with sockets can’t have the socket stopped/disabled except with snap disable SNAP-NAME
Services with timers can’t have the timer stopped/disabled except with snap disable SNAP-NAME
Services and timers are unconditionally started/activated on install

Things to consider:

Should the end-user be able to manage the socket activation independent from the service?
If an end-user can manage the socket activation for a service, should a service that defines multiple sockets allow managing each socket independently?
Should the end-user be able to manage the timer unit independent from the service?
How does one go about disabling the timer part of a service but still run the service manually? (should this be allowed?)
How does one go about disabling the socket part of a service but still run the service manually? (should this be allowed?)
What parts/services should be started during install? (related question: if the timer/socket can be managed independently, what to do if the state between the service and the socket/timer isn’t consistent, i.e. one is enabled other is disabled?)

There may be other things to consider, I will update this original post to add those.

For clarity, I will respond with my answers in a reply to this original post.

Ping @mborzecki

ijohnson · October 12, 2018, 2:13pm

Yes, because a service might do more than just listen on a socket, e.g. a service that listens for instructions from some cloud to do things locally, but can also be controlled locally using a socket. You might want to disable the local configuration socket if it’s not secured via HTTPS, etc.

Yes. For example, if a service has both a unix socket and a normal HTTPS socket (like with LXD), the sockets should be individually configurable, so that if a password/certificates are not available, only the unix socket is used.

Yes, but I think that the management of the timer should be limited; I think that the only sensible management actions for a service with a timer should be:

Run the service once, immediately (i.e. if the service is scheduled to run again in 6 hours, but you want to run it manually now)
Stop the current or next scheduled run of the service
Completely stop the service from ever running again until manually enabled

I don’t see how it would work to have a timer service that’s not effectively a “oneshot” daemon, so if there is a use case for having a non-“oneshot” daemon I would be keen to understand that use case better, in which case perhaps it does make sense to have the timer/service completely independent.

I think this shouldn’t be allowed. Either you have the service “activated”, where the timer will start it up at the specified times or the user can manually run it at some point, or the service is in a “disabled” state where it will never be run again unless the user changes that. I think the logical way to handle this is as follows:

snap stop snap-name.service-name → stop the currently running iteration or cancel the next one
snap stop --disable snap-name.service-name → stop the currently running iteration and disable all future ones so it never runs automatically again until the user says so
snap run snap-name.service-name → start the service up right now in the background and right now only (i.e. if it was disabled, leave it disabled just run the service itself once now)
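A minimal sketch of the three actions above, modeled as a small state machine. The class and field names (TimerService, timer_enabled, skip_next) are hypothetical and only illustrate the proposed semantics; this is not how snapd implements anything.

```python
from dataclasses import dataclass


@dataclass
class TimerService:
    """Hypothetical model of a timer-driven snap service (not snapd code)."""
    timer_enabled: bool = True   # will the timer trigger future runs?
    running: bool = False        # is an iteration running right now?
    skip_next: bool = False      # has the next scheduled run been cancelled?

    def stop(self, disable: bool = False) -> None:
        # "snap stop": stop the current iteration, or cancel the next one.
        if self.running:
            self.running = False
        else:
            self.skip_next = True
        # "snap stop --disable": additionally prevent all future timer runs.
        if disable:
            self.timer_enabled = False

    def run_once(self) -> None:
        # "snap run": start one iteration now, leaving timer_enabled untouched.
        self.running = True

    def on_timer_fired(self) -> None:
        # Called when the schedule elapses.
        if not self.timer_enabled:
            return
        if self.skip_next:
            # The cancelled run is consumed; later runs proceed as usual.
            self.skip_next = False
            return
        self.running = True
```

Note that run_once never touches timer_enabled, which matches the idea that a disabled service can still be run manually without being re-enabled.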

This should be allowed, using something like the following (where the snap has a service named service-name, and that service has sockets named socket1, socket2, etc.):

snap stop snap-name.service-name.socket1 → invalid, you can’t “stop” a socket
snap stop --disable snap-name.service-name.socket1 → disable the socket unit so the service is never socket activated. Note that this doesn’t change whether the normal service starts or stops running. That state should be independent from a snap end-user’s viewpoint.
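The validation rule above could be sketched roughly like this. The snap-name.service-name.socket-name addressing is the proposal from this post, and parse_target / check_stop are hypothetical helper names, not existing snapd functions.

```python
from typing import NamedTuple, Optional


class Target(NamedTuple):
    snap: str
    service: str
    socket: Optional[str]  # None when the target is the service itself


def parse_target(name: str) -> Target:
    # Proposed (hypothetical) naming: snap-name.service-name[.socket-name]
    parts = name.split(".")
    if len(parts) == 2:
        return Target(parts[0], parts[1], None)
    if len(parts) == 3:
        return Target(parts[0], parts[1], parts[2])
    raise ValueError(f"unrecognised target: {name!r}")


def check_stop(name: str, disable: bool) -> str:
    """Describe what a stop request would do, or reject it."""
    target = parse_target(name)
    if target.socket is None:
        return "stop and disable service" if disable else "stop service"
    if not disable:
        # A socket cannot be "stopped", only disabled.
        raise ValueError("cannot stop a socket; use --disable")
    # Only the socket unit changes; the service's own state is untouched.
    return f"disable socket {target.socket}"
```

For example, check_stop("snap-name.service-name.socket1", disable=False) raises ValueError, while the same call with disable=True only affects socket1.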

This is difficult and depends on what one “expects” from services which seems to vary… This is what makes sense to me, in conjunction with my above points:

If a service without sockets and without timers is disabled, e.g. from a previous installation when we’re now refreshing the snap, or because the service was disabled during one of the hooks (install, etc.), then we should leave that service disabled.
If a service without sockets is enabled, then we start the service normally
If a service has socket(s), and the socket(s) unit was disabled as shown above, then we don’t start the socket(s) activation part. At the same time, we independently look at the status of the service itself, not the socket(s), and if it was disabled, we don’t start the service. [[NOTE: currently using systemctl is-enabled service doesn’t work if service also has a service.socket unit, we may need to track this state independently of systemd]]
If a service has a timer, and the service itself has been totally disabled via snap stop --disable, then we don’t start the timer or the service itself.
If a service has a timer, and the service itself hasn’t been totally disabled, then we activate the timer portion of the service and don’t manually run the service itself.
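The rules above can be written down as a single decision function. This is only a sketch: ServiceState and units_to_start are hypothetical names, the returned labels are made up for illustration (they are not systemd unit names), and the case of a service with both a timer and sockets is not covered by the rules, so the sketch arbitrarily lets the timer rule win.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ServiceState:
    name: str
    enabled: bool = True      # False after something like "snap stop --disable"
    has_timer: bool = False
    # socket name -> whether that socket's activation is enabled
    sockets: Dict[str, bool] = field(default_factory=dict)


def units_to_start(svc: ServiceState) -> List[str]:
    """Return illustrative labels for what would be started on install/refresh."""
    started: List[str] = []
    if svc.has_timer:
        # Timer services: activate only the timer, never run the service itself.
        if svc.enabled:
            started.append(f"{svc.name}:timer")
        return started
    # Sockets and the service are considered independently of each other.
    for sock, sock_enabled in svc.sockets.items():
        if sock_enabled:
            started.append(f"{svc.name}:socket:{sock}")
    if svc.enabled:
        started.append(f"{svc.name}:service")
    return started
```

Because the socket state and the service state are tracked separately here, a disabled service with an enabled socket still gets its socket started, which is the independence described above.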

ijohnson · October 12, 2018, 6:38pm

One other tangential point to consider here is that currently services are unconditionally enabled as part of a refresh here: https://github.com/snapcore/snapd/blob/master/wrappers/services.go#L251

which means that if I do sudo snap stop --disable snap-name.service-name and then the snap gets refreshed, snap-name.service-name will get started up again. I’m not yet sure if this is intended behavior, but my initial impression is that this shouldn’t happen. I wrote a spread test for this here: https://github.com/anonymouse64/snapd/commit/0038b02c35eab28af225632bf90eeecaee8e7413
(which obviously currently fails)

ijohnson · December 18, 2018, 10:20pm

@mborzecki have you had any chance to look at this? I have just started working on a snap which will need to be able to disable a service from the install hook that will also have unix sockets…

mborzecki · January 9, 2019, 7:49am

Somewhat related to this topic: Command line interface to manipulate services - #47 by pedronis.

I think the last idea was to give some means of aggregate management of services (i.e. the service itself and trying to do the right thing about its socket and timer). Probably needs some discussion in the team, or at least getting @pedronis involved.

This indeed looks a bit off. We do not track whether the user disabled a service from a snap. However, I think that any attempts to track the state of services before the refresh and restoring it when enabling the new revision would need to work under the assumption that service names from the old revision are still valid in the new revision. Doing this automatically feels a bit fragile, given that snapd does not know whether the assumption is true for any snap. I would say that this is the job of pre-refresh and post-refresh hooks. The snap creator knows best whether, say service A from rev 1 is the same A in rev 2 or it is rather named B now and the hooks can act accordingly.
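To illustrate why the snap creator is better placed to do this than snapd, here is a rough sketch of the logic a pair of pre-refresh/post-refresh hooks might implement. The function name and the renames mapping are hypothetical; how the hooks would actually read and store the disabled state is left out on purpose.

```python
from typing import Dict, Set


def services_to_disable(
    disabled_before: Set[str],
    new_services: Set[str],
    renames: Dict[str, str],
) -> Set[str]:
    """Map services disabled in the old revision onto the new revision.

    `renames` (old name -> new name) is knowledge only the snap author has;
    snapd cannot infer it.
    """
    result: Set[str] = set()
    for old in disabled_before:
        new = renames.get(old, old)
        if new in new_services:
            result.add(new)
        # A service that no longer exists in the new revision is dropped.
    return result
```

With service A from rev 1 renamed to B in rev 2, services_to_disable({"A"}, {"B", "C"}, {"A": "B"}) yields {"B"}, something a purely name-based restore in snapd would miss.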

This brings us to the 2nd part of the problem. The output of snap services is sufficient to do the right thing in the hooks for the regular services. However, it lacks information about the state of sockets and timers. There were some proposals in the topic I linked above. I think this one could work:

ijohnson · January 14, 2019, 11:27pm

Your proposal makes sense, but to make sure I understand it correctly: you want snapd to not do anything special with services on refresh, and instead require the post-refresh and pre-refresh hooks to handle things like disabled services?
If this is the case, that’s fine but the following things would need to change:

There would need to be some sort of programmatic way to query snap service status from inside a snap. Currently snap services doesn’t work from inside a snap, and requiring a service to plug snapd-control is wrong. I think adding something to snapctl makes the most sense here, but I would like it to be explicitly programmatic: we have written scripts (which run outside of snap confinement) that have broken before because the formatting of snap services was modified, so snapctl should provide output that is easily machine-readable and stable.
Failing to start snap services can’t trigger a snap refresh to be aborted, or the post-refresh hook would need to be run before the services were started. This is because if the post-refresh hook runs after the services are started, and a service was disabled because it can’t start before the refresh, then the hook wouldn’t have gotten a chance to disable the service before snapd goes to start it. Then the service fails and the refresh is aborted. I’m not sure of the implications of running the post-refresh hook before starting the services (I’m pretty sure that’s when it’s run now, but not 100% sure), but it seems like it could be wrong.
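The ordering concern in the second point can be shown with a toy simulation. This is only a sketch with hypothetical names (simulate_refresh, broken, hook_disables); it does not reflect snapd's real task ordering, which is exactly what is uncertain above.

```python
from typing import List, Set


def simulate_refresh(
    services: List[str],
    broken: Set[str],
    hook_disables: Set[str],
    hook_first: bool,
) -> str:
    """Return "ok" or "aborted" for a toy refresh."""
    disabled: Set[str] = set()

    def run_hook() -> None:
        # The post-refresh hook disables the services it knows cannot start.
        disabled.update(hook_disables)

    def start_services() -> bool:
        # Starting fails if any service that is still enabled cannot start.
        return not any(s in broken and s not in disabled for s in services)

    if hook_first:
        run_hook()
        started = start_services()
    else:
        started = start_services()
        if started:
            run_hook()
    return "ok" if started else "aborted"
```

With services a and b, where b cannot start and the hook would disable it, the refresh succeeds only when the hook runs before the services are started.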

ijohnson · September 2, 2019, 12:35am

I’ve started looking into this and will be working on fixing this situation and related issues we have with snap service management.

To summarize again,

Sockets

We currently have no way to disable a snap service’s socket activation using snap/snapctl (you have to use systemctl) - this is bug https://bugs.launchpad.net/snapd/+bug/1842259
We should have a way to selectively disable some of a snap service’s sockets so that a snap can have multiple sockets to socket-activate itself with, and only have some of them active and running (see Command line interface to manipulate services for example)
There is currently no way for a user to tell using snap commands what sockets a snap is socket-activable from (this should probably be exposed via snap services and snap info --verbose somehow) - this is discussed somewhat at Command line interface to manipulate services

Timers

We currently have no way to disable a snap service’s timer activation using snap/snapctl (you have to use systemctl) - this is bug https://bugs.launchpad.net/snapd/+bug/1842258
If a snap service gets activated by a timer, you are unable to stop/disable it while it is running using snap stop - this is https://bugs.launchpad.net/snapd/+bug/1842257
There is currently no way for a user to tell using snap commands when a service will next run via a timer (this should probably be exposed via snap services and snap info --verbose)
(nice to have) There is no way for a user or a snap (via snapctl) to modify its own timer specification

ijohnson · September 27, 2019, 2:03pm

As discussed in Paris, we will do the following in the short-term:

snapctl stop --disable will disable the service and also disable all sockets and timers associated with a service
snap info will show in the services output that a service is socket activated (like snap services does today)
snap services svc --time will show the next time that the service will run if it is on a timer (possibly also include this in snap services --verbose when that becomes a thing)
for snap service timers, we will continue to allow the old ambiguous syntax around anchored days, but the review-tools will gain a warning for using it (and the old syntax will lose its ambiguous meaning and start working like the new behavior)
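As a sketch of the first item above, this is the set of units that a stop --disable on a service would need to cover. The unit naming pattern (snap.SNAP.SERVICE.service, snap.SNAP.SERVICE.SOCKET.socket, snap.SNAP.SERVICE.timer) is an assumption about how snapd names its generated systemd units, and units_for_disable is a hypothetical helper, not part of snapd.

```python
from typing import List


def units_for_disable(
    snap: str, service: str, sockets: List[str], has_timer: bool
) -> List[str]:
    """List the systemd units tied to one snap service (assumed naming)."""
    base = f"snap.{snap}.{service}"
    units = [f"{base}.service"]
    # One socket unit per socket declared for the service (assumed pattern).
    units += [f"{base}.{sock}.socket" for sock in sockets]
    if has_timer:
        units.append(f"{base}.timer")
    return units
```

The point of the short-term plan is that the caller only names the service, and everything in this list gets disabled together.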

In the long term we may consider:

adding snap refresh --calendar which shows when the next refreshes for various snaps will happen
adding a snap status command for all the dynamic properties of snaps like health, services, running processes/windows, etc.