Allow disabling system services on Ubuntu Core

ogra · May 10, 2017, 2:06pm

Today Ubuntu Core images allow you to disable the sshd service via the core config hook …
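For readers unfamiliar with the mechanism, here is a minimal sketch of what toggling that option looks like from the command line. It only builds and prints the command; `service.ssh.disable` is the existing core option this thread refers to, while the helper name `snap_set_command` is made up for illustration.

```python
import shlex


def snap_set_command(snap, key, value):
    """Build the argv for `snap set <snap> <key>=<value>` (hypothetical helper)."""
    if value is True:
        rendered = "true"
    elif value is False:
        rendered = "false"
    else:
        rendered = str(value)
    return ["snap", "set", snap, f"{key}={rendered}"]


# Disable sshd through the core configuration. The command is only printed
# here, not executed, so this is safe to run on any machine.
cmd = snap_set_command("core", "service.ssh.disable", True)
print(shlex.join(cmd))
```

Any further service toggles would presumably follow the same `service.<name>.disable` pattern, but only the ssh key is confirmed by this thread.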

back when we implemented it, we did so with the idea of also allowing other system services to be managed by that function … like being able to disable timesyncd (link to bug) when you install an ntpd snap that uses the same resources, or turning off rsyslog (which we mainly ship for remote logging capabilities; actual logging is done by journald in a ring buffer in RAM) (link to bug), (another related bug)

we have a PR sitting here that has been blocked for a while with the remark “needs further discussion”; this topic is trying to start such a discussion so we can move forward with the implementation.

niemeyer · May 10, 2017, 2:16pm

When we added support for ssh, we very explicitly discussed that we need to be careful about which services we hand off control of. The reason isn’t simply being in control, but rather that those are implementation details that the system will rely on and that may break in strange ways during future updates if every device has a different behavior. That’s why we rolled back the changes you did without discussion and reviews, and that’s why I invited some discussion on the topic before landing that PR.

So, with that background out of the way, let’s see the specific cases.

syslog

Why do we have that on today? The reason people are asking for it to be disabled in the local system seems to be a good reason to have it always disabled. We don’t want systems dying because their disks are full of logs, and we already have a daemon that does exactly the right thing, offering access to the latest logs.

timesyncd

The bug you link to is asking for the NTP server address to be manageable, which is sensible, not for it to be disabled, which is dangerous. Ubuntu Core systems rely on signed certificates which have timings associated with them in multiple cases (TLS, assertions, etc). A system with a broken clock that can’t ever synchronize its clock will very likely be bricked until it’s logged into.
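To make the failure mode concrete, here is a small sketch of a validity-window check of the kind applied to certificates and assertions. The dates are invented for illustration and the function name `within_validity` is hypothetical; real validation involves much more than this comparison.

```python
from datetime import datetime, timezone


def within_validity(now, not_before, not_after):
    """Return True if `now` falls inside the [not_before, not_after] window."""
    return not_before <= now <= not_after


# Illustrative validity window (not taken from any real certificate).
not_before = datetime(2017, 1, 1, tzinfo=timezone.utc)
not_after = datetime(2018, 1, 1, tzinfo=timezone.utc)

good_clock = datetime(2017, 5, 10, tzinfo=timezone.utc)
# A device whose clock fell back to the Unix epoch and never synchronized.
reset_clock = datetime(1970, 1, 1, tzinfo=timezone.utc)

print(within_validity(good_clock, not_before, not_after))
print(within_validity(reset_clock, not_before, not_after))
```

With the clock stuck before `not_before`, every signed artifact looks "not yet valid", which is why a device that cannot reach a time source may be unable to recover on its own.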

ogra · May 10, 2017, 2:39pm

journald does not allow you to easily hook into a central syslog server, which is something you most likely want for IoT devices. it also only keeps logs since the last boot if it does not write to disk (either via syslog or through its own mechanism that we currently disable). i’m not opposed to turning off syslog by default though, and simply adding an option to turn it on as well as an option to point to a remote logging server.

if we are that security focused we should probably drop timesyncd altogether and ship ntpdc or tlsdate by default instead (we initially only switched to timesyncd because it comes free with systemd and is installed anyway).

ntpdc can use TLS handshakes and has additional features to verify the correctness of the time that timesyncd does not provide.

tlsdate goes even further and uses only the TLS protocol directly to sync the time from the TLS ServerHello and ClientHello functions (that way we could directly sync against the store without even involving additional (potentially insecure) ntp servers)
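As a rough illustration of the idea, the sketch below extracts a timestamp from a ServerHello `random` field. It assumes the TLS 1.2 and earlier layout from RFC 5246, where the 32-byte random starts with a 4-byte big-endian `gmt_unix_time`; many implementations fill those bytes with random data, and TLS 1.3 dropped the timestamp entirely, so this is not a reliable time source in general. The function name and the sample bytes are made up, and this is not tlsdate's actual code.

```python
import struct
from datetime import datetime, timezone


def time_from_server_random(server_random):
    """Decode the leading gmt_unix_time of a TLS <= 1.2 ServerHello random."""
    if len(server_random) != 32:
        raise ValueError("ServerHello random must be exactly 32 bytes")
    # First 4 bytes: unsigned 32-bit big-endian seconds since the Unix epoch.
    (gmt_unix_time,) = struct.unpack("!I", server_random[:4])
    return datetime.fromtimestamp(gmt_unix_time, tz=timezone.utc)


# Synthetic random field: a known timestamp followed by 28 zero bytes.
expected = datetime(2017, 5, 10, 14, 0, tzinfo=timezone.utc)
sample = struct.pack("!I", int(expected.timestamp())) + bytes(28)

decoded = time_from_server_random(sample)
print(decoded.isoformat())
```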

niemeyer · May 10, 2017, 2:44pm

Okay, so let’s just disable syslog by default right now, and put appropriate remote logging into the backlog.

For NTP, I like the idea of using timesyncd because it’s already being shipped by default and seems relatively simple, but someone that really understands all the problems in depth needs to pick the right one for us, and we should ship it enabled by default, and prevent it from being disabled. We also need to accept changing the NTP server being used via DHCP, and via an explicit setting in the core, but right now I’d prefer to not allow disabling it.

I don’t think NTP over TLS is so straightforward a choice, given that TLS itself depends on timing and proper certificates, but again someone that understands the details more than I do needs to provide feedback before we take a decision.

ogra · May 10, 2017, 2:52pm

should we do that via the core config option (and set it to disabled by default)? the only other alternative i see is to call “systemctl disable…” during build, but that means we won’t have a way to manage it through the unified core config (which makes it harder to enable again via a gadget setting).

i think that is actually the tempting part about tlsdate: you have a guarantee of being in time-sync with your certificate server this way.

niemeyer · May 10, 2017, 3:01pm

If we’re disabling syslog by default, it’d be better to ship it as a snap instead of having it in core.

I’m sure tlsdate also has dozens of issues, such as the fact you need to trust the server to have a proper clock time in its handshake, and the fact internal infrastructure will not use TLS at all. Someone that has more experience than you and me in the real world of clock syncs needs to chime in on what’s the best NTP approach here.

ogra · May 10, 2017, 3:07pm

ok, i guess then we need to at least document that you need to create /var/log/journal to get logs across reboots (else the ringbuffer gets flushed)
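The rule behind this can be sketched as follows, assuming journald's documented `Storage=` semantics (`auto` is the default and persists only when `/var/log/journal` exists). The function name is hypothetical, and the example uses a temporary directory instead of touching the real `/var/log`.

```python
import os
import tempfile


def journal_is_persistent(storage="auto", journal_dir="/var/log/journal"):
    """Model whether journald keeps logs across reboots for a Storage= value."""
    if storage == "persistent":
        # journald creates the directory itself in this mode.
        return True
    if storage == "auto":
        # Persist only if the directory already exists; otherwise logs stay
        # in the volatile runtime journal and are lost on reboot.
        return os.path.isdir(journal_dir)
    if storage in ("volatile", "none"):
        return False
    raise ValueError(f"unknown Storage= value: {storage!r}")


with tempfile.TemporaryDirectory() as root:
    journal_dir = os.path.join(root, "var", "log", "journal")
    before = journal_is_persistent("auto", journal_dir)
    os.makedirs(journal_dir)  # stand-in for creating /var/log/journal
    after = journal_is_persistent("auto", journal_dir)

print(before, after)
```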

zyga-snapd · May 10, 2017, 3:08pm

Sounds like a core config option “persistent journal” to me.

ogra · May 10, 2017, 3:09pm

there is another issue here … i just checked, what is written into /var/log/journal is a binary blob, not a text file, this will make debugging a lot harder when you ask people to attach their logs to bugs …

zyga-snapd · May 10, 2017, 3:17pm

This is a well-known behaviour. journalctl has options to pick the right log file fragment.

ogra · May 10, 2017, 3:20pm

the question is whether we as developers are ok with having to download the whole file locally and process it, instead of being able to just click on the syslog text file in a bug to see it in the browser. i personally find that rather disturbing (but can probably adapt to it if i have to)

niemeyer · May 10, 2017, 3:22pm

Instead of asking for a file, ask for a command. It can’t be simpler, really.
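As a sketch of what "ask for a command" could look like in practice, the helper below assembles a `journalctl` invocation that a bug reporter can paste and redirect to a text file. The flags used (`--no-pager`, `-b`, `-u`, `-n`) are standard journalctl options; the helper name and the choice of `snapd.service` are only illustrative.

```python
import shlex


def journalctl_command(unit=None, boot=0, lines=None):
    """Build a journalctl argv producing plain-text output for a bug report."""
    # -b 0 selects the current boot, -b -1 the previous one, and so on.
    argv = ["journalctl", "--no-pager", "-b", str(boot)]
    if unit is not None:
        argv += ["-u", unit]  # restrict output to one systemd unit
    if lines is not None:
        argv += ["-n", str(lines)]  # only the most recent N entries
    return argv


# Example: the last 200 snapd messages from the current boot.
cmd = journalctl_command(unit="snapd.service", lines=200)
print(shlex.join(cmd) + " > snapd-journal.txt")
```

Reading a persistent journal from a previous boot (for example `boot=-1`) only works if the journal is actually stored on disk.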

renat2017 · May 12, 2017, 12:19pm

Hi, guys, It’s Renat from Screenly.

It’d be nice to have such an option to disable syslogd to avoid SD card degradation on Raspberry Pi.

ogra · May 12, 2017, 12:51pm

well, the plan is now to not ship rsyslogd at all (and simply rely on journald, which does not write to disk), so you don’t really need to disable anything

i got the change ready but @mvo wanted to give the idea another review, i’m waiting for feedback before landing the change.

jdstrand · May 12, 2017, 12:54pm

Mostly off topic for the thread-- I know a lot of people are using the classic snap to build armhf binaries on pi2/pi3, and when doing that the logging of ALLOWED messages was ridiculous. The ‘classic-support’ interface was created to address this. The classic snap was only updated very recently to plug the interface, but now things are much quieter when using the classic snap.

jdstrand · May 12, 2017, 1:00pm

FYI, snappy-debug currently relies on standard syslog and doesn’t have support for journald. Before rsyslog is dropped, snappy-debug should probably gain this support. We would also have to change any documentation that references looking at /var/log/… if moving away from rsyslog.

As @ogra mentioned, remote logging is important for certain use cases. Turning rsyslog off by default seems fine so long as we can turn it back on (indeed, snappy-debug could (ask the user to) do this in the short term if needed). I’m ambivalent about moving it to a snap, but if we do, can we make sure that it is functional before removing it from core?

ogra · May 12, 2017, 1:06pm

i’ll look into whether it is possible to have an rsyslogd snap

jdstrand · May 12, 2017, 1:33pm

Another consideration is what snaps are going to break if rsyslog is gone and nothing is being logged to /var/log/syslog and friends. One way to understand this is looking at all the snaps that plug ‘log-observe’.
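One way such a survey could be sketched, assuming the snap.yaml of each snap has already been parsed into a dictionary: collect the interfaces referenced by top-level and per-app `plugs` and keep the snaps that use `log-observe`. The snap names and data below are invented, and the plug resolution is a simplification of the real snap.yaml rules.

```python
def plug_interfaces(snap_yaml):
    """Return the set of interface names a parsed snap.yaml plugs."""
    top = snap_yaml.get("plugs") or {}

    def resolve(name):
        # A top-level plug may be a mapping with an explicit interface,
        # a plain string naming the interface, or empty (name == interface).
        spec = top.get(name)
        if isinstance(spec, dict):
            return spec.get("interface", name)
        if isinstance(spec, str):
            return spec
        return name

    found = {resolve(name) for name in top}
    for app in (snap_yaml.get("apps") or {}).values():
        for name in app.get("plugs") or []:
            found.add(resolve(name))
    return found


def snaps_plugging(snaps, interface):
    """List the names of snaps that plug the given interface, sorted."""
    return sorted(s["name"] for s in snaps if interface in plug_interfaces(s))


# Hypothetical sample data standing in for parsed snap.yaml files.
snaps = [
    {"name": "log-viewer", "apps": {"view": {"plugs": ["log-observe"]}}},
    {"name": "net-tool", "apps": {"run": {"plugs": ["network"]}}},
    {"name": "audit", "plugs": {"logs": {"interface": "log-observe"}},
     "apps": {"scan": {"plugs": ["logs"]}}},
]

affected = snaps_plugging(snaps, "log-observe")
print(affected)
```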

Also, if rsyslog is removed then the log-observe interface should probably be redesigned. Some things to think about:

- on core, the implicit log-observe interface should only allow reading the journal and the log files in /var/log that exist in core
- on classic, the implicit (classic) interface should allow reading the journal and everything in /var/log
- where do app slot implementations write their logs? in SNAP_COMMON? /var/log? how do we expose these logs to other snaps? (in particular, if slot writes to /var/log, how do we reconcile implicit policy from the core snap and policy for the slot implementation? All within ‘log-observe’? Make app slot implementation policy in another interface (eg, log-support)?)
- how to migrate now broken snaps that need the new interface/snap/…

I’m starting to wonder if perhaps it would be better to leave rsyslog in core but disabled for series 16, and remove it for series 18.

niemeyer · May 12, 2017, 2:27pm

What’s the real difference between having a disabled rsyslog and a missing one?

ogra · May 12, 2017, 2:34pm

note that there was a similar discussion on the mailing list about dropping rsyslogd from the main distro as well, which seemed to come to the conclusion that rsyslog is really only needed for remote logging to another syslog instance (sadly, no actual action item came out of that discussion)