Change in logging behaviour on Ubuntu Core


#1

as a result of the discussion in this thread i have now removed rsyslog from the core snap for ubuntu-core images in the edge channel (available with the next build of the core snap that is currently running).

this will avoid duplication of services, extensive writes to solid state disks (less wear-out) and in general the waste of disk space for log files.

we are left with journald on our images which logs to a ringbuffer by default, some notes and hints:

  • if you want persistent logs across reboots, create the /var/log/journal directory; journald will then start writing its log blobs to this dir (a config option will later be added to the core snap to toggle this behaviour on/off)
  • remote logging is currently not available out of the box, but there is still an open item on whether we want to ship systemd-journal-remote in the image (and have configure options for it)
  • an rsyslogd snap will be created for users that prefer rsyslog
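
a minimal sketch of the first hint above, assuming journald runs with its default Storage=auto setting (which uses /var/log/journal once it exists). the helper name is made up for illustration, and whether a journald restart or a reboot is needed before the directory is picked up is an assumption that may depend on the systemd version:

```shell
#!/bin/sh
# Sketch: opt in to persistent journald logs by creating the journal directory.
# enable_persistent_journal is a hypothetical helper name, not part of snapd.
enable_persistent_journal() {
    journal_dir="${1:-/var/log/journal}"   # default path from the hint above
    mkdir -p "$journal_dir" || return 1
    echo "created $journal_dir"
}

# Usage on a device (needs root):
#   enable_persistent_journal
#   systemctl restart systemd-journald   # assumption: or simply reboot
```

removing the directory again should bring back the ringbuffer-only behaviour on the next boot.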

if you want to test the new logging behaviour you can switch your test device to the edge channel via:

snap refresh core --edge

to go back to stable:

snap refresh core --stable

this change will come to stable in 4 weeks at the earliest. in case you run into any issues with your snaps or device setup, please speak up in this timeframe.


#2

There are additional tasks surrounding documentation updates, especially regarding persistent logging and debugging.

I maintain that removing rsyslog from the image before the snap is available will break certain environments where they have set up remote logging or need/want to use snappy-debug. Is there a timeframe for when the rsyslog snap will be available? Is there a commitment to making it available before core-without-rsyslog is promoted to stable?


#3

Repeating the comment made in the original thread, the fact we won’t be shipping logs into the same location won’t break any software. They will stop observing new logs on disk, which is the obvious and intended consequence of disabling or removing such a service. We should of course publicly inform people that this is being done and why (two threads on that already, and should be in the release notes), but I see that change as a very positive one considering it will prevent wearing solid state devices out, and will also prevent unattended systems from crashing due to disks full of logs. That Ubuntu thread that reached an agreement on doing this exact change also seems to back this as a good choice. The original thread also notes that there are almost no users of this interface, which is an excellent data point.

So, my vote is for us to move on and remove rsyslog from core altogether, and then start working on a syslog snap, preferably together with someone that in fact depends on this feature instead of someone that would just develop it on a theoretical basis.


#4

@niemeyer - perhaps I wasn’t clear, but you misunderstood (part of and the spirit of) what I was asking. I wasn’t trying to rehash the other thread…

First, I wanted to point out that there is documentation work for anything referencing /var/log/syslog that needs to be captured since it wasn’t part of the list in this topic.

In terms of breaking snaps, the fact that there is no logging doesn’t break snappy-debug in terms of crashing, but it means that in its current form it is immediately unusable. This isn’t hard to fix in that snap and snappy-debug breaking shouldn’t block the removal of rsyslog, but knowing when the new snap will be available will help prioritize the work to fix snappy-debug.

What I was most concerned about was not snaps breaking, but environments breaking for deployed devices where the admin has configured remote logging (something important for certain classes of monitoring). The new core snap will immediately stop remotely logging, and sites that rely upon this will be disrupted. Series 16 is stable and removing rsyslog, whatever positive attributes that brings, means that functionality is being removed that people may be depending on, and we don’t have any metrics that I am aware of on deployments that rely on remote logging. The decision was taken to remove rsyslog so IMO having the syslog snap available at the time the core snap is promoted at least allows sites to transition to the snap instead of having to wait an undetermined amount of time before a remote logging option is available again (note, as mentioned in the other topic, journald’s remote logging functionality is not a replacement for rsyslog or the upcoming syslog snap because it doesn’t talk traditional syslog on port 514/udp. In other words, using core config for journald remote logging won’t necessarily be enough for these sites).
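
For reference, a rough sketch of what “traditional syslog” means on the wire: an RFC 3164 style line prefixed with a priority value, computed as facility * 8 + severity, and normally sent as a UDP datagram to port 514. The helper names below are hypothetical and logs.example.com is a placeholder host:

```shell
#!/bin/sh
# Sketch of classic BSD syslog (RFC 3164) framing; helper names are hypothetical.

# PRI = facility * 8 + severity (e.g. facility user=1, severity notice=5).
syslog_pri() {
    echo $(( $1 * 8 + $2 ))
}

# Build "<PRI>Mmm dd hh:mm:ss host tag: message".
syslog_line() {
    pri=$(syslog_pri "$1" "$2")
    stamp=$(LC_ALL=C date '+%b %e %H:%M:%S')
    printf '<%s>%s %s %s: %s\n' "$pri" "$stamp" "$(uname -n)" "$3" "$4"
}

# Print a user.notice line; actually sending it is left to a syslog client, e.g.
#   logger --server logs.example.com --port 514 --udp "hello"
# (logs.example.com is a placeholder, not a real collector)
syslog_line 1 5 myapp "hello from the device"
```

A collector that only listens for this format will not see anything from a journald-only device, which is the gap described above.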


#5

This seems a fairly large change for an LTS release. Was there any discussion about making this configurable, and default to the current behavior? When systems in the field get this update, it sounds like syslog just stops working, correct? And then customers would need to install the to-be-created syslog snap manually if they want the old behavior back?


#6

Yes, the intended outcome of the proposed change is that rsyslog gets removed.

See the note above about there being almost no snaps that use that interface, though, and there’s also no way to configure it using the core snap. Any such changes must have been manually made.


#7

OK.

I still feel like this is a major change for an LTS, although the reasoning behind the change does make sense (although the whole binary log format makes me cringe a bit). Let me check with some of my customer contacts to get a read on this. There are quite a few commercial devices already in the field running UC16, and I want to ensure we don’t cause problems for them.


#8

Much appreciated, thanks Tony!


#9

I know most of this was discussed already so sorry for repeating (some of) it.

First, I’m in favour of removing rsyslog, I was never happy that we added it in the first place.

Having said that, doing the removal in 16 (and not in 18) seems slightly risky. It is hard to know whether any of our existing users is relying on rsyslog. We have /etc/rsyslog.d/ writable, so a user can ssh into the box and customize the syslog configuration. Our automatic refresh will change how the system used to work, which may be perceived as something negative that comes from our automatic refreshes. So far we had an easy time defending the automatic refreshes as something positive that brings better security, bugfixes and features. This change might set a precedent that people who argue against automatic refresh can point to.

Of course the above worries rely on the assumption that there are users who care about a manually configured (r)syslog. Maybe it is just a case of “you broke my workflow”.


#10

Unfortunately your assumption can’t be refuted or affirmed based on the forum discussions since there are currently no metrics for sites changing rsyslog configuration for any reason (including but not limited to remote logging). rsyslog was included because at the time traditional remote logging was considered important. The directory was then made writable in preparation for making the core snap configure remote logging. The work for the core snap didn’t happen, but that is because it wasn’t prioritized, not because no one was using it or wanted to use it.


#11

FYI, I started on making snappy-debug work with journald and can say that while journald has a nice API for accessing the logs, it seems that the log file format changes such that older python3-systemd can’t read newer journald log files (ie, a series 16 snap built from Ubuntu xenial archive works on trusty and xenial, but not zesty). I’m debugging this (I may just tail journalctl in the interest of time) but this demonstrates some of the issues with journald that are not present with traditional syslog that developers will face.
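
A minimal sketch of the “tail journalctl” fallback mentioned above, assuming the journalctl on PATH can actually read the running journal. The filter function name is hypothetical; it keeps only AppArmor denial lines as an example of the kind of message a debugging tool would look for:

```shell
#!/bin/sh
# Sketch: read log lines from stdin and keep only AppArmor denials.
# filter_denials is a hypothetical helper, not part of snappy-debug.
filter_denials() {
    grep 'apparmor="DENIED"'
}

# Usage: follow the journal and print denials as they arrive.
#   journalctl --follow | filter_denials
```

This avoids linking against the journal libraries, at the cost of parsing text output instead of structured fields.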


#12

We won’t make such changes in a careless way, but I also don’t think we should be dogmatic about making changes inside a release cycle which are improvements we have good agreement on and know won’t break systems in relevant ways. The worst thing that will happen in this case is people won’t see new logs, and will complain. We can easily roll back this behavior modification if we find evidence that the problems introduced are more relevant than the advantages it provides.


#13

Unfortunately this doesn’t work either because the journalctl in the series 16 core snap is accessing the journald files in the newer classic distro (in this case, 17.04) and it doesn’t understand them either. :\ Also, per https://www.freedesktop.org/wiki/Software/systemd/syslog/ you aren’t allowed to just connect to /run/systemd/journal/syslog and need integration with systemd that doesn’t currently exist in snapd.

This requires more investigation but I can say that making a snap that works on Ubuntu Core or a classic distro that uses a compatible systemd journal is not terribly difficult. Making that same snap work with a newer journald from a classic distro that is newer than the series of the core snap is problematic since the binary file format of the journal may be different (eg, from series 16 to Ubuntu 17.04).


#14

Isn’t it just a matter of using the journalctl from the system? Also note the tool itself has provisions for tailing and filtering based on multiple factors.


#15

One thing that came to mind is that on a classic distro, rsyslog is likely already running. As such, an application could decide whether to use the journal based on whether it is running on a classic distro. If it is not on a classic distro, there is no discrepancy between the series and the log format, because the series should be the same on both. This probably won’t hold true with alternate core snaps (eg, Fedora), but there the snap should target the runtime and therefore understand the log format of that alternate core.


#16

/bin/journalctl comes from the core snap on classic distro and it doesn’t understand the log format of the files in /run/log/journal/*. The security policy doesn’t allow executing /var/lib/snapd/hostfs/bin/journalctl, but if it did, then we have:

# /var/lib/snapd/hostfs/bin/journalctl --follow
/var/lib/snapd/hostfs/bin/journalctl: error while loading shared libraries: libsystemd-shared-232.so: cannot open shared object file: No such file or directory

so we’d then have to set LD_LIBRARY_PATH. This is possible. Eg:

With these rules:

/var/lib/snapd/hostfs/bin/journalctl ixr,
/var/lib/snapd/hostfs/lib/systemd/*.so* mr,

Can then run with:

$ LD_LIBRARY_PATH=/var/lib/snapd/hostfs/lib/systemd:$LD_LIBRARY_PATH /var/lib/snapd/hostfs/bin/journalctl --follow
...
May 18 10:12:41 iolanthe kernel: audit: type=1400 audit(1495120361.472:70824): apparmor="DENIED" operation="open" profile="snap.strict.sh" name="/etc/fstab" pid=8276 comm="cat" requested_mask="r" denied_mask="r" fsuid=1000 ouid=0
...

I’ll prepare a PR for that because I think it is worthwhile regardless of anything related to rsyslog.

Note that while the above lets me make snappy-debug itself work, this is still a hurdle for people wanting to use log-observe, since they have to know to use /var/lib/snapd/hostfs/bin/journalctl, and libraries like python3-systemd still won’t work.
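
A rough sketch of how a snap could choose between the two binaries, assuming policy rules like the ones quoted in this post are in place. The function name and the HOSTFS override are hypothetical; the paths are the ones used above:

```shell
#!/bin/sh
# Sketch: prefer the host's journalctl when it is visible (classic distro),
# otherwise fall back to the journalctl on PATH. run_journalctl is hypothetical.
run_journalctl() {
    hostfs="${HOSTFS:-/var/lib/snapd/hostfs}"
    if [ -x "$hostfs/bin/journalctl" ]; then
        # The host binary needs the host's private systemd libraries.
        LD_LIBRARY_PATH="$hostfs/lib/systemd${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" \
            "$hostfs/bin/journalctl" "$@"
    else
        journalctl "$@"
    fi
}

# Usage:
#   run_journalctl --follow
```

On Ubuntu Core the hostfs path should not exist, so the fallback branch runs the journalctl from the core snap.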


#17

Here’s the initial customer feedback I received:

I always prefer to have a text log in the system due to:

  • Portability: Binary format is really a problem when you need to examine the log from a system which doesn’t have the systemd journald running. For example, sending the logs to a server and a developer trying to quickly examine the logs on a Windows system. Or, for troubleshooting , you need to boot off of a USB stick with a minimal linux environment ( without systemd ) in order to examine the logs on the drive.

  • Programmatic analysis of system logs to look for certain types of error or failures. With text logs, this can be done in an OS independent way.

I believe (just because of its foundational nature), at least, system logs should be in a format that can be obtained and analyzed with minimum resistance.

Has anyone given thought about whether this could be accomplished in the short term via a build-time option vs. a core snap config option? I think forcing this change on customers via an update to an LTS release sets a bad precedent.

Also regarding the original arguments for making this change, while I agree that wear leveling is a valid use case, in theory log rotate should prevent system crashes due to out-of-control log files using all the disk space.


#18

#19

what would that be ? the initial change i was doing actually added a snap set/get option to dis/enable rsyslog completely, either through the gadget or via the above set/get. how would a “build time option” look here ?

yes, size is definitely not an issue; we have (and keep) logrotate around to prevent any files from running over …
we do have a customer with tens of thousands of boards where a significant percentage needs fresh SD cards regularly because logging wore them out. this real life issue caused the whole change we are discussing now.


#20

FYI, I put forth the idea of shipping rsyslog disabled in the other thread. In this manner, SSD wear is addressed, sites can opt into text only logs and people could opt into remote logging (both via core config). A gadget could then update core config. The response (please correct me if I misspoke) was that all of this could be achieved with an rsyslog snap-- you preinstall rsyslog and configure the snap via the gadget.