Writing to /etc and /var from the core snap

Introduction

This document outlines how to deal with /etc and /var in the core snap. It starts with the current approach in core(16) and then presents the proposal for core18.

With snaps we make everything in the snap immutable. This has a lot of benefits and works great for “regular” snaps. However, the “core” snap is slightly different: it needs some parts of /etc and /var to be writable and other parts to be read-only. The /etc/systemd/system directory, for example, must be writable so that “snap install service-snap” can put the “snap.service-snap.service” file in there. On the other hand, files like /etc/lsb-release should not be writable. The same applies to /var.

Approach in core(16)

We use the system-image concept of “writable-paths” in core16. This is conceptually inherited from the Ubuntu phone and from the 15.04 implementation of snappy. It works via a configuration file, /etc/system-image/writable-paths, that lists ~70 files or directories which get bind-mounted from the read-write /writable/system-data/etc over the read-only /etc of the core snap. In other words, /etc is read-only and we “poke holes” into it via the bind mounts.

The downsides of this approach are:

  • the atomic write pattern fails (e.g. write("/etc/hosts.tmp"); mv("/etc/hosts.tmp", "/etc/hosts")), because the temporary file cannot be created in the read-only /etc and a bind-mounted file cannot be renamed over; this requires ugly workarounds
  • confusing: some files in /etc are writable, others are not
  • subtle bugs: e.g. if we ship /etc/foo in core and it is listed in “writable-paths”, it gets copied to the writable /etc. From that point on it is never updated, because the system cannot tell whether it is pristine or user-modified
  • not elegant (~70 bind mounts in the mount table)
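For reference, the atomic write pattern from the first bullet looks roughly like the Python sketch below (`atomic_write` is a hypothetical helper, not snapd code). It relies on two things that writable-paths breaks: the temporary file must be created in the same directory as the target, and the final rename(2) must be allowed to replace the target.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace `path` with `data` so readers see either the old or the new file.

    The temporary file is created in the same directory (and therefore on the
    same filesystem) as `path`; otherwise os.replace() could not rename it
    atomically. Note: mkstemp() creates the file with mode 0600, so a real
    tool would also copy the permissions of the file it replaces.
    """
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the data is on disk first
        os.replace(tmp_path, path)  # atomic rename(2) over the old file
    except BaseException:
        # Do not leave a stray temporary file behind on failure.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

On a core16 system, mkstemp() in /etc fails because the directory is read-only, and for a file that is itself a bind mount the kernel refuses the rename with EBUSY.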

Proposal for core18

We should ship an empty /etc and /var in core18 and use the mechanisms that systemd provides to populate them on first boot.

This has the added benefit that we get a “factory reset” for free via rm -rf /etc /var. (Except that the firstboot snaps currently live in /var/lib/snapd/firstboot, so for a true factory reset we would need to store them on a separate partition or on the boot partition. Still, it is an important step forward.)

Implementation

There are a few classes of things that need to be in /etc and /var. For most of them we can use the tmpfiles.d and sysusers.d mechanisms of systemd. During early boot, systemd runs tmpfiles.d and sysusers.d to ensure that important files and directories are available.

Directories (with certain permissions)

Some directories must exist for some software to run (e.g. /var/tmp). We can use the systemd tmpfiles.d mechanism to create directories with the right permissions and owners.
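For example, an entry like `d /var/tmp 1777 root root -` (columns: type, path, mode, user, group, age) asks systemd-tmpfiles to create /var/tmp as a sticky, world-writable directory. The Python sketch below is a toy model of how such a `d` line is applied, not systemd's implementation; the function `apply_tmpfiles_dir_line` and its `root` parameter (which lets us apply the line inside a scratch directory instead of /) are hypothetical, and only the `d` type and the first five columns are handled.

```python
import grp
import os
import pwd


def apply_tmpfiles_dir_line(line, root="/"):
    """Toy model of a tmpfiles.d 'd' line: create a directory, fix mode/owner.

    Only the 'd' type and the columns type, path, mode, user and group are
    handled; the age column and all other line types are ignored.
    """
    kind, path, mode, user, group = line.split()[:5]
    if kind != "d":
        raise ValueError(f"only 'd' lines are supported, got {kind!r}")
    target = os.path.join(root, path.lstrip("/"))
    os.makedirs(target, exist_ok=True)
    if mode != "-":
        # makedirs() uses 0o777 filtered by the umask, so set the exact mode
        # (including the sticky bit) explicitly, also for existing dirs.
        os.chmod(target, int(mode, 8))
    if os.geteuid() == 0:
        # Ownership can only be changed as root; -1 leaves an id unchanged.
        uid = pwd.getpwnam(user).pw_uid if user != "-" else -1
        gid = grp.getgrnam(group).gr_gid if group != "-" else -1
        os.chown(target, uid, gid)
    return target
```

Because the mode is enforced on every run, the same line both creates a missing directory and repairs one with wrong permissions.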

Files

There are some files in /etc that are critical for the system to work. On the most fundamental level these are files like: /etc/nsswitch.conf, /etc/pam.d, /etc/sudoers. Probably also /etc/ld.so.conf.d/* for multi-arch binaries to work.

Ideally all software would have built-in defaults, so that if there is e.g. no /etc/pam.d, a sensible configuration would be used. However, we are not there yet (we should push for this though, especially for the low-hanging fruit).

Until we are there, we should use /usr/share/factory/etc to store the pristine copies of the configuration. Combined with matching tmpfiles.d entries, this ensures the files are in place. Another nice benefit is that an admin or a tool can run diff -uNr /usr/share/factory/etc /etc to see what changed from the defaults.
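In tmpfiles.d terms this could be one line per file, e.g. `C /etc/nsswitch.conf - - - -`: the `C` type copies from /usr/share/factory/ when no source argument is given and skips targets that already exist. The Python sketch below models that behaviour per file for a whole factory tree; `populate_from_factory` is a hypothetical helper, and unlike `C` it never copies whole directories in one go.

```python
import os
import shutil


def populate_from_factory(factory_etc, etc):
    """Copy every file from factory_etc into etc unless it already exists.

    Existing files are never touched, so local changes survive. Returns the
    sorted relative paths of the files that were copied.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(factory_etc):
        rel_dir = os.path.relpath(dirpath, factory_etc)
        for name in filenames:
            rel = os.path.normpath(os.path.join(rel_dir, name))
            dst = os.path.join(etc, rel)
            if os.path.lexists(dst):
                continue  # keep whatever is already in /etc
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(os.path.join(dirpath, name), dst)  # keeps mode/mtime
            copied.append(rel)
    return sorted(copied)
```

Since nothing that already exists is overwritten, running this on every boot is harmless, and the diff against /usr/share/factory/etc still shows exactly what was changed locally.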

Note that once the files are in /etc, they won’t get updated from /usr/share/factory/etc anymore. This means that if we need to ship a different default, we need to do something manual. (We could do something generic if we wanted: a systemd unit that walks over $current-core/usr/share/factory/etc, $previous-core/usr/share/factory/etc and /etc and, for each file $f, updates /etc/$f from $current-core/usr/share/factory/etc/$f if hash($previous-core/usr/share/factory/etc/$f) == hash(/etc/$f), i.e. if the file was never modified locally. This is relatively cheap and reliable.)
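The generic update step mentioned above might look like the following Python sketch. It is only an illustration: `update_pristine_files`, its three directory arguments and the choice of SHA-256 are our assumptions. Files that are new in the current core (no copy in the previous factory tree) are left to the tmpfiles.d copy step.

```python
import hashlib
import os
import shutil


def file_hash(path):
    """Return the SHA-256 hex digest of a file, or None if it does not exist."""
    if not os.path.isfile(path):
        return None
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def update_pristine_files(prev_factory, cur_factory, etc):
    """Refresh files in etc that still match the previous core's defaults.

    For every file f in the current core's factory tree: if etc/f is byte
    for byte identical to prev_factory/f, it was never modified and is
    replaced by cur_factory/f. Returns the sorted relative paths updated.
    """
    updated = []
    for dirpath, _dirnames, filenames in os.walk(cur_factory):
        rel_dir = os.path.relpath(dirpath, cur_factory)
        for name in filenames:
            rel = os.path.normpath(os.path.join(rel_dir, name))
            new = os.path.join(cur_factory, rel)
            dst = os.path.join(etc, rel)
            old_hash = file_hash(os.path.join(prev_factory, rel))
            if old_hash is None or old_hash != file_hash(dst):
                continue  # new default, or modified locally: do not touch
            if old_hash == file_hash(new):
                continue  # default did not change between the two cores
            shutil.copy2(new, dst)
            updated.append(rel)
    return sorted(updated)
```

Comparing /etc against the previous core's factory copy, rather than the current one, is what lets the unit tell “pristine” from “user-modified”, which is exactly the information writable-paths lacks.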

Users/Groups

We could ship a static /usr/share/factory/etc/passwd; however, we should explore the sysusers.d mechanism. It allows setting up users with fixed UIDs/GIDs in a declarative way.
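For example, a sysusers.d line such as `u myservice 4242 "My Service" /var/lib/myservice` (type, name, ID, GECOS, home) declares a system user with a fixed UID. To make the declarative idea concrete, the Python sketch below turns such `u` lines into /etc/passwd entries. It is not how systemd-sysusers works internally: the user name and numbers, the function name and the default home/shell values are illustrative assumptions, and group creation, automatic (`-`) IDs and the other line types are left out.

```python
import shlex

# Assumed defaults for omitted columns; the authoritative values are in
# sysusers.d(5) and may differ between distributions.
DEFAULT_HOME = "/"
DEFAULT_SHELL = "/usr/sbin/nologin"


def _column(fields, index, default):
    """Return fields[index], or default if the column is missing or '-'."""
    if len(fields) > index and fields[index] != "-":
        return fields[index]
    return default


def sysusers_to_passwd(conf):
    """Turn the 'u' lines of a sysusers.d-style file into /etc/passwd lines.

    Toy model only: ids must be numeric 'UID' or 'UID:GID'; other line types
    are skipped and automatic ids are rejected.
    """
    entries = []
    for raw in conf.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = shlex.split(line)  # handles the quoted GECOS column
        if fields[0] != "u":
            continue
        uid, _, gid = fields[2].partition(":")
        gid = gid or uid  # assume the user's group gets the same number
        if not (uid.isdigit() and gid.isdigit()):
            raise ValueError(f"only numeric ids are supported: {line!r}")
        entries.append(":".join([
            fields[1], "x", uid, gid,
            _column(fields, 3, ""),
            _column(fields, 4, DEFAULT_HOME),
            _column(fields, 5, DEFAULT_SHELL),
        ]))
    return entries
```

The point of the declarative form is that the numbers live in a versioned config file rather than being allocated by a postinst script at image build time.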

This means that (potentially) we can get rid of libnss-extrausers. We could ship all “core18” users via the sysusers.d conf files. This way adduser would operate on the regular /etc/passwd and all tools would work without having to patch them (and we still have warts here!).

Challenges

Even the very small base-18 image currently has ~120 files in /etc. So we need to dig in and find out which of those are important and which can be ignored. However, this is doable and no worse than going over the 70 files in writable-paths and re-checking all of them for core18.

We should also patch some existing software and push the patches upstream so that the software has sensible built-in defaults, or ships its defaults in /lib and only reads /etc for local overrides. This will benefit the whole ecosystem, and upstream is probably open to it. The prime example is glibc, which could put some files like ld.so.conf into a location such as /usr/share/misc or /usr/share/libc6.
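A minimal sketch of the lookup order meant here, assuming the convention systemd itself uses for unit files (a file in /etc/systemd/system overrides one of the same name in /usr/lib/systemd/system): the vendor default lives in a read-only directory and a file of the same name in /etc wins. `find_config` and the default search path are hypothetical.

```python
import os


def find_config(name, search_dirs=("/etc", "/lib")):
    """Return the path of the first existing config file named `name`.

    Directories are searched in order, so a file an admin drops into /etc
    overrides the vendor default shipped in the read-only /lib. Returns
    None if no directory contains the file.
    """
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

With this order an empty /etc is a valid state: the software simply falls back to the shipped default until someone creates an override.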

Transition from core16 to core18

Just some early thoughts: simply bind-mounting /writable/system-data/etc to /etc after going from core16 to core18 should be fine. The tmpfiles.d mechanism will then create the missing files in the now fully writable /etc.

How exactly does that population mechanism work, and how does it make sure we get the right config files from the deb packages that are maintained in the distro for us?

Regarding the rm -rf: I thought the idea was to make the initially needed files (core, gadget and kernel snaps, the firstboot bits and snapd itself) undeletable via chattr +i, so that a factory reset would actually be an easy “rm -rf /”… What happened to that plan?

Last time I looked, this could not manage system groups at all; you might need to extend it…

Also, how will you handle the UID/GID numbering issues that caused writable-paths to exist in the first place? As long as we use the archive as the base, a deb requiring a special UID/GID for its system directories will set it at core snap build time through its postinst script. This means you end up with a fixed number for that user/group inside the squashfs, so this set of IDs needs to be hardcoded in your passwd file (which is why we split it today into a read-only and a read-write one via extrausers), unless you add some mechanism to remap the numbers between the squashfs and the actual writable filesystem (e.g. you could dynamically chown them on every boot, but that would make your boot dog slow).