Development sprint March 5th, 2018

User mounts

  • We can now create snippets for user mount profiles from interfaces
  • We have not yet merged the change that creates the user mount namespace
  • The initial phase will use ephemeral namespaces for these cases, lasting only while the application process is alive.
  • A second phase to follow soon after will persist the user namespaces and use snap-update-ns to update them dynamically, just like everything else today.
  • Current development is addressing a security concern:
    • Unlike all the other mounts that snap-update-ns performs, the source and the target of these mounts are owned by the user
    • snap-confine runs as root, so symlinks might trick the system into mounting into the wrong places
    • Unfortunately the kernel API lacks support for operations via file descriptors, and lacks support for a no-follow flag
    • The simplistic idea of doing the mount and then checking it is not ideal because it creates an exploit window. Even though the application is currently frozen during the refresh and the namespace is currently hidden, we want to make that namespace persistent in the future. The principle is also on the wrong side of history: do-and-check is bad security versus not doing anything dangerous in the first place.
    • The current plan is to do the mount with the target in a non-accessible location, verify it, and only then expose it to the world.
    • One way to verify might be to open the directory, inspect the path associated with the file descriptor to ensure correctness, do the bind mount, compare that the inode is still the same, and finally expose the target location to the application (see the sketch after this list)
    • The way to expose it securely in the final step is to go to the target directory and mount from within, after ensuring correctness
  • Further investigation will take place next week to ensure correctness and safety.
  • Landing of the working, final, ultra-safe version (maybe broken) will come after the security review takes place, so probably two weeks from now.
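
As a rough illustration of the verify-then-expose idea above, here is a minimal Go sketch. It is not the actual snap-confine/snap-update-ns code, and the staging-location handling and function name are assumptions; it only shows the shape of the checks: open without following symlinks, confirm the path behind the file descriptor, bind-mount into a hidden target, and compare inodes before exposing anything.

```go
package usermounts

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

// bindMountVerified bind-mounts the user-owned src onto a staging target that
// is not yet visible to the application, and only reports success if the
// mount really ended up backed by the directory that was verified.
func bindMountVerified(src, stagingTarget string) error {
	// Open src without following a symlink in the final path component.
	fd, err := unix.Open(src, unix.O_DIRECTORY|unix.O_NOFOLLOW|unix.O_RDONLY, 0)
	if err != nil {
		return err
	}
	defer unix.Close(fd)

	// Ask the kernel which path this descriptor actually refers to.
	resolved, err := os.Readlink(fmt.Sprintf("/proc/self/fd/%d", fd))
	if err != nil {
		return err
	}
	if resolved != src {
		return fmt.Errorf("%s resolved to unexpected path %s", src, resolved)
	}

	// Remember the identity (device, inode) of the verified directory.
	var want unix.Stat_t
	if err := unix.Fstat(fd, &want); err != nil {
		return err
	}

	// Bind mount into the non-accessible staging location.
	if err := unix.Mount(src, stagingTarget, "", unix.MS_BIND, ""); err != nil {
		return err
	}

	// After the mount, the staging target must be backed by the same inode.
	var got unix.Stat_t
	if err := unix.Stat(stagingTarget, &got); err != nil {
		return err
	}
	if got.Dev != want.Dev || got.Ino != want.Ino {
		// Something interfered: undo the mount instead of exposing it.
		unix.Unmount(stagingTarget, unix.MNT_DETACH)
		return fmt.Errorf("mount on %s is not backed by %s", stagingTarget, src)
	}

	// Only at this point would the mount be exposed at the real, world-visible
	// target, by entering the target directory and mounting from within.
	return nil
}
```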

Portals

  • Portals work by allowing confined applications to send messages to the portal service
  • The portal checks the identity of the caller and makes a decision (see the sketch after this list)
  • Support for snaps was merged upstream but it was malfunctioning
  • Support for snaps was fixed upstream already, but not released yet
  • Ubuntu might be able to support the fixed version soon, and Fedora likely has a fast-paced schedule for that too; do we need to coordinate with Debian and others?
  • How to check if there’s a working version?
  • Even then, the fixed version will take a while to propagate everywhere, and will never be seen in some old releases, which makes snaps using that feature incompatible with those. That’s a side effect of API evolution, but we need to understand how best to present the situation.
  • There’s a security concern that needs further discussion: if portals think the snap is actually unconfined, they might fail open. Needs further investigation and impact analysis.
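
For illustration only, here is a rough Go sketch of the kind of identity check a portal service can perform on a D-Bus caller. The actual snap support in xdg-desktop-portal differs in its details; the label handling below is an assumption about the general mechanism (confined snap processes carrying a “snap.<name>.<app>” security label):

```go
package portal

import (
	"fmt"
	"os"
	"strings"

	"github.com/godbus/dbus/v5"
)

// snapLabelOfSender maps a D-Bus sender (e.g. ":1.42") to a snap security
// label by asking the bus for the sender's PID and reading that process's
// AppArmor label from /proc. Illustration of the principle only.
func snapLabelOfSender(conn *dbus.Conn, sender string) (string, error) {
	var pid uint32
	err := conn.BusObject().Call(
		"org.freedesktop.DBus.GetConnectionUnixProcessID", 0, sender,
	).Store(&pid)
	if err != nil {
		return "", err
	}

	// Confined snap processes carry a label such as "snap.<snap>.<app>",
	// typically followed by the enforcement mode, e.g. " (enforce)".
	raw, err := os.ReadFile(fmt.Sprintf("/proc/%d/attr/current", pid))
	if err != nil {
		return "", err
	}
	label := strings.SplitN(strings.TrimSpace(string(raw)), " ", 2)[0]
	if !strings.HasPrefix(label, "snap.") {
		// Not a snap: the caller must not be mistaken for an unconfined
		// application (the "fail open" concern noted above).
		return "", fmt.Errorf("sender %s is not a snap (label %q)", sender, label)
	}
	return label, nil
}
```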

If I’m making something that mimics Ubuntu Core, I’d need to make a custom central snap. Right now, that’s core.

Also, Fedora’s snapd is already different from yours, so if I relied on the one Ubuntu builds, it’d fail.

Yeah, bases already allow you to have a custom core for Fedora, and after we’re done with the things we want to get into 18.04, we’ll come back to finish the multi-base support in Ubuntu Core. The change that will enable us to have Ubuntu Core 18 is the same one that will enable you to have a Fedora Core.


As soon as snaps can use portals, let me know which ones and what releases need to be available so that I can work with the Fedora maintainers of those packages to get those integrations in place across as many supported releases as possible.

Currently, I can guarantee it being available in Fedora 28 (which will release this May) if Alex Larsson’s PR is merged soon and a release is made, but bringing it back to Fedora 26 and Fedora 27 will require some cooperation, and I’ll probably only entertain those discussions once snapd actually does support portals.


Thanks, Neal! Let’s catch up soon once we have something working.

Remaining work on core18

  • The development version of core18 is functional
  • Rename from base18 to core18 still pending
  • Need to coordinate the handover of its assembly to Foundations
  • Decide how the test collaboration works
  • Package set is small today, and we should start like that. Easy to add more.
  • Store should flag for manual review any bases that have ABI changes (content disappearing, libraries with ABI changes, etc)
  • In the future we may need to open more permissions on external bases if they don’t match the current expectations of confinement (access to standard paths, etc). Eventually we might support a template inside the base for that purpose.

Ubuntu Core 18

  • Must support configuring the bootable base, likely in the model
  • Run the build+write-over snapd integration test with the extra bases
  • We would still like to get rid of writable content in /etc and /var from the base, so that the system itself can have it all. This allows getting rid of the extra-users concept. Such changes often require some upstream work.
  • For core18, we’ll start that work by removing the synced writable-paths directories, and some cheap persistent ones.

Service survival across refreshes

  • It’s up in edge for testing
  • If all goes well, will be in 2.32
  • Needs documentation
  • Waiting for feedback from @sherman confirming solution

Software watchdog support

  • Feature enables snaps to leverage systemd’s software watchdog
  • systemd’s notification socket is at /run/systemd/notify
  • The sender of the message must necessarily be in the cgroup that systemd assigned when it started the process (see the sketch after this list).
  • That said, there were known CVEs that enabled a process to crash systemd (and thus the system) when pinging with a zero value. So we’ll lock down access behind an interface.
  • Interface name will be service-watchdog
  • Interface will force manual review initially, and after a security review of the feature code, we’ll open it up for auto-connection by default.
  • Right now the service configuration locks this down so that only the main PID can ping the watchdog.
  • In the future, as we learn more about use cases, we may open this up so that other processes inside the snap can ping as well. Once we do that, we should also implement a “snapctl ping-watchdog” command and symlink it as “systemd-notify”.
  • Both the main feature and the interface are up for review already.
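
For reference, a minimal Go sketch of the notification protocol involved. This mirrors the well-known sd_notify datagram format rather than any snapd code:

```go
package watchdog

import (
	"net"
	"os"
)

// Ping sends WATCHDOG=1 to the notification socket that systemd passed via
// $NOTIFY_SOCKET (normally /run/systemd/notify). systemd only honours the
// message if WatchdogSec= is set and the sender is in the cgroup it assigned
// when starting the service; services should ping at least every
// $WATCHDOG_USEC/2 microseconds.
func Ping() error {
	socket := os.Getenv("NOTIFY_SOCKET")
	if socket == "" {
		return nil // not running under a systemd unit with watchdog support
	}
	conn, err := net.DialUnix("unixgram", nil, &net.UnixAddr{Name: socket, Net: "unixgram"})
	if err != nil {
		return err
	}
	defer conn.Close()
	_, err = conn.Write([]byte("WATCHDOG=1"))
	return err
}
```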

API compatibility in smart local proxies

  • We currently live in a world that is straightforward in terms of API versions, because snapd can always expect the new API to be available by the time snapd gets released.
  • Store can do anything necessary as long as API versions in use are backwards compatible
  • We would like to support some sort of smart local proxy that can respond to queries while offline.
  • So snapd would need to talk to APIs which are not up-to-date with respect to recent development.
  • We need some sort of version number in the API, and snapd may take that into account when performing certain actions.
  • If snapd detects an old API, it may either fall back to degraded behavior or refuse to perform the requested operation altogether, depending on the feature at stake.
  • A nice error message must be shown for non-working features, and a warning might be shown on degraded behavior depending on the case.
  • Version will be represented as either a date, or a minor (2.X).
  • Version is unique for all services behind the endpoint.
  • Responses need to include the version even if the request failed, even on 404s, so that snapd can tell that the failure is due to an old API (see the sketch after this list).
  • Differences across versions need to be documented (“Available since X”, “Removed on Y”, “Changed in Z so that …”, etc).
  • Eventually snapd should support some kind of kill switch that tells clients that are several years old that they need to refresh snapd before anything else will work sanely.
  • Store team will investigate the deployment of endpoints with old versions, for testing purposes.
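
A sketch of what such a check could look like on the snapd side. The header name, date format, and threshold below are invented for illustration; only the general shape (read the advertised version, then degrade or refuse) follows the plan above:

```go
package storeapi

import (
	"fmt"
	"net/http"
	"time"
)

// Hypothetical minimum API version (expressed as a date, one of the two
// representations discussed) that some feature requires.
var minVersionForSomeFeature = time.Date(2018, 3, 1, 0, 0, 0, 0, time.UTC)

// apiVersion extracts the version advertised by the endpoint. Responses must
// carry it even on failures (including 404s) so that snapd can tell an old
// API apart from a genuinely missing resource.
func apiVersion(resp *http.Response) (time.Time, error) {
	v := resp.Header.Get("Snap-Store-Version") // assumed header name
	if v == "" {
		return time.Time{}, fmt.Errorf("endpoint did not report an API version")
	}
	return time.Parse("2006-01-02", v)
}

// checkFeature decides between proceeding normally, degrading, or refusing.
func checkFeature(resp *http.Response) error {
	ver, err := apiVersion(resp)
	if err != nil {
		return err
	}
	if ver.Before(minVersionForSomeFeature) {
		// Depending on the feature at stake this could instead select a
		// degraded fallback and emit a warning rather than a hard error.
		return fmt.Errorf("the proxy speaks an API from %s, which is too old for this operation",
			ver.Format("2006-01-02"))
	}
	return nil
}
```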

Snapshots

  • More reviews needed, all current reviews addressed.
  • Snapshot “groups” need to be renamed to “sets”, and the “ID” column in the listing to “Set” as well.
  • A snapshot may (and by default does) snapshot multiple snaps at once. These are organized in a set with a common ID.
  • Format is zip
  • “Human” timestamps to be adopted once PR lands
  • Lookup of users sometimes crashes on Trusty. More investigation is needed.
  • With current PR, on restore the data for a particular revision goes to that same revision again.
  • We need to track the epoch, for the following scenario:
    1. Snapshot revision 1, epoch 5
    2. Revision 2 is installed, with epoch 6*
    3. Revision 3 is installed, with epoch 6
    4. Now a restore of snapshot from (1) happens. Then what?
  • In this case, instead of overwriting the current data, we keep the data in the original revision and warn.
  • The format is a zip file containing:
    • Metadata in a json file
    • One tarball archive per data directory snapshotted ({/home/<user>,/var}/snap/<app>/{current,common})
    • Checksum file for the metadata file (everything else has the checksum inside the metadata itself)
  • Choice of zip because it’s indexed and allows streaming, and tarball because it’s precise and flexible (file attributes, multiple compressions, etc).
  • The metadata file includes key details (name, version, summary (TODO), revision, epoch (TODO), …), the configuration of the snaps, and the checksums of each of the archives (see the sketch after this list).
  • Multiple snaps may be snapshotted and restored at once. These form a group. Each snap still has its own independent snapshot archive, even when part of the same group.
  • Snapshots may also be requested for all snaps in the system, or for individual snaps, for all users, or for individual users. Default is all/all.
  • If snapd cannot open the snapshot to read its metadata, it disappears from the list, which needs to change. We should report those as “broken” instead.
  • If snapd can open the archive, but the record checksums do not match, it will just refuse to restore.
  • This allows checking a snapshot for validity without restoring it.
  • To support the snap remove case, we need to support automatic expiration of snapshots. We’ll have that in the future.
  • Commands:
    • snap forget <set id> [<snap> ...]
    • snap save [--users=<user>,...] [<snap> ...]
    • snap saved [<snap> ...]
    • snap check-saved <set id> [<snap> ...]
    • snap restore <set id> [<snap> ...]
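
To make the layout above concrete, here is a Go sketch that writes such a zip. The entry names (meta.json, meta.sha256), field names, and checksum algorithm are illustrative assumptions rather than the final format:

```go
package snapshot

import (
	"archive/zip"
	"crypto/sha256"
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// metadata is an illustrative stand-in for the real metadata file.
type metadata struct {
	SetID    uint64            `json:"set"`
	Snap     string            `json:"snap"`
	Revision string            `json:"revision"`
	Epoch    string            `json:"epoch"` // still TODO in the real format
	Config   json.RawMessage   `json:"config,omitempty"`
	SHA256   map[string]string `json:"sha256"` // checksums of the tar archives
}

// write produces a snapshot zip from pre-built tarballs, one per snapshotted
// data directory (e.g. .../current and .../common for each user and /var).
func write(path string, meta *metadata, tarballs map[string]io.Reader) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	zw := zip.NewWriter(f)
	if meta.SHA256 == nil {
		meta.SHA256 = make(map[string]string)
	}

	// Each tar archive is streamed into the zip and checksummed into the
	// metadata, so only the metadata needs its own separate checksum entry.
	for name, content := range tarballs {
		w, err := zw.Create(name)
		if err != nil {
			return err
		}
		h := sha256.New()
		if _, err := io.Copy(io.MultiWriter(w, h), content); err != nil {
			return err
		}
		meta.SHA256[name] = fmt.Sprintf("%x", h.Sum(nil))
	}

	// Metadata plus its checksum, written once all archive sums are known.
	buf, err := json.Marshal(meta)
	if err != nil {
		return err
	}
	w, err := zw.Create("meta.json")
	if err != nil {
		return err
	}
	if _, err := w.Write(buf); err != nil {
		return err
	}
	w, err = zw.Create("meta.sha256")
	if err != nil {
		return err
	}
	if _, err := fmt.Fprintf(w, "%x\n", sha256.Sum256(buf)); err != nil {
		return err
	}
	return zw.Close()
}
```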

Convention for commands vs. flags

  • Currently in use or in consideration:
    • snap refresh --time – Outputs the timestamps for refresh.
    • snap refresh --list – Outputs the list of refreshes.
    • snap refresh --hold=2h – Hold back refreshes for 2h.
    • snap refresh --schedule=<timer>
  • Issue: none of those actually refresh.
  • Is that okay as a pattern, or should we move away from it?
  • Some alternatives:
    • snap schedule-refresh [--hold=2h] [<timer>]
    • But then, how do we list it? With schedule-refresh again? But that looks like a request?
  • Point is still open. More discussion is needed.

Hotplug

  • Initial phase of the feature will target USB devices, as it’s obviously extremely popular and provides the right metadata.
  • Today, for a core device to support custom devices in a nice way via interfaces, the only solution is to have the slots on the gadget snap, which is not ideal.
  • For it to work snapd has to understand the idea of USB devices showing up at any point. Some of that is already done for other reasons today (auto-import).
  • Snapd also needs a way to pattern-match the data that the bus offers, to define which specific interface, and with which attributes, the slot should be made available to the system (see the sketch after this list).
  • Once devices go away, we disconnect (run hooks, etc), remove the slot from the system, but keep the data about the state it was in, including which connections were made and which auto-connections were disconnected.
  • Once the device shows up again, we recreate the slot, reestablish the state as closely as possible, including reconnections, hooks, etc.
  • Slots might be named after the data we receive via udev, but we need to investigate whether we can create good heuristics for that. It’s better to have a sequential number on top of the interface name (serial-port10) than to have a very dirty name.
  • We might use the interface label to carry more verbose data coming from the device to describe it, and then have a way to list it.
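
As an illustration of the pattern matching mentioned above, here is a small Go sketch that maps udev properties to an interface and generates a sequentially numbered slot name. The rule structure, property names, and example rule are all assumptions:

```go
package hotplug

import "fmt"

// deviceInfo is a simplified view of the udev properties for a device,
// e.g. {"ID_BUS": "usb", "ID_VENDOR_ID": "0403", "ID_MODEL_ID": "6001"}.
type deviceInfo map[string]string

// rule maps a set of udev properties to the interface a matching device
// should expose. Both the structure and the example below are assumptions.
type rule struct {
	match map[string]string
	iface string
}

var rules = []rule{
	// Hypothetical example: FTDI USB-serial adapters become serial-port slots.
	{match: map[string]string{"ID_BUS": "usb", "ID_VENDOR_ID": "0403"}, iface: "serial-port"},
}

// slotFor returns the interface and a generated slot name for a newly seen
// device: the interface name plus a sequential number (e.g. serial-port10),
// rather than a very dirty name derived from raw device strings.
func slotFor(dev deviceInfo, nextIndex func(iface string) int) (iface, slot string, ok bool) {
	for _, r := range rules {
		matches := true
		for k, v := range r.match {
			if dev[k] != v {
				matches = false
				break
			}
		}
		if matches {
			return r.iface, fmt.Sprintf("%s%d", r.iface, nextIndex(r.iface)), true
		}
	}
	return "", "", false
}
```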

Ubuntu Core 18 upgrade story

  • Today we have a core snap, but Ubuntu Core 18 won’t. It will be core18 plus an independent snapd snap.
  • With snapd being a separate snap, no reboots when it updates anymore.
  • But snapd will still have the same safe boot mechanism that exists today, with automatic reverts if boot fails for whatever reason.
  • We may be able to do that without touching initrd, since snapd starts late in the boot cycle and is not critical for the system to boot.
  • No devices should update without approval, as this is a sensitive topic and requires testing to ensure third-party changes work.
  • We’ll probably do core16 after core18 is out, doing the same split into core16+snapd and replacing the original core.
  • Once that’s done, we’ll evaluate how trivial the change ended up being, and what the risks are, and take a stance on whether to update transparently or not.
  • Model needs to grow a field to define the base used as the root, but we may need some defaults to allow upgrades. That field might also be used to define the upgrade.

I’d personally be happier if when you did the split, the name of the snaps changed to be prefixed with ubuntu-, just as all the newer Ubuntu-specific snaps have (e.g. snaps for Ubuntu Budgie, Ubuntu MATE, etc.).

It’d reduce the confusion for when I get to the Fedora based stuff, and others want to do their own custom things…

Can you please clarify these two points? What is the maximum number of days measured from? If I download an ISO (classic) image with some preseeded snaps, and that image is more than 60 days old, what happens?

The general behavior is that time is counted from first boot, so the logic will hold updates for a couple of hours to account for quick use cases without attempted updates. The discussion on this is happening here.

For the particular case of live images we’d like to hold longer since much of the system may be in RAM. There are more details about that case in this topic.