Snapd 2.25 blocked because of revert race condition

Our CE team found a possible issue when revert from a version of snapd 2.25 (or later) to a version older than 2.25. The issue is that 2.25 adds new items to the seccomp profile. On revert those new items are not understood by the older snapd/snap-confine.

We have code in snapd to rewrite those profiles on startup, so the older snapd will write correct security profiles when it starts. However the snap services on a device can start before snapd had a chance to rewrite the security profile(s) so the services starts with profiles that the older snapd does not understand and hence fails.

This is what is observed by the CE team that uses a network-manager snap. If core is reverted and the device rebooted, network-manager starts in parallel with snapd, the profile is not yet rewritten and network-manager fails to start, leaving eth0 unmanaged.

1 Like

This highlights issues we’ve been just discussing here:

Do we have a plan for how to fix this appropriately yet?

It turns out this is a bit of a thorny problem that we should approach in multiple stages.

In the short term we need to unblock 2.25+:

OR

OR

Medium term

Ensure all snap based service run after snapd re-generated the security profiles to avoid the race. We will need to modify snapd to become a notify type systemd service and emit the “READY=1” only after the security profiles are fully ready. We will also need to write updated systemd wrappers for the snap services that contain “After=snapd.service”.

Medium/longer term

Move towards generating bpf profiles directly for snap-confine (Generate bpf files in the seccomp backend)

Long term

Rethink how we store the security profiles. There are multiple dimensions that can make a profile incompatible, one is the internal format of snapd, the other are the capabilities of the running kernel etc.

1 Like

If snapd needs to start reliably before any service or commands start why don’t we make snap run wake up snapd and give it a green light to execute? With this we would fix this bug and also gain ability to:

  • mount snaps lazily to conserve resources
  • refresh snaps on demand …
  • or inform users of potential CVEs
  • (other ideas possible)

One interesting question is if After=snapd.service actually makes snapd start up if not running. From what I recall it does not. On Fedora (and I bet other places are similar) we do not start snapd itself on boot, just the .socket.

I was surprised to hear that we were no longer enforcing that snapd must be started before daemons. The sandbox was specifically designed with this requirement such that apparmor was loaded before snapd, snapd updated security profiles (seccomp or apparmor (reloading as necessary)) and only then were services allowed to start. It was understood that non-daemon commands necessarily would only be run from a login session (eg, lightdm, console, etc) and that those should start after snapd (I say ‘should’ because I didn’t think we had anything in practice that said login session must be after snapd and we instead relied on the fact that this happened under normal boot conditions (a bug in and of itself), except on Touch where we specifically adjusted the lightdm service to start after apparmor).

Anyway, I think the BPF approach is quite sane in general, but it doesn’t really solve the whole problem. Historically we relied on apparmor to have to start before any snap commands because the policy needed to be loaded into the kernel before the command started. AppArmor conveniently has profile ‘reload’ functionality where you can change the profile of a running process that was started under the same profile (ie, apparmor_parser -r) and so snapd coming along at some later point in time to reload the apparmor didn’t show up as a problem (though, that is not correct). Importantly, seccomp does not have reload functionality (due to upstream seccomp design) and the profile must be in place before the command starts. The BPF approach alone does make sure that the profile is in place in such a way that snap-confine won’t die, but it does not guarantee that the policy is correct for the version of snapd that is actually running. To me, it is vitally important that the security policy that commands run under is predictable, robust and deterministic so that there is never a question of “is this now running under the old, pre-revert policy or the new reverted policy?”. The BPF approach alone does not solve that-- it only solves the problem of “can I launch the command under policy that snapd compiled at some point in the past?”. This is a worthy problem to solve in and of itself of course, but commands should only be allowed to run under the policy that the version of the running snapd generated.

As such, I strongly feel something along the lines of the medium term approach is required, regardless of if we pursue the BPF approach or not.

In terms of the long term approach, I think we should look at apparmor again-- the parser is able to parse all old policy and introspect the kernel for what is supportable. snapd/the seccomp parser BPF compilation tool would do exactly the same. It is in a position to introspect the kernel and can recompile policy to disk as necessary. Determining when that is necessary and the implementation details is TBD, but the seccomp parser is certainly in a position to look at existing policy, update the binary BPF, look at the kernel, etc and make intelligent decisions.

In order to unblock 2.25/2.26 so that they can move to stable I added the following alternative PR: https://github.com/snapcore/snapd/pull/3407

It will write the new and incompatible seccomp profiles to a different place than the previous snapd (/var/lib/snapd/seccomp/profiles-v2 instead of /var/lib/snapd/seccomp). This is slightly ugly, however it solves the problem that we cannot (currently) guarantee that security profiles are generated in time (there will be work to fix that, but even when it is fixed it will not be fixed in old snapd versions, so a revert will still mean old snapd hits the same issue).

In addition to the above PR I would like to add syntax to snap-confine so that it either skips lines it does not understand or metadata in the file like synatx-ver: 2 and if snap-confine finds such a line it will stop parsing if the version there is higher than what it can parse. With this landing we can be sure we never need to change the directory again.

The next step will be a) move towards using bpf instead of text-profiles b) ensure start snap service are only starting after snapd did rewrite the profiles.

I don’t care for this for the reasons I mentioned in the PR. I think for the short term fix we should have a point release for 25 that removes the mknod rules (I don’t care for this either, as noted elsewhere, but it is better than this imho) for a quick fix to unblock 25 and then focus on something more proper for 25 -> 26 or (later, ie whenever the mknod is added back).

@mvo This seems to go in the direction we discussed already, in the sense that introducing a new place in seccomp/profiles-v2 is not so different than introducing a new place in profiles/<digest>/seccomp. So if possible can we spend at least half an hour talking about how it might look like if we decided to already put some work towards that design that will solve the longer term problem as well? Otherwise we’ll soon find ourselves doing the same move again.

For the record, that message is reading a bit confusingly. I know from it that you don’t care about a few this and that there’s another this which is worse than another former one of those. :slight_smile:

It’s the directory I don’t particularly care for. I’m fine with something in the profile to state the syntax version and possibly to make snap-confine skip stuff it doesn’t understand.

I actually don’t see much value in stating the profile format version explicitly in this particular case. I mean, we should do it regardless because it’s a good practice and very useful metadata when debugging things, but this isn’t going to solve the problem at hand. We know for sure the profile is incompatible by just trying to read it. Knowing it via a different field wouldn’t fix the problem either.

Again, this is just for clarity, I know we all understand this and are looking at different alternatives to the long term design.

From the PR:

‘In 15.04 and Touch we would have per version profiles for applications and that was deemed too confusing. Specifically, “for series 16 there should be only ever be one profile on disk and it should be for the current application”. Adding different directories to fix this bug goes against that design constraint and we won’t know which profile is loaded at the time the application started.’

As mentioned elsewhere, this is not a bug in snapd 25, it is a bug exposed in 25 and the bug is the combination that snap-confine is now re-exec’d with the wrong policy and that snapd dropped the systemd bits long ago to make snaps start after snapd. Adding versioned directories (and only bpf cache or only snap-confine not die()ing) only papers over the real problem that the wrong policy is on disk and being used.

What we really need is to honor the implications of the initial design requirement and work towards whatever is on disk is correct for this snapd/snap-confine version. Otherwise we have latent bugs where the system is not deterministic and old policy is being used when it shouldn’t. Today, the issue is that the syntax is different, but this could just as easily affect syntactically parseable policy that is wrong for either apparmor or seccomp (eg, the network-manager interface needs a certain combination of rules that make it fail to run when reverting but the policy isn’t regenerated yet). The syntax directories approach doesn’t address that at all.

If we can address the non-determinism of security policy generation, we don’t need syntax-directories. I suggest backing out the mknod rules for now and then work on determinism next, then add the mknod rules back. Example timeline (all steps and things to do still TBD):

  • 25 dot release has mknod rules reverted, that is pushed everywhere 25 is now. 25 is promoted to stable such that 24 -> 25 -> 24 has no issues
  • 26 release introduces determinism (eg, add back the ‘after snapd’ systemd bits at a minimum). This allows 25 -> 26 -> 25 to have no issues. 26 might also introduce changes to snap-confine to make it not die() and/or use bpf cache but the determinism is the key requirement
  • 27 (or later) release adds back the mknod rules so 26 -> 27 -> 26 has no issues

The above doesn’t handle the lesser used 25 -> 27 -> 25 scenario but as mentioned, this is nothing new. If people are worried about this scenario, we could be conservative and not reintroduce mknod until snapd 28 or later (ie, 2 or more releases of snapd after policy determinism is introduced).

For completeness (and to go completely the other direction from what I last suggested), rather than introduce syntax directories, use core revision directories. Eg:

  • /var/lib/snapd/apparmor/profiles (everything before determinism), /var/lib/snapd/apparmor/profiles.1689, /var/lib/snapd/apparmor/profiles.1577
  • /var/lib/seccomp/profiles (everything before determinism), /var/lib/snapd/seccomp/profiles.1689, /var/lib/snapd/seccomp/profiles.1577

where ‘1689’ and ‘1577’ is the revision from the store (therefore also works for unasserted installs with ‘xN’). A symlink could be used to help admins/developers/etc. If adding a symlink, would have to be careful to not reintroduce this issue by making snap-confine just look at the symlink :slight_smile:

If people go this direction, we’d need to consider how to handle the bpf cache and /var/cache/apparmor.

snap-confine then just uses the directory for the revision it is. Upon revert, snapd needs to reload the apparmor policy for the revision to preserve determinism (less of an issue today because it supports profile replace, but still).

I explored the profiles digest idea a little bit more this morning.

Some thoughts what we could do:

  • we create the profile digest from inputs like “seccomp: v2, apparmor: v1” and hash that, when we add more inputs or increase a version this naturally will result in new digests
  • we make snapd write profiles to dirs like /var/lib/snapd/seccomp/profile.<digest>/
  • we make snap-confine read the profiles from the above dir. snap-confine needs to know what <digest> supports:
  • we could duplicate that code (one time in C, one time in go)
  • we could make snapd write the profile-digest at startup to /run/snapd.profile-digest (we cannot use /var/lib/snapd/snapd.profile-digest or we run into the same race-condition at boot)
  • as long as the digest is not using any dynamic component (i.e. not uses the running kernel or similar) we could generate the digest at build time
  • the schema needs to be generalized for at least apparomor too

How about we take this a step further /var/lib/snapd/profiles/{digest}/{tag}.{kind}, e.g. /var/lib/snapd/profiles/{digest}/snap.hello-world.env.seccomp. We could use this to unify and clean up where the profiles are stored.

Apparmor has two options, first of all we can just load the profile we wish (the files on disk don’t matter) and second of all we could even teach snap-confine to load a specific versioned profile so that we can have multiple profiles “in flight” and freely choose between them without any race conditions.

Maybe? The apparmor_parser already does a reasonable job with this by knowing what it supports, what the kernel supports and what is in the policy. The only time this would be needed for apparmor would be if newer apparmor supported syntax that snapd used but the system that apparmor was on didn’t support the new syntax. This seems more like a dependency declaration on a newer apparmor though…

That’s pretty much the issue we are discussing above still. The general statement of the problem is that software changes and the format of data they use also changes in incompatible ways. That incompatibility may sometimes be observed going forward, when version N+1 doesn’t understand the data of version N, but it’s more frequently observed when going backwards, when version N doesn’t understand the data of version N+1, because it’s easy to do a migration forwards, but harder to it backwards or introduce completely new ideas while preserving a limited understanding for the old.

So, even if apparmor_parser knows what it supports today, at some point the apparmor profile will evolve, and the old tooling won’t support the new version of the profile anymore. If the snap holding the tooling is reverted into an old version, we need to be able to offer it a format that it can understand. That same problem may happen and will eventually happen with every external dependency that we work with (apparmor, seccomp, udev, etc).

To unblock the release we will revert the syntax/symbol changes in the seccomp code for now and do a new 2.26.4 release to unblock stable and to unblock 2.27.

1 Like

All reverts are in place now and a new 2.26.4 release is pushed to the beta channel of core. Once this gets QAed we can move it further to candidate and eventually stable. This unblocks the current issue. We also have a regression test for this in place now and plans for a better fix. But its more involving so we need to push those reverts out now.

1 Like

It turns out that 2.26.4 still contains an issue on revert from 2.24.1. This time with the bluez daemon, the root cause is https://github.com/snapcore/snapd/commit/173896270b2d57a51c8d046af73fa741d81d82da - there is a growing concern that reverting more of the interfaces seccomp code may introduce errors down the line. I think we should consider the seccomp-bpf branch (https://github.com/snapcore/snapd/pull/3431) instead of trying to revert bit by bit. This branch solves this particular problem by using different profile names in the new version. So a revert with that branch will cause the system to use the 2.24.1 generated profiles to be used.