Snapd 2.25 blocked because of revert race condition

In order to unblock 2.25/2.26 so that they can move to stable I added the following alternative PR: https://github.com/snapcore/snapd/pull/3407

It will write the new and incompatible seccomp profiles to a different place than the previous snapd (/var/lib/snapd/seccomp/profiles-v2 instead of /var/lib/snapd/seccomp). This is slightly ugly, however it solves the problem that we cannot (currently) guarantee that security profiles are generated in time (there will be work to fix that, but even when it is fixed it will not be fixed in old snapd versions, so a revert will still mean old snapd hits the same issue).

In addition to the above PR I would like to add syntax to snap-confine so that it either skips lines it does not understand or metadata in the file like synatx-ver: 2 and if snap-confine finds such a line it will stop parsing if the version there is higher than what it can parse. With this landing we can be sure we never need to change the directory again.

The next step will be a) move towards using bpf instead of text-profiles b) ensure start snap service are only starting after snapd did rewrite the profiles.

I don’t care for this for the reasons I mentioned in the PR. I think for the short term fix we should have a point release for 25 that removes the mknod rules (I don’t care for this either, as noted elsewhere, but it is better than this imho) for a quick fix to unblock 25 and then focus on something more proper for 25 -> 26 or (later, ie whenever the mknod is added back).

@mvo This seems to go in the direction we discussed already, in the sense that introducing a new place in seccomp/profiles-v2 is not so different than introducing a new place in profiles/<digest>/seccomp. So if possible can we spend at least half an hour talking about how it might look like if we decided to already put some work towards that design that will solve the longer term problem as well? Otherwise we’ll soon find ourselves doing the same move again.

For the record, that message is reading a bit confusingly. I know from it that you don’t care about a few this and that there’s another this which is worse than another former one of those. :slight_smile:

It’s the directory I don’t particularly care for. I’m fine with something in the profile to state the syntax version and possibly to make snap-confine skip stuff it doesn’t understand.

I actually don’t see much value in stating the profile format version explicitly in this particular case. I mean, we should do it regardless because it’s a good practice and very useful metadata when debugging things, but this isn’t going to solve the problem at hand. We know for sure the profile is incompatible by just trying to read it. Knowing it via a different field wouldn’t fix the problem either.

Again, this is just for clarity, I know we all understand this and are looking at different alternatives to the long term design.

From the PR:

‘In 15.04 and Touch we would have per version profiles for applications and that was deemed too confusing. Specifically, “for series 16 there should be only ever be one profile on disk and it should be for the current application”. Adding different directories to fix this bug goes against that design constraint and we won’t know which profile is loaded at the time the application started.’

As mentioned elsewhere, this is not a bug in snapd 25, it is a bug exposed in 25 and the bug is the combination that snap-confine is now re-exec’d with the wrong policy and that snapd dropped the systemd bits long ago to make snaps start after snapd. Adding versioned directories (and only bpf cache or only snap-confine not die()ing) only papers over the real problem that the wrong policy is on disk and being used.

What we really need is to honor the implications of the initial design requirement and work towards whatever is on disk is correct for this snapd/snap-confine version. Otherwise we have latent bugs where the system is not deterministic and old policy is being used when it shouldn’t. Today, the issue is that the syntax is different, but this could just as easily affect syntactically parseable policy that is wrong for either apparmor or seccomp (eg, the network-manager interface needs a certain combination of rules that make it fail to run when reverting but the policy isn’t regenerated yet). The syntax directories approach doesn’t address that at all.

If we can address the non-determinism of security policy generation, we don’t need syntax-directories. I suggest backing out the mknod rules for now and then work on determinism next, then add the mknod rules back. Example timeline (all steps and things to do still TBD):

  • 25 dot release has mknod rules reverted, that is pushed everywhere 25 is now. 25 is promoted to stable such that 24 -> 25 -> 24 has no issues
  • 26 release introduces determinism (eg, add back the ‘after snapd’ systemd bits at a minimum). This allows 25 -> 26 -> 25 to have no issues. 26 might also introduce changes to snap-confine to make it not die() and/or use bpf cache but the determinism is the key requirement
  • 27 (or later) release adds back the mknod rules so 26 -> 27 -> 26 has no issues

The above doesn’t handle the lesser used 25 -> 27 -> 25 scenario but as mentioned, this is nothing new. If people are worried about this scenario, we could be conservative and not reintroduce mknod until snapd 28 or later (ie, 2 or more releases of snapd after policy determinism is introduced).

For completeness (and to go completely the other direction from what I last suggested), rather than introduce syntax directories, use core revision directories. Eg:

  • /var/lib/snapd/apparmor/profiles (everything before determinism), /var/lib/snapd/apparmor/profiles.1689, /var/lib/snapd/apparmor/profiles.1577
  • /var/lib/seccomp/profiles (everything before determinism), /var/lib/snapd/seccomp/profiles.1689, /var/lib/snapd/seccomp/profiles.1577

where ‘1689’ and ‘1577’ is the revision from the store (therefore also works for unasserted installs with ‘xN’). A symlink could be used to help admins/developers/etc. If adding a symlink, would have to be careful to not reintroduce this issue by making snap-confine just look at the symlink :slight_smile:

If people go this direction, we’d need to consider how to handle the bpf cache and /var/cache/apparmor.

snap-confine then just uses the directory for the revision it is. Upon revert, snapd needs to reload the apparmor policy for the revision to preserve determinism (less of an issue today because it supports profile replace, but still).

I explored the profiles digest idea a little bit more this morning.

Some thoughts what we could do:

  • we create the profile digest from inputs like “seccomp: v2, apparmor: v1” and hash that, when we add more inputs or increase a version this naturally will result in new digests
  • we make snapd write profiles to dirs like /var/lib/snapd/seccomp/profile.<digest>/
  • we make snap-confine read the profiles from the above dir. snap-confine needs to know what <digest> supports:
  • we could duplicate that code (one time in C, one time in go)
  • we could make snapd write the profile-digest at startup to /run/snapd.profile-digest (we cannot use /var/lib/snapd/snapd.profile-digest or we run into the same race-condition at boot)
  • as long as the digest is not using any dynamic component (i.e. not uses the running kernel or similar) we could generate the digest at build time
  • the schema needs to be generalized for at least apparomor too

How about we take this a step further /var/lib/snapd/profiles/{digest}/{tag}.{kind}, e.g. /var/lib/snapd/profiles/{digest}/snap.hello-world.env.seccomp. We could use this to unify and clean up where the profiles are stored.

Apparmor has two options, first of all we can just load the profile we wish (the files on disk don’t matter) and second of all we could even teach snap-confine to load a specific versioned profile so that we can have multiple profiles “in flight” and freely choose between them without any race conditions.

Maybe? The apparmor_parser already does a reasonable job with this by knowing what it supports, what the kernel supports and what is in the policy. The only time this would be needed for apparmor would be if newer apparmor supported syntax that snapd used but the system that apparmor was on didn’t support the new syntax. This seems more like a dependency declaration on a newer apparmor though…

That’s pretty much the issue we are discussing above still. The general statement of the problem is that software changes and the format of data they use also changes in incompatible ways. That incompatibility may sometimes be observed going forward, when version N+1 doesn’t understand the data of version N, but it’s more frequently observed when going backwards, when version N doesn’t understand the data of version N+1, because it’s easy to do a migration forwards, but harder to it backwards or introduce completely new ideas while preserving a limited understanding for the old.

So, even if apparmor_parser knows what it supports today, at some point the apparmor profile will evolve, and the old tooling won’t support the new version of the profile anymore. If the snap holding the tooling is reverted into an old version, we need to be able to offer it a format that it can understand. That same problem may happen and will eventually happen with every external dependency that we work with (apparmor, seccomp, udev, etc).

To unblock the release we will revert the syntax/symbol changes in the seccomp code for now and do a new 2.26.4 release to unblock stable and to unblock 2.27.

1 Like

All reverts are in place now and a new 2.26.4 release is pushed to the beta channel of core. Once this gets QAed we can move it further to candidate and eventually stable. This unblocks the current issue. We also have a regression test for this in place now and plans for a better fix. But its more involving so we need to push those reverts out now.

1 Like

It turns out that 2.26.4 still contains an issue on revert from 2.24.1. This time with the bluez daemon, the root cause is https://github.com/snapcore/snapd/commit/173896270b2d57a51c8d046af73fa741d81d82da - there is a growing concern that reverting more of the interfaces seccomp code may introduce errors down the line. I think we should consider the seccomp-bpf branch (https://github.com/snapcore/snapd/pull/3431) instead of trying to revert bit by bit. This branch solves this particular problem by using different profile names in the new version. So a revert with that branch will cause the system to use the 2.24.1 generated profiles to be used.

I share mvo’s concern and agree we should consider the bpf branch which is undergoing active review now.

Jamie and I are working on https://github.com/snapcore/snapd/pull/3431 to unblock the revert. This PR fixes the immediate issue and also ensures that we won’t hit this issue going forward. There is more work (c.f. https://forum.snapcraft.io/t/versionized-profiles/) to solve all aspects of the problem but with the above PR we can move forward and will be safe for seccomp.

1 Like

The https://github.com/snapcore/snapd/pull/3431 branch is now merged and will be part of the next core snap build. @fgimenez - please run the nested suite with the core-revert test with the new code to verify that the branch actually fixes the revert issue we are seeing.

3 Likes

That was an epic PR, Michael! Thank you!

I did some testing with the nested suite now and found that we need a small extra PR: https://github.com/snapcore/snapd/pull/3517

On a refresh from the old snapd (pre-seccomp-bpf) to the new snapd
that uses the /var/lib/snapd/seccomp/bpf/*.bin code there is a race
condition on startup. When snapd starts it will create the new
security profiles. However all snap services (daemons) start in
parallel with snapd. So when a snap service like e.g. network-manager
runs it may not have a security profile yet. To fix this, snap-confine
needs to wait a little bit for the security profiles to appear.

We could make this wait conditional on uptime, i.e. only wait if the uptime is within the first 120 seconds or something.

1 Like