Versionized profiles

backlog
mvo

#7

Interface developers typically will modify the policy on the system in /var/lib/snapd/*/profiles directly when developing the policy. The digest approach complicates that, but not in an insurmountable way. It does mean we need to document the new way so people can see where the profiles are. It only affects admins and users who care about the policy on the system, but again, not in an insurmountable way. The point of my comment on this is to capture what needs documenting.

There is the “We need to make snap-confine wait for some amount of time and warning about it if its required profile is not yet in place” idea-- this puts snapd in the critical path. I put forth one way to get out of the critical path in addition to @mvo’s proposal, snapd-confine could itself strategically call snap-generate-profiles or some other appropriate tool that doesn’t rely on snapd running when the profile isn’t there, so you don’t have a timeout and services can start without snapd.

My point is what is missing in all other views of the problem. The quoted ‘general view’ only talks about ‘format data’ but it isn’t just about parseable policy; it is also about the content of the policy. Consider a rule is added in a new version of snapd that fixes a security issue in the policy where that rule is parseable in previous revisions. We want to rev the digest input there so that snaps don’t race and run with the old policy.

Yes, I was summarizing why I consider the proposed change a good one. :slight_smile:

I don’t quite understand this comment. @mvo is certainly aiming for this in the proposal, and I agree with the proposal (with a few additional details), and we can certain expand on the digest input as we see fit in the future… What’s the problem?

Perhaps you are confused because we were responding at the same time and it seemed like I was countering you? Regardless, I think this approach does add complexity but, again, in a way that IMO is acceptable.


#8

They would be able to do that in /var/lib/snapd/profiles/current/*. Should be just a different path.

Once more, no, it does not put snapd in the critical path. Let me simplify so the point is more clear: assume the system boots and we need profiles that are not in place to handle the kernel that was chosen out of band. Something somewhere somewhen needs to generate those profiles for these applications to work at all. We need to solve that.

Sorry, I’m probably still missing your point then. Today we already generate new profiles when snapd changes. Isn’t that sorted already? The problem we’re addressing now is the fact these profiles may be completely incompatible on a revert, and everything in the system would suddenly stop working before snapd has a chance to fix them.

Your earlier point commented above the content you quoted was that if we generated profiles every time everything would be beautiful and stable. My point is that no, it won’t be beautiful and stable because changes may require the system to suddenly use a different set of profiles which are not available.

There’s not much of “me” in this conversation. :slight_smile: We’re just exploring the problem and trying to come up with good solutions. I don’t think we’re there yet.


#9

Historically that has been snapd. The proposal from @mvo tries to address that and does except for when the profiles don’t yet exist and there was talk of “waiting”. It wasn’t clear what we were waiting on-- since snapd historically generates the profiles, it sounded like waiting on snapd. If that is not what you meant by waiting, then fine, it isn’t in the critical path. If it is waiting on snapd there, I gave an idea on how not to wait and just call the tool directly.

Yes, snapd regenerates the policy, but in a racy way since there is nothing today that enforces that snapd has regenerated the profiles before a snap daemon or non-daemon starts. This means that if in 2.x there was a policy flaw and 2.y has a policy fix for the flaw (that doesn’t constitute a syntax change), then because of the current race a reboot might allow some things to start before snapd gets around to regenerating the policy. This is wrong and my point all along has simply been that fixing that happens to fix the other issues. I was trying to express that this proposal can nicely address this race condition for both ‘daemon’ and ‘non-daemon’.

My point was meant to include (the ‘above’ intended to mean @mvo’s proposal) that things are beautiful and stable because we would wait for (or have snap-confine call out to generate) any unavailable profiles. Sorry if that wasn’t clear.

I fully agree that we haven’t quite fleshed this whole thing out and that in particular we need to consider the current directories and reverts better as well as digest reaping. I was expressing my thoughts in support of the general idea of using a digest, revving input into the digest in intelligent ways, and loading/using the digests to make sure we are using the right policy.

I can say that, I think that if we revert the mknod changes in 2.25 everywhere and if the digest is in 26, we will be ok if we continue to let /var/lib/snapd/{apparmor,seccom}/profiles be used by the old snapd/snap-confine (<=25) and new snapd/snap-confines (>= 26) always uses /var/lib/snapd/profiles/<digest>/{seccomp,apparmor}/. Eg (assume 25 doesn’t have digest support and 26 does):

  • 25 -> 26: snapd/snap-confine uses digest dirs and makes sure the policy is in place
  • 25 -> 26 -> 25: snapd/snap-confine uses old directories
  • 25 -> 26 -> 25 -> 27: snapd/snap-confine uses digest dirs and makes sure the policy is in place
  • 27 -> N>25: snapd/snap-confine uses digest dirs and makes sure the policy is in place

At some point, <=25 will rotate out and will not be able to revert to them since they will be garbage collected after enough refreshes. At that point the policy in the old directories should be reaped. If for some reason an older revision is locally (and assertedly) installed, it should all still work-- it will use the old directories for policy until enough new refreshes happen and it is reaped.


#10

I feel like we’re going in circles somewhat, so I’ll try to be brief: nothing we’re doing now puts something new into the critical path. If profiles are not in place, they must be generated, and that’s the status quo. The lack of that generation means things simply don’t work, and this needs to be fixed. Whether we call something to generate those profiles or that something runs by itself, we need to wait for them to be generated. Renaming snapd to not-snapd-but-generate-profiles doesn’t really change that. :slight_smile: We’ll continue to generate profiles ahead of time whenever we can, and the plan outlined above is incomplete because we need to nail down the question about out-of-date profiles. We’ll be exploring ideas further.


#11

I was saying when we try to fix this that I agree we should not put snapd, the service, in the critical path. The language around ‘waiting’ and my own previous thoughts on adjusting app snap systemd units to specify starting after snapd led me to think we might be adding snapd to the critical path.

I think it is entirely possible to generate the profiles ahead of time. I also think it is possible to generate missing profiles just in time (ie, snap-confine reaches out to some tool), which is an idea I was trying to capture that @mvo and I discussed in a hangout.

I agree we have been talking in circles somewhat and it’s clear to me now we are in alignment.


#12

This is not being a very productive conversation, Jamie. :slight_smile: No, we’re clearly not aligned, and as I explained multiple times these points misrepresent the issue at hand. We won’t be able to evolve the conversation further here, though.

As a hint of that, I explained above that if a kernel is updated out of band, snapd may easily be completely unable to update its profiles in time to cover the different kernel. Either it generates, or the applications will need to wait for the generation, or the applications will crash. So no, snapd won’t be able to generate profiles ahead of time in all situations, and that’s part of the problem.

Let’s have a call later in the week, please.


#13

Man we aren’t connecting today! Of course, a call is fine.

What I thought you meant was that a kernel being updated out of band was a future consideration where additional keys would be added to the digest input. Therefore, that can be handled by future iterations on the implementation of this proposal. Put more simply: anything we find interesting as part of the digest can be expressed there (how and what TBD) and the missing profiles for a new digest will need to be generated. Not by snapd (cause don’t want it in critical path) but instead by something else (which my snap-confine calling out to snap-generate-profiles idea could do, but there are other ways to do it too).

Again, a call is fine. Please invite @mvo.


#14

We made snapd a “Type=notify” service recently. This means that snapd will now communicate with systemd when it is fully initialized. And snapd is only sending the “READY=1” message to systemd after the security profiles are re-generated.

With that above change in place, I added a simple PR (https://github.com/snapcore/snapd/pull/3442) that uses this new feature. Now snap run will wait until snapd.service is in no longer in the ActivateState=activating state. Snapd will re-generate the security profiles on startup to ensure they match current snapds understanding, kernel etc. If snapd is not running at all (failed state/inactive state) snap run will just continue (we could make it fail in this case but that would mean if snapd ever has a bug and fails to start all applications will no longer work). The downside of this approach is that it will wait for all profiles.

The alternative to this wait-for-all profiles approach would be (as suggested by Gustavo) a helper snap-generate-profiles that snap-confine would call/wait-for for the specific snap that is about to be run if it detects that the profile is out-of-date. This approach is more elegant but it raises some questions:

  • how do we detect if the profile is outdated?
  • we need to read the state.json in order to generate the profiles. Zyga suggested it will be ok to just open it read-only and because we write it in an atomic way we will always have a valid snapshot

#15

well we have also the problem that snapd and the helper might try to generate profiles at the same time based on a different state. Unless the plan is to always just do this from snap run but then I don’t know if there are issues around trying to write profiles for different apps of the same snap for bits of the profiles that are one per-snap and not per-app.


#16

This sounds great, thanks!

As mentioned above, the advantage of having such an external tool is still nebulous, without a clear reason why doing that would be better than simply waiting for snapd. So far it still sounds like just one more moving piece increasing the complexity of the boot process, when snapd is doing that either way.

That alone is dangerous, as we’re putting snapd in the pipeline of every single boot for every single service packed as a snap. The more ubiquitous snaps become, the more dangerous this is. It’s also a bit unfortunate in the sense that it will slow down the boot process.

Instead, I suggest introducing a mechanism through which we detect whether we need to wait or not. We can discuss the details, but the idea is to have snap run or snap-confine quickly and independently figuring whether profiles are up-to-date, and only going into the waiting mode if that’s not the case. Since this will be a rather rare event, another advantage of doing this is that we can increase the amount of time we’re willing to wait for profiles to be sorted, since it’s more likely that proceeding without these profiles would mean breakage.


#17

This makes a lot of sense, thanks for these ideas. In order to detect profile changes we could go back to the original idea about a profile digest. When we write profiles we would also write something like /var/lib/snapd/profiles.digest that contains the digest of the current profiles. We would then introduce a new helper interfaces.ProfileDigest() which snap run calls. If that profile digest and the one in /var/lib/snapd/profile.digest are different, it knows it needs to wait, otherwise it can just continue. The helper would be very much like what was done in https://github.com/snapcore/snapd/compare/master...mvo5:profile-digest?diff=unified&expand=1&name=profile-digest - if that sounds sensible, I will expand #3442 accordingly.


#18

Something along those lines sounds sensible, but “profiles.digest” sounds a bit misleading. We’re not really talking about the digest of the profiles themselves, but rather about what environment produced them. Perhaps something like “system-key”, which is a digest produced from an aggregated view of the system. This would be built based on:

  • snapd build hash (same we use to decide to regenerate profiles)
  • core snap version revision (for REEXECing… what’s the relationship with the above today?)
  • kernel version
  • … what else?

Then, we might store it inside /var/lib/snapd/system-key whenever snapd finishes acknowledging the given system changes, whatever that means (doesn’t need to be constrained to profiles).

It’s worth noting that interfaces being connected and disconnected do not modify the system key, and that sounds desirable and another reason why “system key” might be appropriate. We don’t want to get “snap run” into the business of introspecting interface data to figure whether to wait or not. If the profile is out of date because a connection didn’t finish properly yet but the system key is the same, it should just go ahead and use the profiles that are already built. If the system key is the same, the profiles are compatible, and this is no different from performing a connection after the snap is installed.


#19

At the risk of prematurely optimizing, at some point (not necessarily today, but maybe today) we might prefer to consider interrogating the kernel via sysfs/proc at runtime for capabilities instead of simply the kernel version when generating “system-key”. On Ubuntu Touch we tried hard only to regenerate the AppArmor profiles only when we had to (ie, when the kernel capabilities for AppArmor changed or the profiles themselves changed) because on constrained armhf devices, depending on the profile, it could take up to 1 second to recompile the profile. With a system with 60 profiles (ie, snap app commands), that takes a while.

Put more simply, most kernel refreshes won’t require a profile regeneration and we can simply use the cache files and it would be nice if we didn’t needlessly regenerate if we can pull it off.

With the seccomp bpf precompilation it shouldn’t be nearly as expensive to regenerate (the profile syntax and what seccomp does is incredibly simple compared to apparmor), but with the upcoming changes to the kernel regarding seccomp complain mode we’ll be in the business of generating different seccomp policy based on kernel capabilities. Perhaps it makes sense to do that at runtime instead of kernel version as well?

I think apparmor_parser --version is needed to handle new policy syntax only available in newer apparmor where a new core snap has new apparmor and policy syntax and we revert to an older core snap that doesn’t.


#20

You’re right that there’s no reason to look at the kernel version. We should really be looking at real factors that will cause profiles to be generated differently. Even the apparmor_parser version would be irrelevant unless we’re actually using that as a factor to do something else at profile generation time. That’s not the case at least today, but we’ll probably need to do that once we support different and incompatible syntax. For that we also need to find a way to discover the version of apparmor_parser that doesn’t involve running it every time we “snap run” something.


#21

As for kernel interrogation, it certainly is possible on early boot to get the kernel capabilities we are interested in and put them in a file or hash for quick checks (along the lines of @mvo’s digest idea), but like you said, we need to now manage this file which adds complexity. Perhaps populating this file via a systemd one-shot early boot unit would make this robust-- but we’d have to consider cross-distro packaging to put that one-shot unit in place.

For apparmor_parser --version, I just remembered that /snap/core/current/usr/share/snappy/security-policy-version exists for similar reasons and it contains (along with some ubuntu-core-security-* cruft that should be removed) the apparmor version installed in the core snap. We’d have to read/parse that file but that is faster than an exec of apparmor_parser (@mvo’s digest could make that faster, but see above). Changes to this file don’t necessarily indicate syntax changes though-- indeed, it has the Ubuntu version and it will rev with SRUs, but there aren’t that many SRUs, so it may not be a problem.

Alternatively, if we decided the one-shot systemd early boot unit idea was the way forward, we could instead of looking at /snap/core/current/usr/share/snappy/security-policy-version run apparmor_parser --version as part of that one-shot.


#22

Oh and you are right that apparmor version is not currently needed; it is a future consideration for if we introduce new syntax that impacts snappy policy.


#23

The comment was really about apparmor_parser. Reading kernel capabilities should be faster than reading a file from disk.

But that won’t work unless inside a core device, right?


#24

Thanks for your input! I started implemented this now in: https://github.com/snapcore/snapd/pull/3456

It will generate a system-key string from the build-id and the apparmor-features of the running kernel. Its trivial to extend to new inputs as needed. The system-key looks currently roughly like this:

build-id: 7a94e9736c091b3984bd63f5aebfc883c4d859e0
apparmor-features:
- capability
- caps
- dbus

My plan (for now) is to just use this string directly (instead of hashing it) as it will making reasoning over /var/lib/snapd/system-key easier. Happy to hash it tough if that is preferable.

With that PR#3456 in place we can make the security profile re-generation conditional on the system-key: https://github.com/snapcore/snapd/pull/3460


The snapd roadmap
#25

I like this. apparmor-features is better than any implementation of apparmor version.


#26

Whoops, I miscommented. This is kernel capabilities, not userspace. It is fine if we skip apparmor version for now since we can add it to the system key when the time is right (there is no reason to add it now since there are no syntax changes in any series 16 revision).