Apparmor profile caching

That sounds like a pretty unfortunate choice.

Nice, at least we’re using a more reasonable place for them.

Now, we need to find out why on certain occasions snap-confine is getting the wrong profile loaded. That one profile does end up in /etc/apparmor.d because it’s provided by the package.

Perhaps, but it is a very old decision that makes things convenient for distros and administrators who just want to plop a file in /etc/apparmor.d and then have the system DTRT (i.e., load it on boot, enable caching, etc). Applications that have their own ideas about profile loading are under no obligation to use /etc/apparmor.d.

This was initially the correct choice for ubuntu-core-launcher and early versions of snap-confine, since it was installed as part of the system, there was only one profile on disk, and the apparmor init boot process would figure out when to invalidate or use the cache easily enough. With all the re-exec work though, it may make sense to move snap-confine’s profile somewhere else and have snapd (or something else) load it. This would ensure that the correct profile is loaded and would avoid any distro-specific issues regarding when the apparmor initscript/systemd unit runs relative to snapd (distros should have apparmor load extremely early, of course), but it comes with its own set of challenges around making sure the profile is loaded before anything runs snap-confine.

Sorry for not being clear. It’s the caching in that place that sounds awkward. /etc is not a place to write down content that is not intended to be manipulated by people. Sounds fine to have /etc/apparmor.d itself as a place for security profiles.

Either way, you are right, that’s not a problem we need to solve in this context.

snap-confine is the first thing that executes in many cases. We can’t have snapd itself loading the profile for it on boot, and it seems to make sense that this is done by something from the system itself as it’s a general problem if one intends to use apparmor at all.

Oh yes, apparmor upstream has regretted that location for some time, but it is at least a decade-old decision at this point and there is a lot of inertia. :\

Yeah, that is why it wasn’t moved yet -- it isn’t a bad place for it to be now, and if moved we would need another systemd unit/etc to make sure it was loaded, which raises the question of why we are duplicating that work. I know there was some talk of moving the profile out of /etc/apparmor.d for other reasons, so I thought I’d mention it.

Hey.

I wanted to open a new thread about this but then I noticed we already have one here:

@ogra and I just debugged this issue and we believe we have an explanation:

There are two separate issues here, well maybe three if you include the behavior of apparmor caching layer:

  1. The kernel doesn’t write timestamps (last write time, last mount time) to the ext4 superblock if the filesystem is remounted read-only before being finally unmounted

This is what causes our ext4 superblock timestamps to effectively stay frozen in time. When the system shuts down, / is remounted read-only and then unmounted. This means those timestamps (just the ones in the superblock, not regular file timestamps) are never written.

  2. On a device like the Raspberry Pi, which doesn’t have a battery-backed RTC, we use the last mount time as the initial time. Apart from the fact that this is a constant that doesn’t change in our images, this has another issue. It is actually older than the last write time (once that timestamp functions correctly) and it may put the system “into the past” with regards to the timestamps on the apparmor profile source file and the apparmor profile cache file. Depending on how the cache works this may cause us to load the cache even when it is stale. The fixrtc script should use the last write time, falling back to the last mount time if the former is unset.

  3. The cache system should be content based, not date based, but we already know that.
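The date-based check can be demonstrated in isolation. Below is a minimal sketch (with made-up stand-in paths, not the real profile or cache locations) showing how a clock that runs behind real time makes a stale cache look fresh to a “cache.mtime >= source.mtime” test:

```shell
# Hypothetical demo of the date-based cache check going wrong when the
# clock runs behind real time. Paths are stand-ins for illustration.
workdir=$(mktemp -d)
profile="$workdir/profile"   # stand-in for the profile source file
cache="$workdir/cache"       # stand-in for the compiled cache file

# The cache was written "in 2018", before the reboot.
touch -d '2018-07-09 12:00' "$cache"
# After reboot the clock is back "in 2016", so a freshly regenerated
# profile gets an mtime *older* than the cache it should invalidate.
touch -d '2016-01-01 00:00' "$profile"

# The date-based check "cache is fresh if cache.mtime >= source.mtime"
# now wrongly accepts the stale cache.
if [ "$(stat -c %Y "$cache")" -ge "$(stat -c %Y "$profile")" ]; then
    verdict=stale-cache-used
else
    verdict=cache-rebuilt
fi
rm -rf "$workdir"
```

A content-based check (e.g. hashing the sources) would not be fooled by this, which is why point 3 above is the real fix.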

For reference. Ext4 superblock write logic: https://github.com/torvalds/linux/blob/ca04b3cca11acbaf904f707f2d9ca9654d7cc226/fs/ext4/super.c#L4813

This was tested on a Fedora 29 VM with a loopback device and ext4: remounting the filesystem read-only before unmounting it results in wrong data in the superblock (timestamps are not updated to reflect reality, e.g. files written to disk are more recent than “last write time”). It was also tested on a Raspberry Pi running Ubuntu Core and a 4.4 kernel, where loading an apparmor profile without considering the cache resulted in correct behavior, but rebooting the system loaded a stale cache entry. It was also verified there by observing the output of “dumpe2fs” on the root filesystem, which shows that the system perpetually believes it is the moment the image was first created (when the filesystem was formatted).

https://paste.ubuntu.com/p/pJqTTZzYkg/ (the log was taken on the 24th of July; the dates refer to the 9th of July due to the ext4 superblock behavior)

So some more thoughts on this bug. On a Raspberry Pi we have no clock, so on first boot we just set the time to the moment when the ext4 on /writable was first created. Then we mount / and carry on. This sets the last-mount-time of / to a few seconds after the image was created. Then we reboot and we set the time to the last-mount-time of /, which is still very close to the time when the system was created… see the problem? This will continue forever, “slowly” ticking seconds between each boot. This explains the pastebin above, where the time is roughly 12 minutes after the moment when the filesystem was created (after about two weeks of real time and 18 reboot cycles).
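The arithmetic above can be sketched as a tiny simulation. The per-boot cost here is an illustrative assumption (about 40 seconds between setting the clock and recording a new last-mount-time), chosen so that 18 reboots add up to the roughly 12 minutes observed in the pastebin:

```shell
# Rough simulation of the "slowly ticking" boot clock: each boot starts
# from the previous boot's last-mount-time, so only the few seconds
# spent during each boot accumulate, no matter how much real time
# passes. boot_cost is an assumed, illustrative number.
clock=0            # seconds since "image build"; filesystem creation time
boot_cost=40       # assumed seconds elapsed per boot before / is mounted
reboots=18
i=0
while [ "$i" -lt "$reboots" ]; do
    # fixrtc sets the clock to the recorded last-mount-time, then
    # mounting / records a new last-mount-time a bit later
    clock=$((clock + boot_cost))
    i=$((i + 1))
done
minutes=$((clock / 60))
# After 18 reboots the system believes only ~12 minutes have passed.
```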

I’m looking into apparmor caching logic now. I will update this answer when I have more data.

Just a comment here: https://github.com/zyga/snapd/commit/b186de2e3c9285eec1b121bc1f47621a67b8d6fc

This affects refreshes from older versions to 2.34.2, so we may want to hold the release.

The fixrtc logic is clearly wrong. It was stated elsewhere that on these devices the clock can only be reasonably expected to be accurate after network time is fetched. Why not simply adjust things so that, after the time is fetched, that date is stored off for fixrtc to use on the next boot?

The original purpose (back in 2011/12) was to prevent mount failures due to “last mount time is in the future” by forcing the clock to the filesystem creation (or if available, “last mount”) date.

See above… to read any “stored off data” you would have to have a filesystem mounted; the fixrtc script was originally designed to fix a mount problem, so it runs prior to mounting any filesystem.

@abeato has worked on a post-mount enhancement for it at https://paste.ubuntu.com/p/mNw7qKpbyx/ that could be merged (though it is very snap centric and would have to go into initramfs-tools-ubuntu-core instead of initramfs-tools itself or have to be made more generic).

Alternatively there is a “fake-hwclock” package in raspbian that could be pulled into core and used (I think it stores a timestamp on disk and simply gets started from an early systemd unit)
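The fake-hwclock idea is simple enough to sketch: persist a timestamp at shutdown (or after NTP sync) and restore it early on the next boot. This is only an illustration of the mechanism, not the actual package; the state file path and function names here are made up:

```shell
# Minimal sketch of the fake-hwclock approach. On a real system the
# state file would live somewhere persistent (the actual package uses
# its own path) and restore_clock would run `date -s` early in boot.
statefile=$(mktemp)

save_clock() {
    # record the current time, e.g. at shutdown or after NTP sync
    date -u +%s > "$statefile"
}

restore_clock() {
    # on a real system: date -s @"$(cat "$statefile")"
    # here we only read the stored value back
    cat "$statefile"
}

save_clock
restored=$(restore_clock)
now=$(date -u +%s)
rm -f "$statefile"
```

The restored time is a lower bound, not the real time, but it is monotonically increasing across boots, which is exactly what the mtime-based cache check needs.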

I was perhaps a bit short in my response. Given the systemd remount-ro behavior, it is wrong to rely on the superblock timestamps to set anything remotely close to the actual time.

I haven’t found the part that does this yet, but it explains why last-write-time never changes.

Right, this is why I propose using Alfonso’s fixrtc enhancement: it simply does a “stat -c %Y /var/lib/snapd/snaps”, which gets you a pretty accurate date, namely the modification time of that folder relative to the epoch (this should definitely fix the apparmor cache issue, given that the profile file comes with a snap that was installed in this folder).
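The idea above can be sketched as follows. Note this is a stand-in demo using a temporary directory in place of /var/lib/snapd/snaps, and the "advance the clock" step is only described in a comment since it needs root:

```shell
# Sketch of the proposed post-mount fixrtc enhancement: derive a lower
# bound for the current time from the mtime of a directory that is
# touched on every snap install. Temp dir stands in for
# /var/lib/snapd/snaps; the snap filename is hypothetical.
snapsdir=$(mktemp -d)
touch "$snapsdir/core_1234.snap"   # installing a snap updates the dir mtime

floor=$(stat -c %Y "$snapsdir")    # seconds since the epoch
now=$(date -u +%s)

# On a real device: if the clock is behind this floor, move it forward,
# e.g. with `date -s @"$floor"`. Here we only compute the decision.
if [ "$now" -lt "$floor" ]; then
    action=advance-clock
else
    action=leave-clock
fi
rm -rf "$snapsdir"
```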

I now believe I fully understand the situation where a de-facto stale cache is used.

  • It is 2018; a core snap update triggers a system reboot
  • It is 2016 now, because we don’t have a clock across reboots and fixrtc is broken
  • System startup code runs apparmor.service, which loads cached profiles because cache.mtime >= source.mtime - those profiles refer to snapctl at /usr/bin/snapctl
  • Service startup triggers system key regeneration - all profiles are re-written, referring to snapctl at /usr/lib/snapd/snapctl
  • apparmor_parser is used to load the changed profiles but, because the cache files are from 2018, they are used without looking at the updated source files (which date to 2016, since we don’t have good time yet)
  • NTP syncs the time; it is 2018 again
  • Manually loading the profile (normally the cache is not enabled in that path, so that part is avoided) “fixes” the problem ogra saw

I will prepare a minimal patch that only uses cache files if mtime matches exactly. This is correct since mtime of cache files is synchronized to source file mtime when a cache entry is written.

Thinking about how to solve this issue, knowing that there are multiple source files due to an include mechanism: I think we can keep the existing check (mtime of all the source files must be < mtime of the cache) and then add an additional check where we also verify that the mtime of the first source file is identical to the mtime of the cache. Since the mtime of the cache is set to the mtime of the source, this feels correct.
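The proposed two-part check can be sketched like this. All paths are stand-ins, and the setup assumes (as described above) that the cache file's mtime was synchronized to the profile's mtime when the cache entry was written:

```shell
# Sketch of the two-part cache validity check: (1) no source file is
# newer than the cache, and (2) the first source file's mtime matches
# the cache mtime exactly. Paths are temp stand-ins for illustration.
workdir=$(mktemp -d)
profile="$workdir/profile"
include="$workdir/include"
cache="$workdir/cache"

touch -d '2018-07-01 10:00' "$profile"
touch -d '2018-06-01 10:00' "$include"
touch -d '2018-07-01 10:00' "$cache"   # synced to the profile's mtime

cache_ok=yes
# 1) every source must be no newer than the cache
for src in "$profile" "$include"; do
    [ "$(stat -c %Y "$src")" -le "$(stat -c %Y "$cache")" ] || cache_ok=no
done
# 2) the first source's mtime must equal the cache mtime exactly,
#    which catches the "clock jumped backwards" case
[ "$(stat -c %Y "$profile")" -eq "$(stat -c %Y "$cache")" ] || cache_ok=no
rm -rf "$workdir"
```

With a backwards clock, a regenerated profile would get an mtime different from the cache, so check (2) would reject the stale cache even though check (1) accepts it.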

That won’t work in all cases (though it would for many), since the parser will actually look at and use the newest mtime between the profile and any of its include files. E.g., if the base abstraction is newer than the profile, its mtime will be used. This might not be a problem in practice, but you would need to think it through.

This step is not quite right if you are talking about snapd’s invocation of apparmor_parser. Since snapd compares what is on disk with what it expects, if they are different, I would expect that snapd would call apparmor_parser without considering the cache (i.e., this step should use --skip-read-cache), since it knows it needs to update the cache. However, if you are talking about the apparmor service, then yes, it will use the 2018 cache files since the time is off.

But snapd doesn’t use --skip-read-cache; I was describing the current behavior. We can obviously add --skip-read-cache to snapd, but this would nullify all the optimizations, since we call apparmor_parser for all profiles even if they don’t change: https://github.com/snapcore/snapd/blob/master/interfaces/apparmor/backend.go#L346

I seem to have forgotten we are doing that. I thought the whole point of EnsureDir was to avoid calling apparmor_parser. --skip-read-cache could be another thing to consider.