Disable seccomp audit logging for sys_bpf()

Hello,

I’m trying to make a snap of the scx-rustland scheduler (scx/scheds/rust/scx_rustland at main · sched-ext/scx · GitHub).

Everything seems to work (almost), but in terms of performance it’s pretty bad and I see a lot of audit messages in the log triggered by seccomp audit, for example:

 [  543.282562] kauditd_printk_skb: 1056 callbacks suppressed
 [  543.282583] audit: type=1326 audit(1714288298.167:40342): auid=1000 uid=0 gid=0 ses=2 subj=snap.scx-rustland.scx-rustland pid=6864 comm="scx_rustland" exe="/snap/scx-rustland/x1/bin/scx_rustland" sig=0 arch=c000003e syscall=321 compat=0 ip=0x7762ebe3425d code=0x7ffc0000

Note that syscall=321 corresponds to sys_bpf() (that is used a lot by scx_rustland).
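For anyone reading along, the fields in that audit record can be decoded mechanically. Below is a minimal Python sketch, not part of the original post: the function names are made up for illustration, and the syscall table is deliberately limited to the two x86_64 syscalls mentioned in this thread. The numeric constants are taken from the kernel UAPI headers (linux/seccomp.h, linux/audit.h).

```python
import re

# Seccomp return actions (linux/seccomp.h). The action lives in the
# upper 16 bits of the "code" field; the lower 16 bits carry data.
SECCOMP_ACTIONS = {
    0x7FFF0000: "SECCOMP_RET_ALLOW",
    0x7FFC0000: "SECCOMP_RET_LOG",
    0x7FF00000: "SECCOMP_RET_TRACE",
    0x00050000: "SECCOMP_RET_ERRNO",
    0x00030000: "SECCOMP_RET_TRAP",
    0x00000000: "SECCOMP_RET_KILL_THREAD",
    0x80000000: "SECCOMP_RET_KILL_PROCESS",
}

AUDIT_ARCH_X86_64 = 0xC000003E  # linux/audit.h

# Intentionally tiny x86_64 syscall table: only what this thread mentions.
X86_64_SYSCALLS = {144: "sched_setscheduler", 321: "bpf"}

# Sample record copied from the log above (timestamp prefix dropped).
SAMPLE_RECORD = (
    'audit: type=1326 audit(1714288298.167:40342): auid=1000 uid=0 gid=0 '
    'ses=2 subj=snap.scx-rustland.scx-rustland pid=6864 comm="scx_rustland" '
    'exe="/snap/scx-rustland/x1/bin/scx_rustland" sig=0 arch=c000003e '
    'syscall=321 compat=0 ip=0x7762ebe3425d code=0x7ffc0000'
)


def parse_audit_fields(record):
    """Split an audit record into a dict of key=value fields."""
    return dict(re.findall(r'(\w+)=("[^"]*"|\S+)', record))


def decode_seccomp_record(record):
    """Return a human-readable summary of a type=1326 (seccomp) record."""
    fields = parse_audit_fields(record)
    arch = int(fields["arch"], 16)
    nr = int(fields["syscall"])
    code = int(fields["code"], 16)
    if arch == AUDIT_ARCH_X86_64:
        arch_name = "x86_64"
        syscall_name = X86_64_SYSCALLS.get(nr, str(nr))
    else:
        # Unknown architecture: leave the numbers undecoded.
        arch_name = hex(arch)
        syscall_name = str(nr)
    return {
        "comm": fields["comm"].strip('"'),
        "arch": arch_name,
        "syscall": syscall_name,
        "action": SECCOMP_ACTIONS.get(code & 0xFFFF0000, hex(code)),
    }


if __name__ == "__main__":
    print(decode_seccomp_record(SAMPLE_RECORD))
```

Run against the record above, this reports the bpf syscall on x86_64 with action SECCOMP_RET_LOG, i.e. the call is being logged rather than blocked, which fits the observation that the scheduler works but is slowed down by the audit traffic.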

The overhead is so big that I’m even triggering the global sched-ext watchdog and the scheduler is kicked out:

 [  452.746587] sched_ext: BPF scheduler "rustland" errored, disabling
 [  452.746595] sched_ext: runnable task stall (kauditd[66] failed to run for 6.986s)

At the moment, as a workaround, I’m booting the kernel with audit=0, and with this option everything works perfectly fine. However, I was wondering if there is a more fine-grained way to silence these messages (instead of disabling audit logging system-wide), e.g., some special config / keyword in my snapcraft.yaml or something similar.

Any suggestion? Thanks in advance.

Does snappy-debug give any interface hint when you run it alongside your app with audit turned on?

Awesome!

...
* add one of 'network-control, system-trace' to 'plugs'.

snappy-debug was very useful, system-trace did the trick, thanks for the pointer @ogra !

I still need to run snap connect scx-rustland:system-trace manually, but IIUC I can’t enable auto-connect for this one, right?

You can request it, the worst they can say is no.

Well, ignoring a few other pleasantries that come to mind ;).

The interface description suggests this would likely require a somewhat stricter review than the standard process (i.e., proof from upstream that they’re aware of the snap and are happy for you to maintain it), but given the nature of your snap and the fact that the overhead is so high that it becomes unusable without it, I’d say you have a good chance.

Being unfamiliar with BPF and especially this application of it, it might also be worth making a new interface entirely if this is a new class of snap that could fit the snap model well (I might be wrong, but it certainly feels like a new class to me!). Ultimately this would go through a completely separate security review and actually needs implementing which would take weeks/months, but if there’s a demand for it, it might be worth doing.

It’s not urgent for now, at the moment I’m simply experimenting with the idea of providing multiple hot-pluggable Linux schedulers as snaps, because I think it’d be cool. :) But they still require a sched-ext enabled kernel, which we don’t officially support yet (I’m testing this with my “unofficial” sched-ext kernel from ppa:arighi/sched-ext).

About introducing a new class, right now everything seems to work fine just with system-trace for me (in the end, to interact with BPF we simply need to be allowed to use the bpf() syscall). Maybe, in the future, if more BPF technologies are used to replace “kernel components”, we may want to have a more generic bpf-interface, or similar, to better represent this type of program.

Did you test if network-control might perhaps be sufficient? It is way less privileged and easier granted.

Just tested with network-control and it seems to be enough, so I’ll switch to that for now, even if it sounds a bit counter-intuitive for a scheduler to require network control… in the end I just need to allow the bpf() syscall and read files from procfs.

Where can I find a detailed list of capabilities/syscalls that are allowed by these interfaces?

If snappy-debug isn’t enough, you could always check the source:

Perfect, this is exactly what I needed! And now I know that I actually need the process-control interface, because scx-rustland may also call sched_setscheduler(), so network-control isn’t enough…
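The same check (does a given policy cover every syscall the program needs?) can also be done mechanically. The sketch below is only an illustration under stated assumptions: it assumes the profile is plain text with one rule per line, the syscall name as the first token, optional argument filters after it, and `#` starting a comment. The sample profile is a hypothetical excerpt written for this example, not copied from snapd, and the function names are made up. On an installed system, the generated profile text is assumed to live under /var/lib/snapd/seccomp/bpf/ in a file named after the snap and app; verify that path locally before relying on it.

```python
def allowed_syscalls(profile_text):
    """Collect the syscall names listed in a text seccomp profile.

    Assumed format: one rule per line, syscall name first, optional
    argument filters after it, '#' starts a comment.
    """
    names = set()
    for line in profile_text.splitlines():
        rule = line.split("#", 1)[0].strip()  # drop comments and padding
        if rule:
            names.add(rule.split()[0])
    return names


def missing_syscalls(profile_text, needed):
    """Return the needed syscalls that the profile does not list, sorted."""
    return sorted(set(needed) - allowed_syscalls(profile_text))


# Hypothetical excerpt, written for this example only.
SAMPLE_PROFILE = """\
# base rules
read
write
# pretend an interface added these
bpf
socket AF_NETLINK - NETLINK_ROUTE
"""

if __name__ == "__main__":
    needed = ["bpf", "sched_setscheduler"]
    print(missing_syscalls(SAMPLE_PROFILE, needed))
```

With the hypothetical profile above, bpf is covered while sched_setscheduler is reported as missing, which mirrors the situation described in this post.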

Sounds like a new interface then. Feel free to propose it in the forum or open a PR to the snapd repository.

The reason we do not have an interface which allows just the bpf() syscall is that the syscall in itself isn’t that useful. You usually need additional syscalls, path objects, or network access to act upon in order to attach your BPF objects. The seccomp log is then supposed to let you know that the program is trying to do something it isn’t allowed to do under the currently active policy.

Sure, I’ll send a PR to snapd. Potentially, with more technologies relying on BPF struct_ops in the future, we may have more tools/apps that just need access to the bpf() syscall and /sys/kernel/btf/vmlinux. In that case, enabling a whole network-control or process-control interface sounds like overkill, so I think it’d be useful to have a more specific “bpf” interface.