Make snapd conditionally add 'bind' to the seccomp profile if an LSM (eg, apparmor) is not in use and the kernel doesn't support seccomp EPERM+audit (upcoming)
Capturing the IRC conversation and ideas for posterity since various parts of this conversation come up from time to time.
- snap-confine necessarily sets the seccomp to use KILL because only KILL logs policy violations
- the bind syscall is (intentionally) not in the default seccomp template
- Go unconditionally uses the bind syscall as part of its setup for using the snapd socket. It could be patched to not do this, but getting those patches out to the distributions would not be timely. I'm told newer Go also fixes the issue, but getting newer Go out to the distributions suffers from the same issue.
- On systems with AppArmor enabled, AppArmor denies an access, which makes Go take a different code path: the bind syscall is not attempted and the application is not KILLed
- On systems without AppArmor enabled, Go attempts the bind and the program is immediately KILLed unless the snap plugs an interface that allows the bind syscall (eg, 'network-bind')
When considering how to fix this, it is important to understand seccomp has a number of limitations:
- seccomp is unable to dereference userspace pointers, so we can't argument filter on the sockaddr struct in the bind syscall (eg, in an attempt to allow bind to the loopback). This is a kernel limitation
- unlike AppArmor, seccomp does not support profile transitions across exec() or otherwise. This means there is no way to launch snapctl under a different seccomp profile from the application launching it. This is an upstream design choice
- seccomp profiles can be changed, but they can only be made more restrictive, so you can't start an application without access to bind and then add it later for snapctl. This is an upstream design choice since this would allow sandbox escape
- seccomp policy violations are not logged when using SECCOMP_RET_ERRNO, which makes its use impractical at this time: it would break the snap developer's bootstrap cycle, make debugging difficult, etc. This is actively being worked on, and upstream will soon allow logging with SECCOMP_RET_ERRNO
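To make the first limitation concrete, here is a minimal sketch of what a seccomp filter gets to look at. The seccompData type mirrors the layout of the kernel's struct seccomp_data. Everything else is an assumption made for illustration: filterView is a hypothetical helper, the 4-byte buffer is only a stand-in for a real sockaddr, and 49 is the bind syscall number on x86_64 specifically.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bind's syscall number on x86_64; other architectures use other numbers.
const sysBindX8664 = 49

// seccompData mirrors the kernel's struct seccomp_data, which is the only
// input a seccomp BPF filter receives for each syscall.
type seccompData struct {
	Nr                 int32
	Arch               uint32
	InstructionPointer uint64
	Args               [6]uint64
}

// sockaddrBuf is a hypothetical stand-in for the sockaddr a program passes to
// bind. It is package-level so that its address stays fixed.
var sockaddrBuf [4]byte

// filterView (hypothetical helper) builds what a filter would see for
// bind(fd, addr, addrlen): the pointer value, never the bytes behind it.
func filterView(fd int, addr *[4]byte) seccompData {
	return seccompData{
		Nr: sysBindX8664,
		Args: [6]uint64{
			uint64(fd),
			uint64(uintptr(unsafe.Pointer(addr))),
			uint64(len(addr)),
		},
	}
}

func main() {
	sockaddrBuf = [4]byte{127, 0, 0, 1} // loopback
	loopback := filterView(3, &sockaddrBuf)

	sockaddrBuf = [4]byte{0, 0, 0, 0} // wildcard address
	wildcard := filterView(3, &sockaddrBuf)

	// Both calls reuse the same buffer, so the filter inputs are the same
	// even though the addresses being bound differ.
	fmt.Println("filter inputs identical:", loopback == wildcard)
}
```

Since the filter only sees the pointer value in Args[1], a rule like "allow bind only to loopback" has nothing to match on.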
Altogether this means that there are the following options:
a. add bind to the default template
b. add bind conditionally when apparmor is not used and the seccomp policy is KILL
c. adjust snapctl to not make these calls
d. don't allow using seccomp when AppArmor(/LSM) is unavailable
'a' should not be used as it weakens the policy for systems with AppArmor enabled. Similarly, 'd' should not be used because it weakens the sandbox on systems where AppArmor is disabled (though the seccomp sandbox alone should not be considered strong confinement). 'c' might be possible if calling out to C (or possibly jumping through other hoops), but this introduces complexity to snapctl. 'b' is simple to understand and should be easy to implement (indeed, something similar is already happening for adding policy when using snap try with ecryptfs).
All of the above was discussed on IRC and it was decided that we should use 'b'.
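As a rough sketch of what 'b' amounts to (not snapd's actual code), the decision could look like the following. The sandboxState type, its fields, and both functions are hypothetical names introduced for illustration, and how snapd would really detect AppArmor or kernel support for seccomp EPERM+audit is deliberately left out.

```go
package main

import "fmt"

// sandboxState holds the two facts the decision depends on. Both fields are
// assumptions; real code would have to probe the running system.
type sandboxState struct {
	lsmInUse          bool // an LSM such as AppArmor is mediating the snap
	seccompErrnoAudit bool // kernel can log violations without using KILL
}

// needsBindWorkaround reports whether bind should be added to the profile:
// only when no LSM is in use and violations still have to be KILLed.
func needsBindWorkaround(s sandboxState) bool {
	return !s.lsmInUse && !s.seccompErrnoAudit
}

// buildProfile returns a copy of the template's syscall list, adding bind
// when the workaround applies and the list does not already contain it
// (eg, because the snap plugs 'network-bind').
func buildProfile(template []string, s sandboxState) []string {
	profile := append([]string(nil), template...)
	if !needsBindWorkaround(s) {
		return profile
	}
	for _, name := range profile {
		if name == "bind" {
			return profile
		}
	}
	return append(profile, "bind")
}

func main() {
	template := []string{"read", "write", "connect"} // illustrative, not the real template

	fmt.Println(buildProfile(template, sandboxState{lsmInUse: true}))
	fmt.Println(buildProfile(template, sandboxState{}))
}
```

With AppArmor in use the template is returned unchanged, so the policy on those systems is not weakened, which is the point of choosing 'b' over 'a'.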
This problem goes away in the long term because newer Go (I'm told) will not call bind with how snapctl is coded. Seccomp EPERM+audit will be available in newer kernels (or be patched into existing kernels) and snap-confine will start using it if the kernel supports it, so with older Go, the failure turns into an EPERM instead of a KILL. AppArmor will at some point gain fine-grained network mediation, so its policy should be able to express rules such as 'bind only to loopback', and other LSMs should be able to do the same.