as some might now we still have the reoccuring problem the the bind() syscall snapctl issues as part of the golang net package probing for IPv4 and IPv6 support. This isn’t a problem on platform where we have support for AppArmor as AppArmor. On platforms like Fedora or openSUSE where we don’t have AppArmor but only Seccomp this actually lets all hooks fail as soon as they start using snapctl. This is true for all snaps but not the core one as we now plug network-bind for it.
I’ved chatted a bit with @zyga-snapd but there seems to be no easy way out of this.
modifying golang to not probe IPv4 / IPv6 is not an option as this would mean we need to land a patch into any distribution in one of the core packages.
running snapctl with a different seccomp profile also seems to be not possible as once a seccomp profile is setup there is not way to change for a sub-process to another profile.
allowing bind(), even with seccomp argument filtering, seems to be dangerous. But requires somebody from the security team to comment on.
a network namespace would be one way out but not easy to implement given the existing confinement setup.
Happy to hear any suggestions on what we can do to solve this problem!
One good thing is that in the latest golang sources the relevant part in the net package was refactored and the IPv6/IPv4 detection code is now executed on demand and not when the package is loaded the first time. Bad thing is that this doesn’t help us to support any of our distributions which ship with older golang versions.
Make snapd conditionally add ‘bind’ to the seccomp profile if an LSM (eg, apparmor) is not in use and the kernel doesn’t support seccomp EPERM+audit (upcoming)
Discussion
Capturing the IRC conversation and ideas for posterity since various parts of this conversation come up from time to time.
snap-confine necessarily sets the seccomp to use KILL because only KILL logs policy violations
the bind syscall is (intentionally) not in the default seccomp template
Go unconditionally uses the bind syscall as part of its setup for using the snapd socket. It could be patched to not do this, but getting those patches out to the distributions would not be timely. I’m told newer Go also fixes the issue, but getting newer Go out to the distributions suffers from the same issue.
On systems with AppArmor enabled, AppArmor denies an access that makes Go take a different code path and the bind syscall is not attempted and the application is not KILLed
On systems without AppArmor enabled, Go attempts the bind and the program is immediately KILLed if the application doesn’t ‘plugs’ something with the bind syscall (eg, ‘network-bind’)
When considering how to fix this, it is important to understand seccomp has a number of limitations:
seccomp is unable to dereference userspace pointers, so we can’t argument filter on the sockaddr struct in the bind syscall (eg, in an attempt to allow bind to the loopback). This is a kernel limitation
unlike AppArmor, seccomp does not support profile transitions across exec() or otherwise. This means there is no way to launch snapctl under a different seccomp profile from the application launching it. This is an upstream design choice
seccomp profiles can be changed, but they can only be made more restrictive, so you can’t start an application without access to bind and then add it later for snapctl. This is an upstream design choice since this would allow sandbox escape
seccomp policy violations are not logged when using SECCOMP_RET_ERRNO which makes its use impractical at this time since it would break the snap developer’s bootstrap cycle, make debugging difficult, etc. This is actively being worked on and soon upstream will change to allow logging with SECCOMP_RET_ERRNO
Altogether this means that there are the following options:
a. add bind to the default template
b. add bind conditionally when apparmor is not used and the seccomp policy is KILL
c. adjust snapctl to not make these calls
d. don’t allow using seccomp when AppArmor(/LSM) is unavailable
‘a’ should not be used as it weakens the policy for systems with AppArmor enabled. Similarly, ‘d’ should not be used because it weakens the sandbox on systems where AppArmor is disabled (though the seccomp sandbox alone should not be considered strong confinement). ‘c’ might be possible if calling out to C (or possibly jumping through other hoops), but this introduces complexity to snapctl. ‘b’ is simple to understand and should be easy to implement (indeed, something similar is already happening for adding policy when using snap try with ecryptfs).
All of the above was discussed on IRC and it was decided that we should use ‘b’.
Longterm
This problem eventually just goes away long term because newer Go (I’m told) will not call bind with how snapctl is coded. Seccomp EPERM+audit will be available in newer kernels (or be patched into existing kernels) and snap-confine will start using this if the kernel supports it, so with older Go, it turns into an EPERM instead of a KILL. AppArmor will at some point gain fine-grained network mediation and its policy should be able to handle things such as (bind only to loopback, and similar) and other LSMs should be able to do the same.
Thanks a lot for the insights, Jamie. Your suggestion sounds great.
We really shouldn’t let seccomp kill processes like this if we have a choice of simply returning EPERM, but considering we have a fix upcoming, we can wait I suppose.
The problem is that applications fail and there is nothing in the logs to diagnose why. It isn’t that KILL is desirable, it is that it has been the least worst option for the general case (this topic notwithstanding). KILL is a poor experience which is why @tyhicks has been working with upstream to get this resolved. The plan is that snap-confine will interrogate the kernel to see if EPERM+audit is supported and use it if it is, otherwise fallback to old behavior.