Debian core configure hang on first snap install

On a fresh debian-9 system the following works:

$ sudo snap install core
$ sudo snap install hello

however:

$ sudo snap install hello

which implicitly installs “core”, hangs forever in the “configure” state of core.

The reason for this behaviour is the following (best hypothesis right now):

  • snapd in 2.21 sets up the install of hello and core
  • the 2.21 version will set “devmode: true” on devmode distros like debian in snap-setup on the explicit snap
  • when “core” is the explicit snap, it gets @complain in its seccomp profile, hence no confinement
  • when “hello” is the explicit snap, hello gets @complain in its seccomp profile but core gets the normal seccomp confinement
  • the “normal” seccomp confinement is lacking the “bind()” syscall, so snapctl gets killed when it calls bind()
  • in 2.26 (which is in stable) we had no “bindSyscallWorkaround”, so once 2.27 gets released the hanging configure hook should be fixed

Ouch. :worried:

Thanks for looking into this, Michael.

Some more data:

When I snap install hello I get a hanging configure hook. I can then:

  • duplicate the env of the defunct snapctl (in my case pid 3824) via: for i in $(xargs --null echo < /proc/3824/environ ); do export $i; done
  • with that env I can run: /snap/core/current/usr/lib/snapd/snap-confine snap.core.hook.configure /usr/lib/snapd/snap-exec --hook=configure core which is exactly what the configure hook is doing. forkstat helps with getting the exact command line.
  • with the above env I can reproduce the exact behaviour and see a defunct snapctl process (though ps -eLf still shows two live threads for snapctl)
  • running the above under strace shows this for the process that goes defunct (in my case PID 3872): [pid 3872] bind(3, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, 28) = ? with no trace of SIGSYS on 3872
  • the /var/lib/snapd/seccomp/bpf/snap.core.hook.configure.src config has:
# This is an older interface and single entry point that can be used instead
# of socket(), bind(), connect(), etc individually.
socketcall
  • in the forkstat output I can see: 22:21:48 exit 3872 31 0.027 snapctl get service.ssh.disable so SIGSYS did get delivered to snapctl (the 31 in there is SIGSYS)
  • same behaviour on debian with kernel 4.11
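
The for loop in the first bullet splits on whitespace, so an environment value that contains spaces would be exported incorrectly. A slightly more defensive sketch is below. import_env is a hypothetical helper name, not something shipped with snapd, and it assumes that no value contains a newline.

```shell
# Hypothetical helper: export every KEY=VALUE entry of a NUL-separated
# environ file (e.g. /proc/<pid>/environ) into the current shell.
# Assumption: no value contains a newline.
import_env() {
    _entries=$(tr '\0' '\n' < "$1") || return 1
    while IFS= read -r _kv; do
        _name=${_kv%%=*}
        case $_name in
            ''|[0-9]*|*[!A-Za-z0-9_]*) continue ;;  # skip invalid names
        esac
        case $_kv in
            *=*) export "$_kv" ;;                   # keeps spaces in the value
        esac
    done <<EOF
$_entries
EOF
}

# Usage with the PID from the post above:
#   import_env /proc/3824/environ
```

Like the original loop, this overwrites variables such as PATH in the current shell, so it is best run in a throwaway shell.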

The forkstat results of a bad run: http://paste.ubuntu.com/25265183/ and of a good run: http://paste.ubuntu.com/25265184/
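
To double-check that the 31 in the forkstat exit line corresponds to SIGSYS, the shell can translate the number into a signal name. Signal numbers depend on the architecture; this sketch assumes x86 or ARM Linux, and signal_name is a hypothetical wrapper.

```shell
# Hypothetical wrapper: print the name of a signal number using the
# shell's built-in kill. Assumes Linux on x86 or ARM, where 31 is SIGSYS.
signal_name() {
    kill -l "$1"
}

signal_name 31
```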

When comparing the two generated seccomp profiles we see:

  • good: contains @complain at the header(!)
  • bad: no @complain mode
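
A quick way to tell the two cases apart without reading the whole profile is sketched below. profile_mode is a hypothetical helper, and “enforce” is only a label chosen here for a profile without @complain.

```shell
# Hypothetical helper: report whether a seccomp profile source file has a
# line starting with @complain. "enforce" is just a label used here.
profile_mode() {
    [ -r "$1" ] || return 2      # unreadable or missing file
    if grep -q '^@complain' "$1"; then
        echo complain
    else
        echo enforce
    fi
}

# Usage with the path mentioned earlier in the thread:
#   profile_mode /var/lib/snapd/seccomp/bpf/snap.core.hook.configure.src
```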

And indeed, the snap install core run has a “snap-setup” with “devmode: true” in state.json. In the snap install hello run this is missing for core, although devmode: true is set for the “hello” snap. So it looks like snapd 2.21 sets “devmode” in snap-setup (this snap-setup is generated by the old 2.21 snapd, before the restart into 2.26.14 is done), because in those older days that was the default on ForceDevMode distros (and indeed, when I set SNAP_REEXEC, everything I install with 2.21 is devmode).

So we need to figure out what syscall is killing snapctl. Unfortunately, on debian neither 4.9 nor 4.11 shows anything in dmesg. We also need to get to the bottom of why snapd 2.21 sets devmode: true in snap-setup, and whether we can detect or fix that (once we have found and fixed the syscall that is missing on debian).

Looking at the seccomp profile, it appears that the “bindSyscallWorkaround” snippet is missing in the generated hook seccomp profile. This then causes the seccomp kill on bind().
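
To check whether a given profile lists a syscall such as bind, a plain grep for the name is misleading, because the profile comments mention bind() too. The sketch below only matches rule lines. profile_allows is a hypothetical helper, and it assumes one rule per line with the syscall name first, optionally followed by arguments.

```shell
# Hypothetical helper: succeed if the profile source lists the syscall at
# the start of a line. Comment lines start with '#' and never match.
# Assumption: one rule per line, syscall name first, optional arguments after.
profile_allows() {
    grep -Eq "^$2([[:space:]]|\$)" "$1"
}

# Usage with the path mentioned earlier in the thread:
#   profile_allows /var/lib/snapd/seccomp/bpf/snap.core.hook.configure.src bind \
#       && echo "bind listed" || echo "bind missing"
```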

Copying from the first message, just to attribute a solution: in 2.26 (which is in stable) we had no “bindSyscallWorkaround” - so once 2.27 gets released the hanging configure hook should be fixed.

Just a reminder that the fix for this hasn’t been rolled out yet: https://bugs.launchpad.net/snappy/+bug/1674193
