Segmentation fault running snapcraft on Arch Linux

I’m running snapcraft 3.6 built from the AUR package.

Any time I run anything that involves network access (e.g. login, register, push), snapcraft segfaults with no other message.

Any ideas?

Looks like python3 is segfaulting. This is what I found in the journal:

lip 03 13:10:06 galeon audit[21019]: ANOM_ABEND auid=1000 uid=1000 gid=1000 ses=3 subj==snap.snapcraft.snapcraft (complain) pid=21019 comm="python3" exe="/var/lib/snapd/snap/snapcraft/3059/usr/bin/python3.5" sig=11 res=1
lip 03 13:10:06 galeon kernel: show_signal_msg: 54 callbacks suppressed
lip 03 13:10:06 galeon kernel: python3[21019]: segfault at 2260 ip 0000000000002260 sp 00007ffe5bb889c8 error 14 in python3.5[3ff000+1000]
lip 03 13:10:06 galeon kernel: Code: Bad RIP value.
lip 03 13:10:06 galeon systemd[1]: Started Process Core Dump (PID 21046/UID 0).
lip 03 13:10:06 galeon audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=systemd-coredump@2-21046-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
lip 03 13:10:07 galeon systemd-coredump[21047]: Process 21019 (python3) of user 1000 dumped core.
                                                
                                                Stack trace of thread 21019:
                                                #0  0x0000000000002260 n/a (n/a)
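
For anyone reading the kernel line above: the `error` field is the x86 page-fault error code, which the kernel prints in hexadecimal. Here is a rough decoder sketch, assuming the standard bit layout used by the kernel's X86_PF_* flags. The table and the function name `decode_pf_error` are made up for illustration and do not come from any existing tool.

```python
# Bits of the x86 page-fault error code (Linux X86_PF_* flags).
# Each entry: (mask, meaning when set, meaning when clear or None).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "not a write"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]


def decode_pf_error(code_hex):
    """Decode the hex 'error' field of a kernel segfault line."""
    code = int(code_hex, 16)
    meanings = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            meanings.append(if_set)
        elif if_clear is not None:
            meanings.append(if_clear)
    return meanings


print(decode_pf_error("14"))
# ['page not present', 'not a write', 'user mode', 'instruction fetch']
```

Read this way, and given that `ip` equals the faulting address, the log suggests the process tried to execute code at the unmapped address 0x2260, which would fit the "Bad RIP value" message.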

Top of the backtrace when it segfaults during snapcraft login:

(gdb) bt     
#0  0x0000000000002260 in ?? ()                                                                                                                                               
#1  0x00007f86b7e156ca in ?? () from /snap/core/current/lib64/ld-linux-x86-64.so.2
#2  0x00007f86b7e157db in ?? () from /snap/core/current/lib64/ld-linux-x86-64.so.2
#3  0x00007f86b7e1a8f2 in ?? () from /snap/core/current/lib64/ld-linux-x86-64.so.2
#4  0x00007f86b7e15574 in ?? () from /snap/core/current/lib64/ld-linux-x86-64.so.2
#5  0x00007f86b7e19db9 in ?? () from /snap/core/current/lib64/ld-linux-x86-64.so.2
#6  0x00007f86b79615ad in ?? () from /snap/core/current/lib/x86_64-linux-gnu/libc.so.6
#7  0x00007f86b7e15574 in ?? () from /snap/core/current/lib64/ld-linux-x86-64.so.2
#8  0x00007f86b7961664 in __libc_dlopen_mode () from /snap/core/current/lib/x86_64-linux-gnu/libc.so.6
#9  0x00007f86b7946fae in ?? () from /snap/core/current/lib/x86_64-linux-gnu/libc.so.6
#10 0x00007f86b7947748 in __nss_lookup_function () from /snap/core/current/lib/x86_64-linux-gnu/libc.so.6
#11 0x00007f86b7909bb8 in ?? () from /snap/core/current/lib/x86_64-linux-gnu/libc.so.6
#12 0x00007f86b790cd5e in getaddrinfo () from /snap/core/current/lib/x86_64-linux-gnu/libc.so.6
#13 0x00000000005f29a4 in ?? ()                                                        
#14 0x00000000004ea137 in PyCFunction_Call ()
#15 0x0000000000536d94 in PyEval_EvalFrameEx ()   
#16 0x000000000053fc97 in ?? ()      
#17 0x000000000053b83f in PyEval_EvalFrameEx ()
#18 0x0000000000540b0b in PyEval_EvalCodeEx ()   

The backtrace goes through getaddrinfo(), so this looks network related. I tried running the following inside snap run --shell snapcraft:

$ $SNAP/usr/bin/python3 -c 'print("foo")'
foo
$ $SNAP/usr/bin/python3 -c 'import socket; socket.getaddrinfo("google.com", 80)'
Segmentation fault (core dumped)
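
To repeat this kind of check without losing the shell's attention to a crash, the probe can be run in a child interpreter and the exit status inspected. This is a minimal sketch, not part of snapcraft; `run_probe` is a hypothetical helper name. It sticks to calls available in Python 3.5, since that is the interpreter inside the snap.

```python
import signal
import subprocess
import sys


def run_probe(code, python=sys.executable):
    """Run `code` in a child interpreter and describe how it exited."""
    proc = subprocess.run(
        [python, "-c", code],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    if proc.returncode < 0:
        # On POSIX a negative return code means the child was
        # terminated by the signal with that number.
        return "killed by " + signal.Signals(-proc.returncode).name
    return "exit code %d" % proc.returncode


print(run_probe('print("foo")'))  # exit code 0
```

Calling `run_probe('import socket; socket.getaddrinfo("google.com", 80)')` from inside the snap shell should then report `killed by SIGSEGV` on an affected system, assuming the crash reproduces as shown above.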

Thanks to @zyga-snapd’s suggestion, I tried classic-snap-analyzer, which flagged /usr/lib/locale/locale-archive. The file comes from the host.

It’s also clearly mmapped in gdb output:

(gdb) info proc mappings
...
       0x7fbcf06a3000     0x7fbcf07e3000   0x140000        0x0 
       0x7fbcf07e3000     0x7fbcf0dda000   0x5f7000        0x0 /usr/lib/locale/locale-archive

The host is using glibc 2.29.

If this really is related to locale I’d love to know that. This will impact our design of locale support.

I looked through the glibc changes from 2.27 (shipped with core) to 2.29 (the version on the host) and did not notice anything potentially breaking around locale-archive handling.

However, on a hunch, I started tweaking /etc/nsswitch.conf, since it influences hostname resolution, triggers dynamic loading of NSS backends, and potentially talks to the host's systemd-resolved.

The segfault went away when I edited /etc/nsswitch.conf like this:

...
# hosts: files mymachines myhostname resolve [!UNAVAIL=return] dns
hosts: files myhostname resolve [!UNAVAIL=return] dns
networks: files
...

Looks like there’s a potential compatibility issue with nss-mymachines. nss-resolve and nss-myhostname (also part of systemd) could potentially be a problem too, but so far only mymachines seems to be breaking things.
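
To see at a glance which NSS backends a host will try for hostname lookups, the `hosts:` line can be parsed. This is a small sketch under the assumption of the usual nsswitch.conf syntax (service names separated by whitespace, optional bracketed action items). `hosts_services` and `SAMPLE` are illustrative names, and the sample text is the edited fragment from above.

```python
import re


def hosts_services(nsswitch_text):
    """Return the NSS service names on the `hosts:` line, in lookup order."""
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if line.startswith("hosts:"):
            spec = line[len("hosts:"):]
            # Drop action items such as [!UNAVAIL=return].
            spec = re.sub(r"\[[^\]]*\]", " ", spec)
            return spec.split()
    return []


SAMPLE = """\
# hosts: files mymachines myhostname resolve [!UNAVAIL=return] dns
hosts: files myhostname resolve [!UNAVAIL=return] dns
networks: files
"""

print(hosts_services(SAMPLE))
# ['files', 'myhostname', 'resolve', 'dns']
```

glibc loads each listed service as a shared object named libnss_NAME.so.2, which matches the `__nss_lookup_function` and `__libc_dlopen_mode` frames in the backtrace. On a real system you would pass in the contents of /etc/nsswitch.conf and check whether `mymachines` is in the result.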

Thanks! I can confirm that modifying my /etc/nsswitch.conf as above gets snapcraft working correctly :+1: