Snapcraft (Python) segfault on Ampere A1 aarch64 + Oracle Linux 8

Hi,

I’m unable to use the snapcraft snap on Oracle Linux 8, on an Oracle Cloud A1.Flex aarch64 instance.

It looks like python3.8 from core20 is segfaulting on exec.

The same issue occurs with the snapcraft snap on the stable, 4.x and 5.x channels.

[root@instance-20211214-0641 ~]# uname -a
Linux instance-20211214-0641 5.4.17-2102.206.1.el8uek.aarch64 #2 SMP Wed Oct 6 17:35:01 PDT 2021 aarch64 aarch64 aarch64 GNU/Linux

[root@instance-20211214-0641 ~]# snap version
snap    2.53.2-2.el8
snapd   2.53.2-2.el8
series  16
ol      8.4
kernel  5.4.17-2102.206.1.el8uek.aarch64

[root@instance-20211214-0641 ~]# snapcraft
Segmentation fault

[root@instance-20211214-0641 ~]# snap run --shell snapcraft

[root@instance-20211214-0641 ~]# /var/lib/snapd/snap/snapcraft/current/usr/bin/python3.8
Segmentation fault

[root@instance-20211214-0641 ~]# sysctl kernel.print-fatal-signals=1
kernel.print-fatal-signals = 1

[root@instance-20211214-0641 ~]# strace -v -ff /var/lib/snapd/snap/snapcraft/current/usr/bin/python3.8
execve("/var/lib/snapd/snap/snapcraft/current/usr/bin/python3.8", ["/var/lib/snapd/snap/snapcraft/cu"...], ["LS_COLORS=rs=0:di=38;5;33:ln=38;"..., "SNAP_USER_DATA=/root/snap/snapcr"..., "LANG=en_US.UTF-8", "HISTCONTROL=ignoredups", "HOSTNAME=instance-20211214-0641", "SNAP_REVISION=6956", "SNAP_ARCH=arm64", "SNAP_INSTANCE_KEY=", "SNAP_REAL_HOME=/root", "S_COLORS=auto", "SNAP_USER_COMMON=/root/snap/snap"..., "USER=root", "PWD=/root", "HOME=/root", "SNAP=/snap/snapcraft/6956", "SNAP_COMMON=/var/snap/snapcraft/"..., "SNAP_NAME=snapcraft", "XDG_DATA_DIRS=/usr/local/share:/"..., "SNAP_INSTANCE_NAME=snapcraft", "SNAP_DATA=/var/snap/snapcraft/69"..., "MAIL=/var/spool/mail/root", "SNAP_COOKIE=A-V9s9_3OmOd5RNE9jOe"..., "TERM=xterm-256color", "SHELL=/bin/bash", "SNAP_REEXEC=", "SHLVL=2", "PYLXD_WARNINGS=none", "LOGNAME=root", "SNAP_CONTEXT=A-V9s9_3OmOd5RNE9jO"..., "PATH=/snap/bin:/usr/local/sbin:/"..., "SNAP_VERSION=6.0", "HISTSIZE=1000", "SNAP_LIBRARY_PATH=/var/lib/snapd"..., "LESSOPEN=||/usr/bin/lesspipe.sh "..., "_=/usr/bin/strace"]) = -1 EINVAL (Invalid argument)
+++ killed by SIGSEGV +++
Segmentation fault (core dumped)

[  623.636808] potentially unexpected fatal signal 11.
[  623.639695] CPU: 0 PID: 40427 Comm: python3.8 Kdump: loaded Not tainted 5.4.17-2102.206.1.el8uek.aarch64 #2
[  623.643470] Hardware name: QEMU KVM Virtual Machine, BIOS 1.4.1 12/03/2020
[  623.646099] pstate: 60001000 (nZCv daif -PAN -UAO)
[  623.647909] pc : 0000fffd6aa73fc8
[  623.649233] lr : 0000aaad9c44633c
[  623.650408] sp : 0000ffffeb68c730
[  623.651605] x29: 0000ffffeb68c730 x28: 00000000ffffffff
[  623.653620] x27: 0000aaadc5dd2f80 x26: 0000aaadc5ee51e0
[  623.655656] x25: 0000aaadc5ddcd40 x24: 0000aaadc5ddc700
[  623.657728] x23: 0000aaadc5ee8ca0 x22: 0000000000000000
[  623.659691] x21: 0000aaad9c53ae34 x20: 0000000000000000
[  623.661687] x19: 0000aaad9c52f000 x18: 0000000000000030
[  623.664244] x17: 0000fffd6aa73fc0 x16: 0000aaad9c52ec70
[  623.666186] x15: 0000000000000040 x14: 0000000000000008
[  623.668207] x13: 0000000000000000 x12: 0000000000000000
[  623.670290] x11: 0000000000000000 x10: 0000000000000000
[  623.672350] x9 : 0000000000000000 x8 : 00000000000000dd
[  623.674344] x7 : 0000000000000000 x6 : 0000000000000000
[  623.676342] x5 : 0000ffffeb68c7f8 x4 : 0000000000000000
[  623.678349] x3 : 0000fffd6ac3f618 x2 : 0000aaadc5ddc700
[  623.680460] x1 : 0000aaadc5ee8ca0 x0 : ffffffffffffffea

I’m a little lost about how to continue debugging this. I think register x8 holds the syscall number (0xdd = 221, which is execve, as expected), but it is unclear to me how to figure out why execve is crashing.
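For anyone who wants to double-check the register reading, here is a minimal Python sketch that decodes the two interesting values from the dump. It assumes the arm64 (asm-generic) syscall table, where 221 is execve, and it assumes x0 holds the syscall return value at the time of the signal. The helper name to_signed64 is made up for illustration.

```python
import errno


def to_signed64(value: int) -> int:
    """Interpret a 64-bit register value as a two's-complement signed integer."""
    return value - (1 << 64) if value & (1 << 63) else value


# Values copied from the kernel register dump above.
x8 = 0x00000000000000DD  # syscall number register on aarch64
x0 = 0xFFFFFFFFFFFFFFEA  # first argument / return value register

syscall_nr = x8
ret = to_signed64(x0)

print("syscall number:", syscall_nr)
# A negative return value from a syscall is -errno.
print("return value:", ret, errno.errorcode.get(-ret, "unknown"))
```

If that reading is right, x0 is -22, i.e. EINVAL, which agrees with what strace reports for the execve call.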

The A1.Flex instance type can be spun up for free on Oracle Cloud, if that helps.

FWIW this affects the Certbot snap also, as it also uses python3.8 from core20.

Thank you!

For anyone following along at home, I’ve replicated this with Oracle Linux 8 on an Ampere cloud instance on Oracle Cloud:

$ snap run --strace snapcraft init
[pid 41377] execve("/var/lib/snapd/snap/snapcraft/6956/bin/python", ["/var/lib/snapd/snap/snapcraft/69"..., "/snap/snapcraft/6956/bin/snapcra"..., "init"], 0x400007b860 /* 48 vars */ <unfinished ...>
[pid 41379] <... futex resumed>)        = ?
[pid 41379] +++ exited with 0 +++
[pid 41378] +++ exited with 0 +++
[pid 41380] +++ exited with 0 +++
/usr/bin/strace: Process 41381 attached
[pid 41381] +++ exited with 0 +++
<... execve resumed>)                   = -1 EINVAL (Invalid argument)
--- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=NULL} ---
+++ killed by SIGSEGV +++
error: signal: segmentation fault

Edit: This is strange:

$ file /var/lib/snapd/snap/snapcraft/6956/usr/bin/python3.8
/var/lib/snapd/snap/snapcraft/6956/usr/bin/python3.8: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /snap/core20/current/lib/ld-linux-aarch64.so.1, for GNU/Linux 3.7.0, BuildID[sha1]=356168b7ac8384a8cec914d8e4fa9abf3b6b938e, stripped

$ ldd /var/lib/snapd/snap/snapcraft/6956/usr/bin/python3.8
        not a dynamic executable

We are getting further reports about this.

Bump. This continues to be an issue for us.

Ah, yeah, the host’s ldd won’t work here. You need to use the program loader that python3.8 itself names as its interpreter; this should work:

/snap/core20/current/lib/ld-linux-aarch64.so.1 --list /snap/snapcraft/current/usr/bin/python3.8

Keep in mind that classic snaps require /snap to exist (on RHEL-family systems this is normally a symlink to /var/lib/snapd/snap).
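To check which loader a binary asks for and whether that path actually exists on the host, something like this Python sketch may help. It reads the PT_INTERP entry by hand and only handles 64-bit little-endian ELF files. The function name elf_interpreter is an illustrative choice, not part of snapd or snapcraft, and the default path is the one from the first post.

```python
import os
import struct
import sys

PT_INTERP = 3  # program header type holding the loader path


def elf_interpreter(path):
    """Return the PT_INTERP string of a 64-bit little-endian ELF, or None."""
    with open(path, "rb") as f:
        ident = f.read(16)
        if ident[:4] != b"\x7fELF":
            raise ValueError("not an ELF file")
        if ident[4] != 2 or ident[5] != 1:
            raise ValueError("only 64-bit little-endian ELF is handled")
        f.seek(0x20)  # e_phoff: file offset of the program header table
        (e_phoff,) = struct.unpack("<Q", f.read(8))
        f.seek(0x36)  # e_phentsize, e_phnum
        e_phentsize, e_phnum = struct.unpack("<HH", f.read(4))
        for i in range(e_phnum):
            f.seek(e_phoff + i * e_phentsize)
            # p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz
            p_type, _, p_offset, _, _, p_filesz = struct.unpack(
                "<IIQQQQ", f.read(40)
            )
            if p_type == PT_INTERP:
                f.seek(p_offset)
                return f.read(p_filesz).rstrip(b"\0").decode()
    return None


if __name__ == "__main__":
    default = "/var/lib/snapd/snap/snapcraft/current/usr/bin/python3.8"
    target = sys.argv[1] if len(sys.argv) > 1 else default
    if os.path.exists(target):
        interp = elf_interpreter(target)
        print("interpreter:", interp)
        print("exists on host:", interp is not None and os.path.exists(interp))
    else:
        print("no such file:", target)
```

A missing interpreter path seems worth ruling out, although I’m not claiming it explains the EINVAL seen above.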

Hi,

I just came across this issue with snap on AlmaLinux, also on OCI, while attempting to run the certbot snap. What can I do to get this problem solved permanently? Is there something particular about OCI and aarch64 that is causing this issue?

Thanks, Dave.

I haven’t learned anything about the cause, but you can alternatively install Certbot from EPEL or using the pip instructions.

That’s what we did in the end. I found it curious that this only seems to happen on RHEL rebuilds, on aarch64 instances, on OCI. Would love to know what is special about that combination.

Thanks!

This might be fixed in the latest snapcraft. If you install snapcraft from stable and get the segfault, but then install it from edge and don’t (for snapcraft itself), that means rebuilding certbot with that release of Snapcraft would solve the issue.