In the latest version of my snap WPE WebKit Mir Kiosk (2.30.2), I’m facing a segfault at program startup during Wayland initialization. This happens only on the armhf build; the amd64 build from the identical yaml works fine on a Core PC installation (gadget pc 18-2 r104, pc-kernel 4.15.0-122.124 r625). So I guess it’s either a subtle problem during compilation or something else isolated to that target architecture.
I pushed that faulty build (rev 51) to the edge channel so that others may reproduce the error, but please note that it will segfault right at service startup. If you have an amd64 Core device with mir-kiosk running, rev 50 on amd64/edge should work fine.
I’m neither super-comfortable with C/C++ nor its debugging, so I hope that someone here can point me in the right direction to get this running.
Build
- Built with snapcraft 4.3 from this snapcraft.yaml natively on a Raspberry Pi 4 w/ 4GB RAM + 4GB swap, vanilla core18 gadget image, inside an LXD container according to @ogra’s tutorial.
- I can’t use Launchpad because of this limitation.
- Cross-building is not an option right now, although the WPE maintainers strongly recommend it (but with buildroot/yocto).
- Neither is pulling in the pre-built WPE components from Ubuntu archives, as a) they’re only available in focal, for which there’s no gnome snapcraft extension yet, and b) they’re already outdated. Using the Debian binaries would probably also wreak havoc.
- As you can see in the snapcraft.yaml, I use gcc8/g++8 because bionic’s default gcc7 threw a compiler error; the WPE maintainers advised using gcc8.
Debugging results so far
environment
Test installation on several Raspberry Pis, one model 4 and one 3B, both running core18 images with mir-kiosk 2.1.0-snap103 (latest stable).
Debugging flags: G_MESSAGES_DEBUG=all LIBGL_DEBUG=verbose WAYLAND_DEBUG=1
user@pi:~$ snap version
snap 2.47.1
snapd 2.47.1
series 16
kernel 5.3.0-1036-raspi2
service log
2020-11-02T16:30:50Z systemd[1]: Started Service for snap application wpe-webkit-mir-kiosk.browser.
2020-11-02T16:30:52Z -[10502]: platform_setup: Platform name: fdo
2020-11-02T16:30:52Z -[10502]: platform_setup: Platform plugin: libcogplatform-fdo.so
2020-11-02T16:30:52Z -[10502]: Initializing Wayland...
2020-11-02T16:30:52Z wpe-webkit-mir-kiosk.browser[10323]: [1158415.802] -> wl_display@1.get_registry(new id wl_registry@2)
2020-11-02T16:30:52Z wpe-webkit-mir-kiosk.browser[10323]: [1158415.974] -> wl_display@1.sync(new id wl_callback@3)
2020-11-02T16:30:52Z wpe-webkit-mir-kiosk.browser[10323]: [1158416.262] wl_display@1.delete_id(3)
2020-11-02T16:30:52Z wpe-webkit-mir-kiosk.browser[10323]: /snap/wpe-webkit-mir-kiosk/51/bin/launch-wpe: line 29: 10502 Segmentation fault "$SNAP"/usr/bin/cog -P fdo --bg-color=black --enable-mediasource=1 --webprocess-failure=restart --enable-write-console-messages-to-stdout="$error_to_console" "$url"
2020-11-02T16:30:52Z systemd[1]: snap.wpe-webkit-mir-kiosk.browser.service: Main process exited, code=exited, status=139/n/a
2020-11-02T16:30:52Z systemd[1]: snap.wpe-webkit-mir-kiosk.browser.service: Failed with result 'exit-code'.
2020-11-02T16:30:52Z systemd[1]: snap.wpe-webkit-mir-kiosk.browser.service: Service hold-off time over, scheduling restart.
2020-11-02T16:30:52Z systemd[1]: snap.wpe-webkit-mir-kiosk.browser.service: Scheduled restart job, restart counter is at 15.
2020-11-02T16:30:52Z systemd[1]: Stopped Service for snap application wpe-webkit-mir-kiosk.browser.
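The status=139 that systemd reports is not a code chosen by cog itself. The launch-wpe script is a shell script, and shells report a child killed by a signal as 128 plus the signal number. A minimal sketch of that arithmetic, assuming Linux signal numbering (the helper name describe_exit_status is made up for illustration):

```python
import signal

def describe_exit_status(status):
    """Translate a shell-style exit status into a signal name where possible."""
    if status > 128:
        # Shell convention: 128 + signal number for a child killed by a signal
        return signal.Signals(status - 128).name
    return f"exit code {status}"

print(describe_exit_status(139))
```

So 139 is 128 + 11, which matches the "Segmentation fault" message printed by the script on the line before.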
snappy-debug
sudo journalctl --output=short --follow --all | sudo snappy-debug
brings up nothing while running sudo snap run wpe-webkit-mir-kiosk.browser in a second terminal. Also, I guess confinement issues would appear in the amd64 version as well.
strace
sudo snap run --strace wpe-webkit-mir-kiosk.browser
(starting from the message “Initializing Wayland”, which indicates that things work up to that point; full strace here since it’s > 17MB)
(cog:4119): Cog-FDO-DEBUG: 10:06:58.080: Initializing Wayland...
[pid 4119] write(1, "(cog:4119): Cog-FDO-\33[1;32mDEBUG"..., 85) = 85
[pid 4119] socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0) = 11
[pid 4119] connect(11, {sa_family=AF_UNIX, sun_path="/run/user/0/snap.wpe-webkit-mir-kiosk/wayland-0"}, 50) = 0
[pid 4119] write(2, "[1295381.194] -> wl_display@1.g"..., 44[1295381.194] -> wl_display@1.get_registry() = 44
[pid 4119] write(2, "new id wl_registry@", 19new id wl_registry@) = 19
[pid 4119] write(2, "2", 12) = 1
[pid 4119] write(2, ")\n", 2)
) = 2
[pid 4119] write(2, "[1295382.985] -> wl_display@1.s"..., 36[1295382.985] -> wl_display@1.sync() = 36
[pid 4119] write(2, "new id wl_callback@", 19new id wl_callback@) = 19
[pid 4119] write(2, "3", 13) = 1
[pid 4119] write(2, ")\n", 2)
) = 2
[pid 4119] sendmsg(11, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\1\0\0\0\1\0\f\0\2\0\0\0\1\0\0\0\0\0\f\0\3\0\0\0", iov_len=24}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 24
[pid 4119] poll([{fd=11, events=POLLIN}], 1, -1) = 1 ([{fd=11, revents=POLLIN}])
[pid 4119] recvmsg(11, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\2\0\0\0\0\0\34\0\1\0\0\0\7\0\0\0wl_drm\0\0\2\0\0\0\2\0\0\0"..., iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 400
[pid 4119] write(2, "[1295385.844] wl_display@1.delet"..., 37[1295385.844] wl_display@1.delete_id() = 37
[pid 4119] write(2, "3", 13) = 1
[pid 4119] write(2, ")\n", 2)
) = 2
[pid 4119] --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x5} ---
[pid 4130] <... poll resumed> <unfinished ...>) = ?
[pid 4129] <... futex resumed>) = ?
[pid 4128] <... poll resumed> <unfinished ...>) = ?
[pid 4127] <... poll resumed> <unfinished ...>) = ?
[pid 4126] <... poll resumed> <unfinished ...>) = ?
[pid 4125] <... futex resumed>) = ?
[pid 4124] <... futex resumed>) = ?
[pid 4129] +++ killed by SIGSEGV +++
[pid 4130] +++ killed by SIGSEGV +++
[pid 4128] +++ killed by SIGSEGV +++
[pid 4127] +++ killed by SIGSEGV +++
[pid 4126] +++ killed by SIGSEGV +++
[pid 4125] +++ killed by SIGSEGV +++
[pid 4124] +++ killed by SIGSEGV +++
[pid 4119] +++ killed by SIGSEGV +++
<... wait4 resumed> [{WIFSIGNALED(s) && WTERMSIG(s) == SIGSEGV}], 0, NULL) = 4119
rt_sigaction(SIGINT, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0xb6e5e751}, {sa_handler=0x470705, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0xb6e5e751}, 8) = 0
openat(AT_FDCWD, "/usr/share/locale/C.UTF-8/LC_MESSAGES/bash.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/C.utf8/LC_MESSAGES/bash.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/C/LC_MESSAGES/bash.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale-langpack/C.UTF-8/LC_MESSAGES/bash.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale-langpack/C.utf8/LC_MESSAGES/bash.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale-langpack/C/LC_MESSAGES/bash.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
fstat64(2, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
openat(AT_FDCWD, "/usr/share/locale/C.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/C.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/C/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale-langpack/C.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale-langpack/C.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale-langpack/C/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
write(2, "/snap/wpe-webkit-mir-kiosk/51/bi"..., 250/snap/wpe-webkit-mir-kiosk/51/bin/launch-wpe: line 29: 4119 Segmentation fault "$SNAP"/usr/bin/cog -P fdo --bg-color=black --enable-mediasource=1 --webprocess-failure=restart --enable-write-console-messages-to-stdout="$error_to_console" "$url"
) = 250
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=4119, si_uid=0, si_status=SIGSEGV, si_utime=10, si_stime=41} ---
wait4(-1, 0xbe9d2bfc, WNOHANG, NULL) = -1 ECHILD (No child processes)
sigreturn({mask=[]}) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
exit_group(139) = ?
+++ exited with 139 +++
error: exit status 139
Note that the “no such file or directory” errors for libc.mo at the end appear after the segfault, probably trying to localize the error message. Over in SIGSEGV (Address boundary error) - #5 by mmartinortiz, the cause was just a missing library, but I don’t see any library lookup errors in the strace right before the crash.
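As a sanity check, the raw bytes in the sendmsg and recvmsg calls of the strace can be decoded by hand. This is a minimal sketch, assuming the standard Wayland wire format and little-endian byte order (which holds for both armhf and amd64). The helper names parse_messages and decode_global are made up for illustration, and only the first 28 bytes of the reply are used because the strace elides the rest.

```python
import struct

def parse_messages(buf):
    """Split raw bytes into (object_id, opcode, payload) tuples.

    Assumed wire format: a 32-bit object id, then a 32-bit word with the
    message size in the upper 16 bits and the opcode in the lower 16 bits.
    """
    messages = []
    offset = 0
    while offset + 8 <= len(buf):
        object_id, word = struct.unpack_from("<II", buf, offset)
        size, opcode = word >> 16, word & 0xFFFF
        if size < 8 or offset + size > len(buf):
            break  # malformed or truncated message, stop here
        messages.append((object_id, opcode, buf[offset + 8:offset + size]))
        offset += size
    return messages

def decode_global(payload):
    """Decode wl_registry.global arguments: (name, interface, version)."""
    name, length = struct.unpack_from("<II", payload, 0)
    interface = payload[8:8 + length - 1].decode("ascii")  # length counts the NUL
    padded = (length + 3) & ~3  # strings are padded to a multiple of 4 bytes
    (version,) = struct.unpack_from("<I", payload, 8 + padded)
    return name, interface, version

# sendmsg() payload from the strace (\f is 0x0c, i.e. 12)
SENT = bytes([1, 0, 0, 0, 1, 0, 12, 0, 2, 0, 0, 0,
              1, 0, 0, 0, 0, 0, 12, 0, 3, 0, 0, 0])

# First message of the recvmsg() payload (\34 is octal for 28)
RECEIVED = (bytes([2, 0, 0, 0, 0, 0, 28, 0, 1, 0, 0, 0, 7, 0, 0, 0])
            + b"wl_drm\x00\x00" + bytes([2, 0, 0, 0]))

for object_id, opcode, payload in parse_messages(SENT):
    (new_id,) = struct.unpack("<I", payload)
    print(f"request: object {object_id}, opcode {opcode}, new id {new_id}")

for object_id, opcode, payload in parse_messages(RECEIVED):
    print(f"event: object {object_id}, opcode {opcode}, args {decode_global(payload)}")
```

The two requests decode to the ones WAYLAND_DEBUG already logged (on wl_display, opcode 1 is get_registry and opcode 0 is sync), and the reply starts with a wl_registry.global event announcing wl_drm version 2. So the exchange on the wire looks well-formed, and the data for the first global had already arrived when the process crashed.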
I also tried to probe it with gdb, but it’s a Release build, and the build with debug symbols is still running. As this error only occurs on armhf, I used the --experimental-gdb-server variant described in the docs, with gdb-multiarch on an Ubuntu 20.04 amd64 machine. It works and connects, but without debug symbols it’s not useful.