The PR just got merged, so at least 2.30 should work out of the box.
Moving forward, I hope to be able to keep the layer up to date with releases.
I built the layer on two different platforms: my laptop and a build server. On the laptop everything is fine, but on the server this error persists:
root@qemux86:~# snap install hello-world
error: cannot perform the following tasks:
- Mount snap "core" (3602) (exit status 127)
I found the reason for it: unsquashfs was not finding liblz4, because it was built in a subdirectory of /usr/lib. Adding the path to snapd’s environment would be a dirty fix, but fortunately there was already a patch in Yocto master that fixes the problem. I just backported it to Yocto rocko and it got accepted in rocko-next.
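For anyone hitting the same exit status 127, a quick way to confirm a missing shared library is to filter the output of ldd. This is only a sketch: the function name unresolved_libs is made up for illustration, and it assumes ldd is available on the target image.

```shell
# Read `ldd` output on stdin and print the names of libraries
# that the dynamic loader could not resolve.
# (unresolved_libs is a hypothetical helper name, not part of snapd or Yocto.)
unresolved_libs() {
    awk '/=> not found/ { print $1 }'
}
```

It could be used as `ldd "$(command -v unsquashfs)" | unresolved_libs`; an empty result means every library was resolved.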
The second issue, the null pointer dereference, is still occurring, even though snap works fine. journalctl -xe shows the same issue as in my first post. It occurs at an atomic read. If snap is executed on qemux86-64, no null pointer dereference occurs and everything works fine. Maybe on 32-bit qemu Go’s atomic is not working and the resource is held by somebody else.
Thanks for sending the fixes for Rocko.
As for the backtrace, I do see it in the journal. I’ll investigate a bit more.
Indeed, the problem is observable for binaries built under Yocto with Go 1.9, GOARCH=386, GO386=387, CGO_ENABLED=1.
I could not reproduce the problem when building with my host Go (either 1.9 or 1.9.3) with cross-compilation flags set. Funnily enough, I could not reproduce it even when building with the toolchain that Yocto built.
So far I have found only workarounds that seem to fix/mask the issue. The first one is disabling Go optimizations in the snapd recipe:
GOBUILDFLAGS_append = " -gcflags '-N'"
The second one is to build the snapd daemon statically. It’s enough to list it under STATIC_GO_INSTALL in the recipe:
STATIC_GO_INSTALL = " \
${GO_IMPORT}/cmd/snapd \
${GO_IMPORT}/cmd/snap-exec \
${GO_IMPORT}/cmd/snap-update-ns \
"
The current recipe also ignores ${STATIC_GO_INSTALL} and uses a hardcoded list of binaries. I’ve fixed it here: https://github.com/morphis/meta-snappy/pull/15
On a side note, I had a really hard time debugging Go binaries built under Yocto. It seems like the DWARF produced by the Go compiler does not play well with gdb, and I ended up getting:
Cannot find DIE at 0x0 referenced from DIE at 0x10c [in module debugfs/usr/lib/go/pkg/linux_386_dynlink/libstd.so]
For comparison the same method works just fine for some random C binaries.
Edit: bumped Go version in Yocto to 1.9.3, same effect.
Edit2:
Reproduction steps:
SNAPD_DEBUG=1 /usr/lib/snapd/snapd
snap install hello-world
Breadcrumbs diff: https://paste.ubuntu.com/26482348/
Log of a failing run, with the backtrace: https://paste.ubuntu.com/26482304/
Note this:
2018/01/29 07:16:11.640734 task.go:248: -- in task 0x967d8780 set progress, state 0x9674d640
2018/01/29 07:16:11.641011 store.go:1616: -- finished, err: context canceled
2018/01/29 07:16:11.641340 progress.go:71: progress adapter--- &{task:0x967d8780 unlocked:true label:core total:8.0797696e+07 current:2.784677e+06}
2018/01/29 07:16:11.641850 task.go:248: -- in task 0xb77006fc set progress, state 0xe3f
!!!!!--- task pointer ^^^ changed from the last log
!!!!! now it's 0xb77006fc (clearly bogus), before 0x967d8780
!!!!!
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0xe6b pc=0x8275c5a]
goroutine 40 [running]:
github.com/snapcore/snapd/overlord/state.(*State).writing(0xe3f)
/mnt/data/maciek/work/canonical/yocto/rocko-snapd/tmp/work/i586-poky-linux/snapd/2.30-r0/snapd-2.30/src/github.com/snapcore/snapd/overlord/state/state.go:140 +0x1a
github.com/snapcore/snapd/overlord/state.(*Task).SetProgress(0xb77006fc, 0x967ee700, 0x4, 0x4d0e000, 0x4d0e000)
/mnt/data/maciek/work/canonical/yocto/rocko-snapd/tmp/work/i586-poky-linux/snapd/2.30-r0/snapd-2.30/src/github.com/snapcore/snapd/overlord/state/task.go:252 +0x1e9
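Spotting the pointer change by eye in a long log is tedious, so a small filter can help. The sketch below assumes the breadcrumb format shown above ("in task 0x... set progress"), which comes from the debugging patch and not from stock snapd; the function name task_pointer_changes is made up for illustration. Different tasks legitimately have different pointers, so a reported change is only a hint to look closer.

```shell
# Read a snapd debug log on stdin and report every place where the task
# pointer in consecutive "in task 0x... set progress" lines differs.
# (task_pointer_changes is a hypothetical helper name.)
task_pointer_changes() {
    awk '
        /in task 0x[0-9a-f]+ set progress/ {
            # The pointer is the field right after the word "task".
            for (i = 1; i < NF; i++) if ($i == "task") ptr = $(i + 1)
            if (prev != "" && ptr != prev)
                print "task pointer changed: " prev " -> " ptr
            prev = ptr
        }
    '
}
```

For example, `journalctl -u snapd | task_pointer_changes` (assuming snapd runs as a systemd unit named snapd) would print one line per change.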
Edit3:
So far I’m naming the -linkshared link flag as the prime suspect. Investigating further:
b6442000-b7102000 r-xp 00000000 fd:00 1371 /usr/lib/go/pkg/linux_386_dynlink/libstd.so
b7102000-b7103000 ---p 00cc0000 fd:00 1371 /usr/lib/go/pkg/linux_386_dynlink/libstd.so
b7103000-b770f000 r--p 00cc0000 fd:00 1371 /usr/lib/go/pkg/linux_386_dynlink/libstd.so
b770f000-b7753000 rw-p 012cc000 fd:00 1371 /usr/lib/go/pkg/linux_386_dynlink/libstd.so
b7753000-b7771000 rw-p 00000000 00:00 0
b7771000-b7774000 r--p 00000000 00:00 0 [vvar]
b7774000-b7776000 r-xp 00000000 00:00 0 [vdso]
bfa1b000-bfa3c000 rw-p 00000000 00:00 0 [stack]
The bogus address ends up being located in the rw-p section of libstd.so. This is a shared runtime library that is only used when building with -linkshared.
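Looking up which mapping an address falls into can be scripted instead of done by hand. The sketch below reads lines in /proc/<pid>/maps format on stdin and prints the one whose range contains the given address. The function name find_mapping and the address used in the usage note are made up for illustration.

```shell
# Print the maps line (read from stdin) whose address range contains $1.
# $1 is a 0x-prefixed hex address. The body runs in a subshell so the
# helper variables do not leak into the caller.
# (find_mapping is a hypothetical helper name.)
find_mapping() (
    addr=$(( $1 ))
    while read -r range perms rest; do
        start=$(( 0x${range%-*} ))   # part before the dash
        end=$(( 0x${range#*-} ))     # part after the dash (exclusive)
        if [ "$addr" -ge "$start" ] && [ "$addr" -lt "$end" ]; then
            printf '%s %s %s\n' "$range" "$perms" "$rest"
        fi
    done
)
```

With a running daemon this would be something like `find_mapping 0xb7710000 < /proc/$(pidof snapd)/maps`, where 0xb7710000 is just an example address. Mappings change between runs, so the address has to come from the same process whose maps are inspected.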
After moving the snapd and libstd.so binaries to an Ubuntu Artful i386 image, I observed the same segfault.
Adding -linkshared is controlled by the GO_DYNLINK variable. Unfortunately, it’s set through machine overrides in goarch.bbclass:
GO_DYNLINK = ""
GO_DYNLINK_arm = "1"
GO_DYNLINK_aarch64 = "1"
GO_DYNLINK_x86 = "1"
GO_DYNLINK_x86-64 = "1"
GO_DYNLINK_powerpc64 = "1"
GO_DYNLINK_class-native = ""
Forcefully disabling it for x86 seems to do the trick. No more segfaults. Adding this to the snapd recipe file will disable -linkshared for x86:
GO_DYNLINK_x86_remove = "1"
@PSGXerus can you try the above on your setup?
Good work.
GO_DYNLINK_x86_remove = "1" solves the issue.
I also used your newest branch from github.
I’m curious what the problem with libstd.so is, though.
After all, thanks for your help. I think with this fix the meta-snappy layer is ready to go again.
There is some bug or other in PIC generation on 386. I don’t have a non-enormous test case though.
I’ve managed to reproduce the problem with snapd built from source in a Xenial cloud image. Iterating through a couple of Go versions, 1.9 is the first release that introduces the breakage. 1.8.6 is the last one that works.
I’ve bisected the range go1.8.6 to go1.9. The first bad commit is:
https://github.com/golang/go/commit/4808fc444307fa683bf3df6d55f9ad1828891a36
@mwhudson does this make any sense to you?
Reproduction steps:
grab a 386 build
build a shared libstd.so:
go install -x -v -buildmode=shared -linkshared std
build snapd, use -linkshared:
go install -x -v -linkshared github.com/snapcore/snapd/cmd/snapd
double check snapd is linked with libstd.so:
ubuntu@ubuntu:~/go/src/github.com/snapcore/snapd$ ldd /home/ubuntu/go/bin/snapd |grep libstd.so
libstd.so => /home/ubuntu/goroot/go/pkg/linux_386_dynlink/libstd.so (0xb648e000)
start snapd:
sudo SNAPD_DEBUG=1 /home/ubuntu/go/bin/snapd
run snap install:
sudo snap install hello
once the download (actual download, with progress bar and transfer speed) starts, hit ^C
I’ve used snapd commit 3a40b94
I’ve pushed a commit disabling GO_DYNLINK on x86 to https://github.com/morphis/meta-snappy/pull/15.
No, not even slightly but I can have a deeper look, thanks for doing the bisect.
I’ve opened a PR to update snapd to 2.31:
https://github.com/morphis/meta-snappy/pull/16
Once it’s merged I’ll update the docs page.
Docs PR:
Thanks for the update and merge!