Yocto Rocko - "core" snap panic

The PR just got merged, so at least 2.30 should work out of the box.

Moving forward, I hope to be able to keep the layer up to date with releases.

I built the layer on two different platforms: my laptop and a build server. On the laptop everything is fine, but on the server this error persists:

root@qemux86:~# snap install hello-world
error: cannot perform the following tasks:
- Mount snap "core" (3602) (exit status 127)

I found the reason for it. unsquashfs was not finding liblz4 because it was built in a subdirectory of /usr/lib. Adding the path to snapd’s environment would be a dirty fix, but fortunately there was already a patch in Yocto master that fixes the problem. I just backported it to Yocto rocko and it got accepted in rocko-next.
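
Exit status 127 is typically what you get when the dynamic loader cannot resolve a required shared library, so one quick check is to look for "not found" entries in ldd output. A minimal sketch, assuming ldd is present on the target image (it may not be on a minimal build); the helper name missing_libs is made up for illustration:

```shell
#!/bin/sh
# missing_libs: read `ldd` output on stdin and print the names of the
# shared libraries that the dynamic loader could not resolve.
# (Hypothetical helper name, not part of snapd or Yocto.)
missing_libs() {
    awk '/=> not found/ { print $1 }'
}
```

On the target this could be used as `ldd "$(command -v unsquashfs)" | missing_libs`; no output means every library was resolved.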

The second issue, the null pointer dereference, is still occurring, even though snap works fine. journalctl -xe shows the same issue as in my first post. It occurs at an atomic read. If snap is executed on qemux86-64, no null pointer dereference occurs and everything works fine. Maybe on 32-bit qemu Go’s atomic is not working and the resource is held by somebody else.

Thanks for sending the fixes for Rocko.

As for the backtrace, I do see it in the journal. I’ll investigate a bit more.

Indeed the problem is observable for binaries built under Yocto, with Go 1.9, GOARCH=386, GO386=387, CGO_ENABLED=1.
I could not reproduce the problem when building with my host Go (either 1.9 or 1.9.3) with cross compilation flags set. Funnily enough, I could not reproduce it even when building with the toolchain that Yocto built.

So far I have only found ways that seem to fix/mask the issue. The first one is disabling Go optimizations in the snapd recipe:

GOBUILDFLAGS_append = " -gcflags '-N'"

The second one is to build the snapd daemon statically. It’s enough to list it under STATIC_GO_INSTALL in the recipe:

STATIC_GO_INSTALL = " \
	${GO_IMPORT}/cmd/snapd		\
	${GO_IMPORT}/cmd/snap-exec		\
	${GO_IMPORT}/cmd/snap-update-ns		\
"

The current recipe also ignores ${STATIC_GO_INSTALL} and uses a hardcoded list of binaries. I’ve fixed it here: https://github.com/morphis/meta-snappy/pull/15

On a side note, I had a really hard time debugging Go binaries built under Yocto. It seems the DWARF produced by the Go compiler does not play well with gdb, and I ended up getting

Cannot find DIE at 0x0 referenced from DIE at 0x10c [in module debugfs/usr/lib/go/pkg/linux_386_dynlink/libstd.so]

For comparison, the same method works just fine for some random C binaries.

Edit: bumped Go version in Yocto to 1.9.3, same effect.

Edit2:

Reproduction steps:

  • run this:
    SNAPD_DEBUG=1 /usr/lib/snapd/snapd
    
  • In separate shell:
    snap install hello-world
    
  • Press ^C when the download (actual download) starts

Breadcrumbs diff: https://paste.ubuntu.com/26482348/
Log with the backtrace when it fails: https://paste.ubuntu.com/26482304/

Note this:

2018/01/29 07:16:11.640734 task.go:248: -- in task 0x967d8780 set progress, state 0x9674d640
2018/01/29 07:16:11.641011 store.go:1616: -- finished, err: context canceled
2018/01/29 07:16:11.641340 progress.go:71: progress adapter--- &{task:0x967d8780 unlocked:true label:core total:8.0797696e+07 current:2.784677e+06}
2018/01/29 07:16:11.641850 task.go:248: -- in task 0xb77006fc set progress, state 0xe3f 
!!!!!---                              task pointer ^^^ changed from the last log
!!!!! now it's 0xb77006fc (clearly bogus), before 0x967d8780
!!!!! 

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0xe6b pc=0x8275c5a]

goroutine 40 [running]:
github.com/snapcore/snapd/overlord/state.(*State).writing(0xe3f)
	/mnt/data/maciek/work/canonical/yocto/rocko-snapd/tmp/work/i586-poky-linux/snapd/2.30-r0/snapd-2.30/src/github.com/snapcore/snapd/overlord/state/state.go:140 +0x1a
github.com/snapcore/snapd/overlord/state.(*Task).SetProgress(0xb77006fc, 0x967ee700, 0x4, 0x4d0e000, 0x4d0e000)
	/mnt/data/maciek/work/canonical/yocto/rocko-snapd/tmp/work/i586-poky-linux/snapd/2.30-r0/snapd-2.30/src/github.com/snapcore/snapd/overlord/state/task.go:252 +0x1e9
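
One way to spot this kind of pointer change in a longer debug log is to compare the task address on consecutive "in task" lines. A rough sketch, assuming the breadcrumb format shown above (those "-- in task 0x..." lines come from the debugging diff, not from stock snapd). A change is only a hint, since a different task legitimately has a different address; the helper name is made up:

```shell
#!/bin/sh
# task_pointer_changes: read a snapd debug log on stdin and report every
# "in task 0x..." line whose task address differs from the previous one.
# (Hypothetical helper; relies on the breadcrumb format from the diff.)
task_pointer_changes() {
    awk '
        match($0, /in task 0x[0-9a-f]+/) {
            # "in task " is 8 characters long; keep only the address.
            ptr = substr($0, RSTART + 8, RLENGTH - 8)
            if (prev != "" && ptr != prev)
                printf "%s -> %s: %s\n", prev, ptr, $0
            prev = ptr
        }
    '
}
```

Usage would be along the lines of `task_pointer_changes < snapd-debug.log`, where the log file name is just an example.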

Edit3:
So far I’m naming the -linkshared link flag as the prime suspect. Investigating further:

b6442000-b7102000 r-xp 00000000 fd:00 1371       /usr/lib/go/pkg/linux_386_dynlink/libstd.so           
b7102000-b7103000 ---p 00cc0000 fd:00 1371       /usr/lib/go/pkg/linux_386_dynlink/libstd.so
b7103000-b770f000 r--p 00cc0000 fd:00 1371       /usr/lib/go/pkg/linux_386_dynlink/libstd.so
b770f000-b7753000 rw-p 012cc000 fd:00 1371       /usr/lib/go/pkg/linux_386_dynlink/libstd.so
b7753000-b7771000 rw-p 00000000 00:00 0                                 
b7771000-b7774000 r--p 00000000 00:00 0          [vvar]                 
b7774000-b7776000 r-xp 00000000 00:00 0          [vdso]                 
bfa1b000-bfa3c000 rw-p 00000000 00:00 0          [stack]             

The bogus address ends up being located in the rw-p section of libstd.so. This is a shared runtime library only enabled when building with -linkshared.
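
Looking up which mapping an address belongs to can be scripted against /proc/<pid>/maps. A minimal sketch, assuming addresses are given in hex with a 0x prefix and fit in the shell's signed 64-bit arithmetic; the helper name find_mapping is made up for illustration:

```shell
#!/bin/sh
# find_mapping ADDRESS MAPS_FILE
# Print every line of a /proc/<pid>/maps-style file whose address range
# contains ADDRESS. The end of each range is exclusive, as in the kernel.
# (Hypothetical helper, for illustration only.)
find_mapping() {
    addr=$(($1))
    while read -r range perms rest; do
        # Skip anything that does not look like "start-end perms ..."
        case $range in *-*) ;; *) continue ;; esac
        start=$((0x${range%-*}))
        end=$((0x${range#*-}))
        if [ "$addr" -ge "$start" ] && [ "$addr" -lt "$end" ]; then
            printf '%s %s %s\n' "$range" "$perms" "$rest"
        fi
    done < "$2"
}
```

It would be called with the suspicious pointer and the maps file of the running daemon, for example `find_mapping 0xADDRESS /proc/<pid>/maps`.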

After moving the snapd and libstd.so binaries to an Ubuntu Artful i386 image, I observed the same segfault.

Adding -linkshared is controlled by the GO_DYNLINK variable. Unfortunately, it’s set through machine overrides in goarch.bbclass:

GO_DYNLINK = ""
GO_DYNLINK_arm = "1"
GO_DYNLINK_aarch64 = "1"
GO_DYNLINK_x86 = "1"
GO_DYNLINK_x86-64 = "1"
GO_DYNLINK_powerpc64 = "1"
GO_DYNLINK_class-native = ""

Forcefully disabling it for x86 seems to do the trick: no more segfaults. Adding this to the snapd recipe file disables -linkshared for x86:

GO_DYNLINK_x86_remove = "1"
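
To check that the setting took effect, one option is to look at the final value in the output of bitbake -e. A small sketch, assuming the recipe is named snapd and that bitbake -e prints the variable as GO_DYNLINK="..."; the helper name is made up:

```shell
#!/bin/sh
# go_dynlink_value: read `bitbake -e <recipe>` output on stdin and print
# the final value of GO_DYNLINK. An empty value (or no output at all)
# would mean the variable is not enabling -linkshared.
# (Hypothetical helper, for illustration only.)
go_dynlink_value() {
    sed -n 's/^GO_DYNLINK="\(.*\)"$/\1/p'
}
```

From an initialized build environment this would be run as `bitbake -e snapd | go_dynlink_value`.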

@PSGXerus can you try the above on your setup?

Good work.
GO_DYNLINK_x86_remove = "1" solves the issue.
I also used your newest branch from github.

I’m curious what the problem with libstd.so is though.

After all, thanks for your help. I think with this fix, the meta-snappy
layer is ready to go again.

There is some bug or other in PIC generation on 386. I don’t have a non-enormous test case though.

I’ve managed to reproduce the problem with snapd built from source in a Xenial cloud image. Iterating through a couple of Go versions, 1.9 is the first release that introduces the breakage. 1.8.6 is the last one that works.

I’ve bisected the range go1.8.6 to go1.9. The first bad commit is:
https://github.com/golang/go/commit/4808fc444307fa683bf3df6d55f9ad1828891a36

@mwhudson does this make any sense to you?

Reproduction steps:

  • grab a 386 build

  • build a shared libstd.so:
    go install -x -v -buildmode=shared -linkshared std

  • build snapd, use -linkshared:
    go install -x -v -linkshared github.com/snapcore/snapd/cmd/snapd

  • double check snapd is linked with libstd.so:

    ubuntu@ubuntu:~/go/src/github.com/snapcore/snapd$  ldd /home/ubuntu/go/bin/snapd |grep libstd.so
            libstd.so => /home/ubuntu/goroot/go/pkg/linux_386_dynlink/libstd.so (0xb648e000)
    
  • start snapd:
    sudo SNAPD_DEBUG=1 /home/ubuntu/go/bin/snapd

  • run snap install:
    sudo snap install hello

  • once the download (actual download, with progress bar and transfer speed) starts, hit ^C

I’ve used snapd commit 3a40b94

I’ve pushed a commit disabling GO_DYNLINK on x86 to https://github.com/morphis/meta-snappy/pull/15.

No, not even slightly :) but I can have a deeper look, thanks for doing the bisect.

@morphis
Could we set Yocto to “supported” and version “2.30” on this page?

I’ve opened a PR to update snapd to 2.31:

https://github.com/morphis/meta-snappy/pull/16

Once it’s merged I’ll update the docs page.

Docs PR:

https://github.com/canonical-docs/snappy-docs/pull/349

Thanks for the update and merge!