Yocto Rocko - "core" snap panic


#21

The PR just got merged, so at least 2.30 should work out of the box.

Moving forward, I hope to be able to keep the layer up to date with releases.


#22

I built the layer on two different platforms: my laptop and a build server. On the laptop everything is fine, but on the server this error persists:

root@qemux86:~# snap install hello-world
error: cannot perform the following tasks:
- Mount snap "core" (3602) (exit status 127)

I found the reason for it. unsquashfs was not finding liblz4, because it was built in a subdirectory of /usr/lib. Adding the path to snapd’s environment would be a dirty fix, but fortunately there was already a patch in Yocto master that fixes the problem. I just backported it to Yocto rocko and it got accepted in rocko-next.

The second issue, the null pointer dereference, is still occurring, even though snap works fine. journalctl -xe shows the same issue as in my first post. It occurs at an atomic read. If snap is executed on qemux86-64, no null pointer dereference occurs and everything works fine. Maybe on 32-bit qemu Go’s atomic is not working and the resource is held by somebody else.


#23

Thanks for sending the fixes for Rocko.

As for the backtrace, I do see it in the journal. I’ll investigate a bit more.


#24

Indeed the problem is observable for binaries built under Yocto, with Go 1.9, GOARCH=386, GO386=387, CGO_ENABLED=1.
I could not reproduce the problem when building with my host Go (either 1.9 or 1.9.3) with cross compilation flags set. Funnily enough, I could not reproduce it even when building with the toolchain that Yocto built.

So far I have only found workarounds that seem to fix or mask the issue. The first one is disabling Go optimizations in the snapd recipe:

GOBUILDFLAGS_append = " -gcflags '-N'"

The second one is to build the snapd daemon statically. It’s enough to list it under STATIC_GO_INSTALL in the recipe:

STATIC_GO_INSTALL = " \
	${GO_IMPORT}/cmd/snapd		\
	${GO_IMPORT}/cmd/snap-exec		\
	${GO_IMPORT}/cmd/snap-update-ns		\
"

The current recipe also ignores ${STATIC_GO_INSTALL} and uses a hardcoded list of binaries. I’ve fixed it here: https://github.com/morphis/meta-snappy/pull/15

On a side note, I had a really hard time debugging Go binaries built under Yocto. It seems like the DWARF produced by the Go compiler does not play well with gdb, and I ended up getting

Cannot find DIE at 0x0 referenced from DIE at 0x10c [in module debugfs/usr/lib/go/pkg/linux_386_dynlink/libstd.so]

For comparison, the same method works just fine for some random C binaries.

Edit: bumped Go version in Yocto to 1.9.3, same effect.

Edit2:

Reproduction steps:

  • run this:
    SNAPD_DEBUG=1 /usr/lib/snapd/snapd
    
  • In separate shell:
    snap install hello-world
    
  • Press ^C when the download (actual download) starts

Breadcrumbs diff: https://paste.ubuntu.com/26482348/
Log with the failing backtrace: https://paste.ubuntu.com/26482304/

Note this:

2018/01/29 07:16:11.640734 task.go:248: -- in task 0x967d8780 set progress, state 0x9674d640
2018/01/29 07:16:11.641011 store.go:1616: -- finished, err: context canceled
2018/01/29 07:16:11.641340 progress.go:71: progress adapter--- &{task:0x967d8780 unlocked:true label:core total:8.0797696e+07 current:2.784677e+06}
2018/01/29 07:16:11.641850 task.go:248: -- in task 0xb77006fc set progress, state 0xe3f 
!!!!!---                              task pointer ^^^ changed from the last log
!!!!! now it's 0xb77006fc (clearly bogus), before 0x967d8780
!!!!! 

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0xe6b pc=0x8275c5a]

goroutine 40 [running]:
github.com/snapcore/snapd/overlord/state.(*State).writing(0xe3f)
	/mnt/data/maciek/work/canonical/yocto/rocko-snapd/tmp/work/i586-poky-linux/snapd/2.30-r0/snapd-2.30/src/github.com/snapcore/snapd/overlord/state/state.go:140 +0x1a
github.com/snapcore/snapd/overlord/state.(*Task).SetProgress(0xb77006fc, 0x967ee700, 0x4, 0x4d0e000, 0x4d0e000)
	/mnt/data/maciek/work/canonical/yocto/rocko-snapd/tmp/work/i586-poky-linux/snapd/2.30-r0/snapd-2.30/src/github.com/snapcore/snapd/overlord/state/task.go:252 +0x1e9

Edit3
So far I’m naming the -linkshared link flag as the prime suspect. Investigating further:

b6442000-b7102000 r-xp 00000000 fd:00 1371       /usr/lib/go/pkg/linux_386_dynlink/libstd.so           
b7102000-b7103000 ---p 00cc0000 fd:00 1371       /usr/lib/go/pkg/linux_386_dynlink/libstd.so
b7103000-b770f000 r--p 00cc0000 fd:00 1371       /usr/lib/go/pkg/linux_386_dynlink/libstd.so
b770f000-b7753000 rw-p 012cc000 fd:00 1371       /usr/lib/go/pkg/linux_386_dynlink/libstd.so
b7753000-b7771000 rw-p 00000000 00:00 0                                 
b7771000-b7774000 r--p 00000000 00:00 0          [vvar]                 
b7774000-b7776000 r-xp 00000000 00:00 0          [vdso]                 
bfa1b000-bfa3c000 rw-p 00000000 00:00 0          [stack]             

The bogus address ends up being located in the rw-p section of libstd.so. This is a shared runtime library only enabled when building with -linkshared.

After moving the snapd and libstd.so binaries to an Ubuntu Artful i386 image, I observed the same segfault.

Adding -linkshared is controlled by the GO_DYNLINK variable. Unfortunately, it’s set through machine overrides in goarch.bbclass:

GO_DYNLINK = ""
GO_DYNLINK_arm = "1"
GO_DYNLINK_aarch64 = "1"
GO_DYNLINK_x86 = "1"
GO_DYNLINK_x86-64 = "1"
GO_DYNLINK_powerpc64 = "1"
GO_DYNLINK_class-native = ""

Forcefully disabling it for x86 seems to do the trick. No more segfaults. Adding this to the snapd recipe file will disable -linkshared for x86:

GO_DYNLINK_x86_remove = "1"

@PSGXerus can you try the above on your setup?


#25

Good work.
GO_DYNLINK_x86_remove = "1" solves the issue.
I also used your newest branch from github.

I’m curious what the problem with libstd.so is though.

In any case, thanks for your help. I think with this fix, the meta-snappy
layer is ready to go again.


#26

There is some bug or other in PIC generation on 386. I don’t have a non-enormous test case though.


#27

I’ve managed to reproduce the problem with snapd built from source in a Xenial cloud image. Iterating through a couple of Go versions, 1.9 is the first release that introduces the breakage. 1.8.6 is the last one that works.


#28

I’ve bisected the range go1.8.6 to go1.9. The first bad commit is:

@mwhudson does this make any sense to you?


#29

Reproduction steps:

  • grab a 386 build

  • build a shared libstd.so:
    go install -x -v -buildmode=shared -linkshared std

  • build snapd, use -linkshared:
    go install -x -v -linkshared github.com/snapcore/snapd/cmd/snapd

  • double check snapd is linked with libstd.so:

    ubuntu@ubuntu:~/go/src/github.com/snapcore/snapd$  ldd /home/ubuntu/go/bin/snapd |grep libstd.so
            libstd.so => /home/ubuntu/goroot/go/pkg/linux_386_dynlink/libstd.so (0xb648e000)
    
  • start snapd:
    sudo SNAPD_DEBUG=1 /home/ubuntu/go/bin/snapd

  • run snap install:
    sudo snap install hello

  • once the download (actual download, with progress bar and transfer speed) starts, hit ^C

I’ve used snapd commit 3a40b94
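
The "double check snapd is linked with libstd.so" step above can also be done without ldd. The sketch below uses Go's debug/elf package to read the binary's DT_NEEDED entries. The helper names importedLibs and hasSharedStd are made up for illustration, and the assumption that the entry's base name is exactly libstd.so is taken from the ldd output in that step. Run with no argument, it inspects its own executable.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
	"path/filepath"
)

// importedLibs returns the DT_NEEDED entries of the ELF file at path.
func importedLibs(path string) ([]string, error) {
	f, err := elf.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	return f.ImportedLibraries()
}

// hasSharedStd reports whether the list of needed libraries contains
// libstd.so, the shared Go standard library that -linkshared links against.
// (Assumes the entry's base name is exactly "libstd.so".)
func hasSharedStd(libs []string) bool {
	for _, l := range libs {
		if filepath.Base(l) == "libstd.so" {
			return true
		}
	}
	return false
}

func main() {
	// Inspect the binary given as the first argument, e.g. the snapd
	// binary built in the steps above; default to this program itself.
	path, err := os.Executable()
	if len(os.Args) > 1 {
		path, err = os.Args[1], nil
	}
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		return
	}
	libs, err := importedLibs(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		return
	}
	fmt.Printf("%s links libstd.so: %v\n", path, hasSharedStd(libs))
}
```

Unlike ldd, this only lists what the binary asks for and does not show where the dynamic loader actually resolves the library.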


#30

I’ve pushed a commit disabling GO_DYNLINK on x86 to https://github.com/morphis/meta-snappy/pull/15.


#31

No, not even slightly :slight_smile: but I can have a deeper look, thanks for doing the bisect.


#32

@morphis
Could we set Yocto to “supported” and version “2.30” on this page?


#33

I’ve opened a PR to update snapd to 2.31:

Once it’s merged I’ll update the docs page.

Docs PR:


#34

Thanks for the update and merge!