Issues when porting image from Core 16 to Core 18

Hi, we finally got around to porting our image to Core 18, and we have run into a couple of issues. I updated the assertion with base18, the new gadget snap, and pi-kernel. (The target device is a Pi 3.)

  1. So the first issue is that our snap can’t start because of this error:

Sep 19 20:31:04 localhost screenly-client.viewer[1792]: * failed to open vchiq instance

I remember having this error in the early Core 16 days: we initially had to run the snap in devmode, and then a fix came out. I think the opengl interface was added?
I’m not sure, but I did check that the interface is connected, and it certainly is. Below is part of the snap interfaces output:
:opengl screenly-client

So I have no idea what changed. The one thing I had to do differently for the new gadget snap (previously we used the old pi3 gadget snap plus our changes) was build it with sudo. This is the command used to build the gadget snap:
sudo snapcraft snap --target-arch=armhf --destructive-mode
I did try building it without sudo but it fails:

Building gadget 
Cross compilation detected; using pre-defined sources list
make: Entering directory '/home/sergey/work/pi-gadget'
mkdir -p "/home/sergey/work/pi-gadget/stage"/apt
cp "./helpers/sources.list.cross" ""/home/sergey/work/pi-gadget/stage"/apt/multiverse.sources.list"
sed -i "/^deb/ s/\bfocal/bionic/" ""/home/sergey/work/pi-gadget/stage"/apt/multiverse.sources.list"
sed -i "/^deb/ s/$/ multiverse/" ""/home/sergey/work/pi-gadget/stage"/apt/multiverse.sources.list"
apt-get update \
	-o Dir::Etc::sourcelist=""/home/sergey/work/pi-gadget/stage"/apt/multiverse.sources.list" \
	-o APT::Architecture=armhf 2>/dev/null
Reading package lists... Done
make: *** [Makefile:70: multiverse] Error 100
make: Leaving directory '/home/sergey/work/pi-gadget'
Failed to run 'override-build': Exit code was 2.

Any ideas what can be causing this issue?

  2. The second issue is just as important: when running on a Pi 3B+ there is no ethernet connection.
root@localhost:/# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: wlan0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 1000
    link/ether b8:27:eb:0c:d1:a1 brd ff:ff:ff:ff:ff:ff
root@localhost:/# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: wlan0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN mode DORMANT group default qlen 1000
    link/ether b8:27:eb:0c:d1:a1 brd ff:ff:ff:ff:ff:ff

When the image is starting up there is a message about netplan failing. Not sure if that is relevant.
In our old gadget snap we had this config:

  RmBXKl6HO6YOC2DE4G2q1JzWImC04EUy:
    ethernet.enable: true

I moved it to the pi-gadget as well, but it does not seem to have any effect. Did something change here? Do we need to change this to get it working?

This looks reminiscent of LP: #1533265 which, despite the status on that bug, does appear to have been fixed at some point. At least, /dev/vchiq and all the relevant /dev/dri/* stuff appears to be mentioned in interfaces/builtin/opengl.go.

However, looking at the devices themselves in both core16 and core18, root access would still be required (the permissions on the devices are still root:root 0600). That’s not changed between core16 and core18, though, so I’m doubtful it’s the issue here.
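To compare the device nodes on a core16 and a core18 image side by side, something like the sketch below could be run on each. The helper name show_dev_perms is made up for illustration, and it assumes GNU stat is available on the device.

```shell
# show_dev_perms PATH...: print "owner:group mode path" for each node that exists.
# (Hypothetical helper; relies on GNU coreutils `stat -c`.)
show_dev_perms() {
    for dev in "$@"; do
        if [ -e "$dev" ]; then
            stat -c '%U:%G %a %n' "$dev"
        else
            echo "missing $dev"
        fi
    done
}

# On the device, for the nodes mentioned above:
#   show_dev_perms /dev/vchiq /dev/dri/*
```

If the output is identical on both images, the device permissions can probably be ruled out.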

The only major change I can see of possible relevance between core16 and core18 (at least at the gadget-snap level) is that core16 was using the KMS overlay (dtoverlay=vc4-kms-v3d) while core18 uses the FKMS overlay (dtoverlay=vc4-fkms-v3d). Both provide KMS functionality, but the latter is currently required for full compatibility with the camera module. It may be worth trying the KMS overlay instead by editing config.txt on the boot partition (replace the dtoverlay= line mentioned above) to see if that makes any difference?
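As a rough sketch of that edit, the helper below swaps the vc4 overlay name in a config.txt and keeps a backup. The function name set_overlay is made up, and the file path is whatever the boot partition is mounted as on your system; it assumes GNU sed for the -i flag.

```shell
# set_overlay FILE OVERLAY: point the active vc4 dtoverlay line at OVERLAY.
# Matches both vc4-kms-v3d and vc4-fkms-v3d; any trailing parameters are kept.
# (Hypothetical helper; FILE is wherever config.txt lives on the boot partition.)
set_overlay() {
    cfg=$1
    overlay=$2
    cp "$cfg" "$cfg.bak"   # keep a copy so the change is easy to revert
    sed -i "s/^dtoverlay=vc4-f\{0,1\}kms-v3d/dtoverlay=$overlay/" "$cfg"
}

# Example, with a placeholder path:
#   set_overlay /path/to/boot/config.txt vc4-kms-v3d
```

A commented-out dtoverlay line is left alone, since the pattern only matches lines that start with dtoverlay=.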

I’m guessing here, but the failing multiverse section of the build relies on a modified version of the host’s /etc/apt/sources.list file. Is this perhaps empty on the build machine? (e.g. if the sources are all defined in includes under /etc/apt/sources.list.d)
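A quick way to test that guess is to count the active deb lines in the file. The helper name count_deb_lines is made up; the ^deb pattern is the same one the sed commands in the build log match on.

```shell
# count_deb_lines FILE: number of lines starting with "deb" in an apt sources file.
# Prints 0 if the file is empty, has no such lines, or cannot be read.
# (Hypothetical helper for illustration.)
count_deb_lines() {
    if [ -r "$1" ]; then
        grep -c '^deb' "$1" || true   # grep -c still prints 0 when nothing matches
    else
        echo 0
    fi
}

# On the build machine:
#   count_deb_lines /etc/apt/sources.list
```

Since the log shows the cross build copying helpers/sources.list.cross into stage/apt/multiverse.sources.list, the same check can be pointed at that staged file. Temporarily dropping the 2>/dev/null from the apt-get update call should also reveal the actual apt error behind exit code 100.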

Ethernet should “just work” on the 3B+ (at least it does out of the box on a fresh core18 install). I would suggest trying it without the ethernet.enable line as there’s nothing similar in the base gadget snap.

Hope that helps,

Dave.

The only major change I can see of possible relevance between core16 and core18 (at least at the gadget-snap level) is that core16 was using the KMS overlay (dtoverlay=vc4-kms-v3d) while core18 uses the FKMS overlay (dtoverlay=vc4-fkms-v3d). Both provide KMS functionality, but the latter is currently required for full compatibility with the camera module. It may be worth trying the KMS overlay instead by editing config.txt on the boot partition (replace the dtoverlay= line mentioned above) to see if that makes any difference?

In the Core 16 image we had the dtoverlay line commented out completely. I tried the dtoverlay line both as it is and commented out. With it set to the default value our eglfs app refused to start.
Just to make sure, I tried a) removing the dtoverlay line completely, b) the option from 16, and c) the default option.
The result was the same in every case.

Ethernet should “just work” on the 3B+ (at least it does out of the box on a fresh core18 install). I would suggest trying it without the ethernet.enable line as there’s nothing similar in the base gadget snap.

So that ethernet.enable line was just setting that option for network-manager. The result is the same with or without it: there is no ethernet, and the green LED never lights up. I tried it on a couple of 3B+ boards just to make sure it was not a device-specific issue.
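For anyone trying to reproduce this, here is a small sketch for checking whether the kernel has registered a wired interface at all, rather than reading the ip output by eye. The helper names are made up, and the eth/en name prefixes are an assumption about interface naming.

```shell
# list_ifaces: read `ip -o link` output on stdin, print one interface name per line.
list_ifaces() {
    awk -F': ' '{ print $2 }'
}

# has_wired: succeed if any listed name looks like a wired interface.
# The eth*/en* prefixes are an assumption about naming, not something from this thread.
has_wired() {
    list_ifaces | grep -Eq '^(eth|en)'
}

# On the device:
#   ip -o link | has_wired || echo "no wired interface registered"
```

With the ip link output posted earlier (only lo and wlan0) this reports no wired interface, which points at the driver or device tree rather than at network-manager or netplan configuration. As far as I know the 3B+ wired port uses the lan78xx driver, so the kernel log is another place to look.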

Hi again. This does not seem to be an issue on our side. To make sure it’s not our changes I did the following:

  1. Downloaded a slightly older gadget snap with UBUNTU_STORE_ARCH=armhf snap download pi --channel=18-pi3/stable and built an image with it, with no other changes from our side. The ethernet connection worked immediately. After increasing gpu_mem and changing dtoverlay I was able to get our eglfs app working.

  2. Downloaded the latest pi gadget snap from the 18-pi channel and built an image. Got the same result I had on the clean build of the 18-armhf branch of pi-gadget: no ethernet light, and our Qt app would not run.

I did more testing, and there seems to be a different pattern. To account for all variables, I tested different gadget snaps, sideloading them versus letting ubuntu-image download them.

When I sideload the gadget snap, ethernet works. My app works as well.

ping @waveform / @ogra