Snapcraft fails using LXD on NVIDIA Jetson TX2

Yup, still not working.

Though it dawned on me that since I’m already down the path of recompiling my kernel, I might as well enable KVM and give Multipass a shot…but it would still be ideal if I could get builds within LXD containers working.

I don’t think Multipass would work, considering the user-space logic it relies on.

I think the LXD Discourse forum would be a good place to bring the LXD specifics to get them solved.

But as soon as you get LXD working fine, there should be no issues with snapcraft --use-lxd, or with something like lxc launch ubuntu:18.04 foo && lxc exec foo -- snap install --classic snapcraft && lxc exec foo -- sh -c "cd my-source && snapcraft --destructive-mode" (this last command is illustrative; I did not try it, and don’t expect it to work as a copy-paste one-liner :smile:)
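For anyone following along, that illustrative one-liner broken into discrete steps would look roughly like this (the container name foo and the my-source directory are placeholders, and this is a sketch I have not verified end to end):

```shell
# Launch a Bionic container ('foo' is a placeholder name)
lxc launch ubuntu:18.04 foo

# Install snapcraft as a classic snap inside it
lxc exec foo -- snap install --classic snapcraft

# Push the project tree into the container ('my-source' is a placeholder)
lxc file push -r my-source foo/root/

# Build inside the container; --destructive-mode is acceptable there
# because the container itself is disposable
lxc exec foo -- sh -c "cd /root/my-source && snapcraft --destructive-mode"
```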

Thanks for the pointer @sergiusens – I had created a topic last week and haven’t heard anything yet. No worries; I’ll ping again.

Thanks for the help everyone – I’ll circle back once I get my LXD issues resolved.

Hey @ogra – while I’m working through the LXD issues on the LXD forum (linked above) I was hoping to get your take on the problems with building inside an LXD container – this is actually our ideal usage scenario for internal development as well as for our customers.

I’ve tried this out on L4T on my TX2 and Nano devkits, and also on my Nano using @abeato’s Jetson Ubuntu Core images (I built yesterday so as to have his latest LXD fixes included).

Regardless of the operating system, I struggle with the same things:

  • LXD containers don’t get IP addresses. This can be solved by running dhclient eth0 inside the container (but things like ping still don’t work: permission denied). Realistically I need to run lxc config set <name> security.privileged true on the container to fix this.

  • The following workflow doesn’t work (for a container named ‘snapbuilder’):

    $ sudo lxc exec snapbuilder bash
    # apt update && apt upgrade
    # snap install snapcraft --classic
    2020-05-22T15:10:07Z ERROR cannot setup udev for snap "snapd": cannot reload udev rules: exit status 2
    udev output:
    
    error: cannot perform the following tasks:
    - Setup snap "snapd" (7267) security profiles (cannot setup udev for snap "snapd": cannot reload udev rules: exit status 2
    udev output:
    )
    - Setup snap "snapd" (7267) security profiles (cannot reload udev rules: exit status 2
    udev output:
    )
    

Since this issue follows me across all of these configurations, I’d guess I’m missing something in the kernel, but I’m no expert on that topic.
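For reference, the workaround from the first bullet above, spelled out as commands (a sketch; ‘snapbuilder’ is the container name from the second bullet):

```shell
# Make the container privileged so networking (ping etc.) stops failing
# with permission denied, then restart it
lxc config set snapbuilder security.privileged true
lxc restart snapbuilder

# If the container still has no IP address, request one manually inside it
lxc exec snapbuilder -- dhclient eth0
```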

Sorry, posted one Google search too soon:
https://bugs.launchpad.net/snapd/+bug/1712808

Looks like simply re-running the install tends to unblock it; I now have snapcraft installed. But I’m back to the issue I mentioned a few posts above: the snapcraft command inside the LXD container seems to hang.

journalctl -u snapd shows:

May 22 13:56:00 test systemd[1]: snapd.service: Failed to reset devices.list: Operation not permitted
May 22 13:56:00 test systemd[1]: Starting Snappy daemon...
May 22 13:56:01 test snapd[207]: AppArmor status: apparmor not enabled
May 22 13:56:01 test snapd[207]: daemon.go:346: started snapd/2.42.1+18.04 (series 16; classic; devmode) ubuntu/18.04 (arm64) linux/4.9.
May 22 13:56:01 test snapd[207]: daemon.go:439: adjusting startup timeout by 30s (pessimistic estimate of 30s plus 5s per snap)
May 22 13:56:01 test systemd[1]: Started Snappy daemon.
May 22 13:56:01 test snapd[207]: api.go:952: Installing snap "snapcraft" revision unset
May 22 13:56:06 test snapd[207]: daemon.go:540: gracefully waiting for running hooks
May 22 13:56:06 test snapd[207]: daemon.go:542: done waiting for running hooks
May 22 13:56:06 test snapd[207]: daemon stop requested to wait for socket activation
May 22 13:56:07 test systemd[1]: snapd.service: Failed to reset devices.list: Operation not permitted
May 22 13:56:07 test systemd[1]: Starting Snappy daemon...
May 22 13:56:07 test snapd[338]: AppArmor status: apparmor not enabled
May 22 13:56:07 test snapd[338]: daemon.go:346: started snapd/2.42.1+18.04 (series 16; classic; devmode) ubuntu/18.04 (arm64) linux/4.9.
May 22 13:56:07 test snapd[338]: daemon.go:439: adjusting startup timeout by 30s (pessimistic estimate of 30s plus 5s per snap)
May 22 13:56:07 test systemd[1]: Started Snappy daemon.
May 22 13:56:07 test snapd[338]: api.go:952: Installing snap "snapcraft" revision unset
May 22 13:56:14 test systemd[1]: snapd.service: Failed to reset devices.list: Operation not permitted
May 22 13:56:15 test systemd[1]: snapd.service: Failed to reset devices.list: Operation not permitted
May 22 13:56:17 test snapd[338]: handlers.go:460: Reported install problem for "snapd" as 00116b7a-9c34-11ea-aa41-fa163ee63de6 OOPSID
May 22 14:01:22 test snapd[338]: daemon.go:540: gracefully waiting for running hooks
May 22 14:01:22 test snapd[338]: daemon.go:542: done waiting for running hooks
May 22 14:01:22 test snapd[338]: daemon stop requested to wait for socket activation

Googling that brought me here: https://github.com/lxc/lxd/issues/2004 – it has some comments to the effect that installing snaps inside LXD containers won’t work, but that’s quite old.

I’ve been unsuccessful following all of the guidance that involves manipulating the raw.lxc object in my container’s configuration (I gather the goal is to run unconfined).

And with enough persistence, it appears I have prevailed (EDIT: I haven’t…intermittent success at least)

I used the recommendations from the issue I referenced above to set the raw.lxc parameters. However, lxc config edit test complained about formatting issues when I added the raw.lxc section. Then I found this issue, which describes how many of the raw.lxc parameters were either renamed or removed.

I came up with the following modifications to my config:

uskellse@uskellse-tx2:~$ lxc config show test
architecture: aarch64
config:
  image.architecture: arm64
  image.description: ubuntu 18.04 LTS arm64 (release) (20200519.1)
  image.label: release
  image.os: ubuntu
  image.release: bionic
  image.serial: "20200519.1"
  image.type: squashfs
  image.version: "18.04"
  raw.lxc: |-
    lxc.apparmor.profile=unconfined
    lxc.cgroup.devices.allow=a
    lxc.mount.auto=proc:rw sys:ro cgroup:ro
    lxc.autodev=1
  security.privileged: "true"
  volatile.base_image: 134c9aa1abc870990921923735509b01ccbad69481d957b74d65e090511c9c9f
  volatile.eth0.host_name: veth7cc63258
  volatile.eth0.hwaddr: 00:16:3e:dd:5f:fb
  volatile.idmap.base: "0"
  volatile.idmap.current: '[]'
  volatile.idmap.next: '[]'
  volatile.last_state.idmap: '[]'
  volatile.last_state.power: RUNNING
devices: {}
ephemeral: false
profiles:
- default
stateful: false
description: ""
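For anyone who’d rather not go through lxc config edit, the same settings can be applied non-interactively (a sketch; ‘test’ is the container name above, and the container needs a restart afterwards):

```shell
# Set the whole raw.lxc block in one shot; printf joins the lines with newlines
lxc config set test raw.lxc "$(printf '%s\n' \
  'lxc.apparmor.profile=unconfined' \
  'lxc.cgroup.devices.allow=a' \
  'lxc.mount.auto=proc:rw sys:ro cgroup:ro' \
  'lxc.autodev=1')"

# Privileged mode, as in the config shown above
lxc config set test security.privileged true

# Restart so the settings take effect
lxc restart test
```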

And (after restarting the container) I’m able to do snapcraft --destructive-mode from inside the LXD container and see it through to completion!

I’m not sure this really solves my LXD problems, but my objectives have been achieved. Thanks for your help/patience everyone.

I’m going to keep updating my progress here in case anyone else stumbles on this issue later.

Not smooth sailing yet – it works sometimes, and other times when I run ‘snapcraft’ I get the hang (and it seems like the whole LXD container hangs?). I’ve gotten better results (so far at least) installing ‘snapcraft’ from apt inside the LXD container, instead of as a snap.

don’t … really :slight_smile:
the apt-packaged snapcraft is ancient and unmaintained, and will give you different binary results …

Haha well okay. My build ended up failing anyway using it…it seems like the common denominator here is: ‘have success -> post on forum -> stop having success’ :wink:

Yeah, looks like I spoke too soon. I got one successful build off and have been unable to get ‘snapcraft’ to run at all ever since. To be continued…

did you try to wiggle the cable (SCNR) ? :slight_smile:

:smiley: I’ve done so much cable wiggling over the last couple weeks.

I’ll pick this topic (building snaps inside LXD itself) up in a day or two. Currently have the machine set up for remote access for the LXD team to debug core LXD issues.

From https://discuss.linuxcontainers.org/t/linux-for-tegra-l4t-networking-issues/7775/20 the problem is the custom 4.9.140-tegra kernel. Did you get that one from NVIDIA? :smiley:

FTR, I get the same hang when running snapcraft in LXD. Interestingly, when I run the command with snap run --strace ..., it does not block anymore – which also means I cannot find where it is blocking.
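In case it helps anyone reproduce, the invocation looks roughly like this (a sketch of my setup; strace writes its trace to stderr, so capturing that keeps it out of the way):

```shell
# Run snapcraft under strace via snapd, saving the trace for later inspection
# (the log filename is just an example)
snap run --strace snapcraft 2> snapcraft-strace.log
```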

Yeah, it’s the standard Linux for Tegra (L4T) image. The networking problems can seemingly be worked around by running privileged (lxc config set <name> security.privileged true), but I still hit the hang.

I have finally been able to run snapcraft on LXD on the Jetson. There were some Android patches in the kernel that caused the problem. The commits that need to be reverted can be seen here:

I really have no idea why NVIDIA chose the Android kernel path, but this is not the first time I’ve seen these problems with the Android patches :frowning_face:. Maybe the sane thing to do would be to remove all of the “ANDROID:” commits…

Awesome news and thanks for digging through it! I’ll be giving this a closer look tomorrow morning for sure.

FTR, removing a couple of additional android patches fixes the networking problems on unprivileged containers. However, you cannot install snaps in these unprivileged containers yet - some fuse patches from the Ubuntu kernel are missing, I think. So, still, you will have to use privileged containers to run snapcraft.

Thanks @abeato. I’ve been working on this today – I had some difficulties with lxd init failing to set up the lxdbr0 interface after applying your last several patches. Right now I’m building a full L4T 32.1.0 kernel with all the patches from your jetson-kernel-snap repository.

I attempted to flash my devkit with L4T 32.1.0 and a custom kernel (with the patches from @abeato’s project applied) in an attempt to get LXD/snapcraft working reliably. Unfortunately the kernel fails to boot and I haven’t dug deeper.

I really appreciate all the help to get this working, but at this point it feels like I’m past the point of diminishing returns. I’ll have to continue working between two devices, one running Ubuntu Core and the other running Linux for Tegra.