Snapcraft fails using LXD on NVIDIA Jetson TX2

Hey @ogra, I’ve rebuilt the kernel with those features enabled and I still hit the same error (unable to resolve ports.ubuntu.com) using snapcraft --use-lxd outside the container.

Here’s my latest lxd.check-kernel output:

--- Namespaces ---
Namespaces: enabled
Utsname namespace: enabled
Ipc namespace: enabled
Pid namespace: enabled
User namespace: enabled
newuidmap is not installed
newgidmap is not installed
Network namespace: enabled

--- Control groups ---
Cgroups: enabled

Cgroup v1 mount points:
/sys/fs/cgroup/systemd
/sys/fs/cgroup/cpuset
/sys/fs/cgroup/net_cls,net_prio
/sys/fs/cgroup/blkio
/sys/fs/cgroup/devices
/sys/fs/cgroup/memory
/sys/fs/cgroup/cpu,cpuacct
/sys/fs/cgroup/freezer
/sys/fs/cgroup/hugetlb
/sys/fs/cgroup/pids
/sys/fs/cgroup/debug
/sys/fs/cgroup/perf_event

Cgroup v2 mount points:
/sys/fs/cgroup/unified

Cgroup v1 clone_children flag: enabled
Cgroup device: enabled
Cgroup sched: enabled
Cgroup cpu account: enabled
Cgroup memory controller: enabled
Cgroup cpuset: enabled

--- Misc ---
Veth pair device: enabled, loaded
Macvlan: enabled, not loaded
Vlan: enabled, not loaded
Bridges: enabled, not loaded
Advanced netfilter: enabled, not loaded
CONFIG_NF_NAT_IPV4: enabled, loaded
CONFIG_NF_NAT_IPV6: enabled, loaded
CONFIG_IP_NF_TARGET_MASQUERADE: enabled, loaded
CONFIG_IP6_NF_TARGET_MASQUERADE: enabled, loaded
CONFIG_NETFILTER_XT_TARGET_CHECKSUM: enabled, loaded
CONFIG_NETFILTER_XT_MATCH_COMMENT: enabled, loaded
FUSE (for use with lxcfs): enabled, loaded

--- Checkpoint/Restore ---
checkpoint restore: missing
CONFIG_FHANDLE: enabled
CONFIG_EVENTFD: enabled
CONFIG_EPOLL: enabled
CONFIG_UNIX_DIAG: missing
CONFIG_INET_DIAG: enabled
CONFIG_PACKET_DIAG: enabled
CONFIG_NETLINK_DIAG: enabled
File capabilities:

Note : Before booting a new kernel, you can check its configuration
usage : CONFIG=/path/to/config /snap/lxd/14958/bin/lxc-checkconfig

Oh, one more thing: I blew away all my LXD installation(s) in order to retry building inside a container via snapcraft --destructive-mode. I noticed that the first time I run snap install snapcraft --classic I get:

root@test:~# snap install snapcraft --classic
error: cannot perform the following tasks:
- Setup snap "snapd" (7267) security profiles (cannot setup udev for snap "snapd": cannot reload udev rules: exit status 2
udev output:
)
- Setup snap "snapd" (7267) security profiles (cannot reload udev rules: exit status 2
udev output:
)

If I run snap install snapcraft --classic again, it appears to succeed! But the snapcraft --destructive-mode command again appears to hang. It doesn’t look like snapd is running (per the error in the first installation attempt).

modprobe bridge ?

bridge was included as Y, not as a module, so I think the tool was wrong. For giggles, I recompiled bridge as a module (and now I have):

--- Namespaces ---
Namespaces: enabled
Utsname namespace: enabled
Ipc namespace: enabled
Pid namespace: enabled
User namespace: enabled
newuidmap is not installed
newgidmap is not installed
Network namespace: enabled

--- Control groups ---
Cgroups: enabled

Cgroup v1 mount points:
/sys/fs/cgroup/systemd
/sys/fs/cgroup/freezer
/sys/fs/cgroup/hugetlb
/sys/fs/cgroup/cpuset
/sys/fs/cgroup/perf_event
/sys/fs/cgroup/devices
/sys/fs/cgroup/pids
/sys/fs/cgroup/cpu,cpuacct
/sys/fs/cgroup/debug
/sys/fs/cgroup/blkio
/sys/fs/cgroup/net_cls,net_prio
/sys/fs/cgroup/memory

Cgroup v2 mount points:
/sys/fs/cgroup/unified

Cgroup v1 clone_children flag: enabled
Cgroup device: enabled
Cgroup sched: enabled
Cgroup cpu account: enabled
Cgroup memory controller: enabled
Cgroup cpuset: enabled

--- Misc ---
Veth pair device: enabled, loaded
Macvlan: enabled, not loaded
Vlan: enabled, not loaded
Bridges: enabled, loaded
Advanced netfilter: enabled, not loaded
CONFIG_NF_NAT_IPV4: enabled, loaded
CONFIG_NF_NAT_IPV6: enabled, loaded
CONFIG_IP_NF_TARGET_MASQUERADE: enabled, loaded
CONFIG_IP6_NF_TARGET_MASQUERADE: enabled, loaded
CONFIG_NETFILTER_XT_TARGET_CHECKSUM: enabled, loaded
CONFIG_NETFILTER_XT_MATCH_COMMENT: enabled, loaded
FUSE (for use with lxcfs): enabled, loaded

--- Checkpoint/Restore ---
checkpoint restore: missing
CONFIG_FHANDLE: enabled
CONFIG_EVENTFD: enabled
CONFIG_EPOLL: enabled
CONFIG_UNIX_DIAG: enabled
CONFIG_INET_DIAG: enabled
CONFIG_PACKET_DIAG: enabled
CONFIG_NETLINK_DIAG: enabled
File capabilities:

Note : Before booting a new kernel, you can check its configuration
usage : CONFIG=/path/to/config /snap/lxd/14958/bin/lxc-checkconfig
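
The two reports above are long and differ in only a couple of lines, so a small comparison helper can make the change easier to spot. This is a minimal sketch, assuming both outputs were saved to text files; the file names before.txt and after.txt and the function names are hypothetical, not part of LXD.

```python
import sys


def parse_report(text):
    """Map each 'name: status' line of an lxd.check-kernel report to its status."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip section banners, mount-point paths and lines without a status.
        if line.startswith("---") or line.startswith("/") or ":" not in line:
            continue
        name, status = (part.strip() for part in line.split(":", 1))
        # "Note" and "usage" are trailer text; list headers have an empty status.
        if not status or name in ("Note", "usage"):
            continue
        entries[name] = status
    return entries


def diff_reports(before, after):
    """Return (name, old_status, new_status) for entries whose status changed."""
    old, new = parse_report(before), parse_report(after)
    return [
        (name, old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    ]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python3 diff_check.py before.txt after.txt
    with open(sys.argv[1]) as f_old, open(sys.argv[2]) as f_new:
        for name, was, now in diff_reports(f_old.read(), f_new.read()):
            print(f"{name}: {was} -> {now}")
```

Run against the two reports in this thread, it should list only Bridges (now loaded) and CONFIG_UNIX_DIAG (now enabled); the differing order of the cgroup mount points is ignored because path lines are skipped.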

The only difference I still see compared to a normal Ubuntu desktop install is that the desktop reports:

checkpoint restore: enabled

but I can’t really imagine this being critical …
I assume nothing changed after bridge became a module?

Yup, still not working.

Though it dawned on me that since I’m already down the path of recompiling my kernel, I might as well enable KVM and give Multipass a shot…but it would still be ideal if I could get builds within LXD containers working.

I don’t think multipass would work considering the user space logic it has.

I think that the LXD discourse would be a good place to bounce into to solve the LXD specifics.

But as soon as you get LXD working fine, there should be no issues with snapcraft --use-lxd or lxc launch ubuntu:18.04 foo && lxc exec foo -- snap install --classic snapcraft && lxc exec foo -- sh -c "cd my-source && snapcraft --destructive-mode" (this last command is illustrative, I did not try or expect that to be a one liner copy paste :smile:)
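
Spelled out step by step, that illustrative one-liner might look like the sketch below. It has not been run on a Jetson; the container name foo and the my-source directory come from the command above, the function names are hypothetical, and it assumes the source tree already exists inside the container.

```python
import shlex
import subprocess


def build_commands(container="foo", source_dir="my-source"):
    """Return the lxc commands from the post as argument lists, in order."""
    return [
        # 1. Create and start an Ubuntu 18.04 container.
        ["lxc", "launch", "ubuntu:18.04", container],
        # 2. Install snapcraft as a classic snap inside it.
        ["lxc", "exec", container, "--", "snap", "install", "--classic", "snapcraft"],
        # 3. Build in place; the container itself is the build environment.
        ["lxc", "exec", container, "--", "sh", "-c",
         f"cd {shlex.quote(source_dir)} && snapcraft --destructive-mode"],
    ]


def run_all(commands):
    """Run each command in turn and stop at the first failure."""
    for argv in commands:
        print("+", " ".join(argv))
        subprocess.run(argv, check=True)
```

Calling run_all(build_commands()) would execute the three steps; printing build_commands() first is a cheap way to check what would be run.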

Thanks for the pointer @sergiusens – I had created a topic last week and haven’t heard anything yet. No worries; I’ll ping again.

Thanks for the help everyone – I’ll circle back once I get my LXD issues resolved.

Hey @ogra – while I’m working through the LXD issues on the LXD forum (linked above) I was hoping to get your take on the problems with building inside an LXD container – this is actually our ideal usage scenario for internal development as well as for our customers.

I’ve tried this out on L4T on my TX2 and Nano devkits, and also on my Nano using @abeato’s Jetson Ubuntu Core images (I built yesterday so as to have his latest LXD fixes included).

Regardless of the operating system, I struggle with the same things:

  • LXD containers don’t get IP addresses. This can be solved by running dhclient eth0 inside the container (but things like ping still don’t work, permission denied). Realistically I need to lxc config set <name> security.privileged true on the container to fix this.

  • The following workflow doesn’t work (for a container named ‘snapbuilder’):

    $ sudo lxc exec snapbuilder bash
    # apt update && apt upgrade
    # snap install snapcraft --classic
    2020-05-22T15:10:07Z ERROR cannot setup udev for snap "snapd": cannot reload udev rules: exit status 2
    udev output:
    
    error: cannot perform the following tasks:
    - Setup snap "snapd" (7267) security profiles (cannot setup udev for snap "snapd": cannot reload udev rules: exit status 2
    udev output:
    )
    - Setup snap "snapd" (7267) security profiles (cannot reload udev rules: exit status 2
    udev output:
    )
    

Since this issue follows all of the configurations, I’d guess I’m missing something in the kernel but I’m no expert on that topic.

Sorry, posted one google too soon:
https://bugs.launchpad.net/snapd/+bug/1712808

Looks like simply re-running the install tends to unblock, I now have snapcraft installed. But I’m back to the issue I mentioned a few posts above – the snapcraft command inside the LXD container seems to hang.
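
Since a second attempt tends to succeed, the install can be wrapped in a small retry loop. This is only a sketch: the retry function, the attempt count and the delay are arbitrary choices of mine, and it works around the udev error rather than fixing it.

```python
import subprocess
import time


def retry(argv, attempts=3, delay=5.0, run=subprocess.call):
    """Run argv until it exits with status 0 or the attempts run out.

    Returns the 1-based number of the attempt that succeeded, or 0 if all failed.
    """
    for attempt in range(1, attempts + 1):
        if run(argv) == 0:
            return attempt
        if attempt < attempts:
            time.sleep(delay)  # give snapd a moment before trying again
    return 0
```

Inside the container this would be called as retry(["snap", "install", "snapcraft", "--classic"]). The run parameter exists only so the loop can be exercised without snapd present.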

journalctl -u snapd shows:

May 22 13:56:00 test systemd[1]: snapd.service: Failed to reset devices.list: Operation not permitted
May 22 13:56:00 test systemd[1]: Starting Snappy daemon...
May 22 13:56:01 test snapd[207]: AppArmor status: apparmor not enabled
May 22 13:56:01 test snapd[207]: daemon.go:346: started snapd/2.42.1+18.04 (series 16; classic; devmode) ubuntu/18.04 (arm64) linux/4.9.
May 22 13:56:01 test snapd[207]: daemon.go:439: adjusting startup timeout by 30s (pessimistic estimate of 30s plus 5s per snap)
May 22 13:56:01 test systemd[1]: Started Snappy daemon.
May 22 13:56:01 test snapd[207]: api.go:952: Installing snap "snapcraft" revision unset
May 22 13:56:06 test snapd[207]: daemon.go:540: gracefully waiting for running hooks
May 22 13:56:06 test snapd[207]: daemon.go:542: done waiting for running hooks
May 22 13:56:06 test snapd[207]: daemon stop requested to wait for socket activation
May 22 13:56:07 test systemd[1]: snapd.service: Failed to reset devices.list: Operation not permitted
May 22 13:56:07 test systemd[1]: Starting Snappy daemon...
May 22 13:56:07 test snapd[338]: AppArmor status: apparmor not enabled
May 22 13:56:07 test snapd[338]: daemon.go:346: started snapd/2.42.1+18.04 (series 16; classic; devmode) ubuntu/18.04 (arm64) linux/4.9.
May 22 13:56:07 test snapd[338]: daemon.go:439: adjusting startup timeout by 30s (pessimistic estimate of 30s plus 5s per snap)
May 22 13:56:07 test systemd[1]: Started Snappy daemon.
May 22 13:56:07 test snapd[338]: api.go:952: Installing snap "snapcraft" revision unset
May 22 13:56:14 test systemd[1]: snapd.service: Failed to reset devices.list: Operation not permitted
May 22 13:56:15 test systemd[1]: snapd.service: Failed to reset devices.list: Operation not permitted
May 22 13:56:17 test snapd[338]: handlers.go:460: Reported install problem for "snapd" as 00116b7a-9c34-11ea-aa41-fa163ee63de6 OOPSID
May 22 14:01:22 test snapd[338]: daemon.go:540: gracefully waiting for running hooks
May 22 14:01:22 test snapd[338]: daemon.go:542: done waiting for running hooks
May 22 14:01:22 test snapd[338]: daemon stop requested to wait for socket activation

Googling that brought me here: https://github.com/lxc/lxd/issues/2004 – with some comments about ‘installing snaps inside lxd containers won’t work’ but that’s quite old.

I have been unsuccessful with all the guidance that involves manipulating the raw.lxc key in my container’s configuration (I guess the aim is to run the container unconfined).

And with enough persistence, it appears I have prevailed (EDIT: I haven’t…intermittent success at least)

I used the recommendations to set the raw.lxc parameters from this issue I referenced above. However lxc config edit test complained about formatting issues when adding in the raw.lxc section. Then I found this issue which describes how many of the raw.lxc parameters were either renamed or removed.

I came up with the following modifications to my config:

uskellse@uskellse-tx2:~$ lxc config show test
architecture: aarch64
config:
  image.architecture: arm64
  image.description: ubuntu 18.04 LTS arm64 (release) (20200519.1)
  image.label: release
  image.os: ubuntu
  image.release: bionic
  image.serial: "20200519.1"
  image.type: squashfs
  image.version: "18.04"
  raw.lxc: |-
    lxc.apparmor.profile=unconfined
    lxc.cgroup.devices.allow=a
    lxc.mount.auto=proc:rw sys:ro cgroup:ro
    lxc.autodev=1
  security.privileged: "true"
  volatile.base_image: 134c9aa1abc870990921923735509b01ccbad69481d957b74d65e090511c9c9f
  volatile.eth0.host_name: veth7cc63258
  volatile.eth0.hwaddr: 00:16:3e:dd:5f:fb
  volatile.idmap.base: "0"
  volatile.idmap.current: '[]'
  volatile.idmap.next: '[]'
  volatile.last_state.idmap: '[]'
  volatile.last_state.power: RUNNING
devices: {}
ephemeral: false
profiles:
- default
stateful: false
description: ""

And (after restarting the container) I’m able to do snapcraft --destructive-mode from inside the LXD container and see it through to completion!
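
For anyone who wants to reproduce this without hand-editing YAML in lxc config edit, the same settings can be applied with lxc config set. The sketch below only builds the commands; the container name test and the four raw.lxc lines are copied from the config above, the function names are hypothetical, and whether these settings suit other kernels or LXD versions is untested.

```python
import subprocess

# The raw.lxc value from the config above, one LXC key per line.
RAW_LXC = "\n".join([
    "lxc.apparmor.profile=unconfined",
    "lxc.cgroup.devices.allow=a",
    "lxc.mount.auto=proc:rw sys:ro cgroup:ro",
    "lxc.autodev=1",
])


def config_commands(container="test"):
    """Return the lxc commands that apply the settings and restart the container."""
    return [
        ["lxc", "config", "set", container, "security.privileged", "true"],
        ["lxc", "config", "set", container, "raw.lxc", RAW_LXC],
        ["lxc", "restart", container],
    ]


def apply(commands):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

apply(config_commands()) would run them. Keep in mind that a privileged container with an unconfined AppArmor profile has very little isolation from the host.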

I’m not sure this really solves my LXD problems, but my objectives have been achieved. Thanks for your help/patience everyone.

I’m going to keep updating my progress here in case anyone else stumbles on this issue later.

Not smooth sailing yet – it works sometimes, and other times when I run ‘snapcraft’ I get the hang (and it seems like the whole LXD container hangs?). I’ve gotten better results (so far at least) installing ‘snapcraft’ from apt inside the LXD container, instead of as a snap.

don’t … really :slight_smile:
the apt-packaged snapcraft is ancient and unmaintained and will give you different binary results …

Haha well okay. My build ended up failing anyway using it…it seems like the common denominator here is: ‘have success -> post on forum -> stop having success’ :wink:

Yeah, looks like I spoke too soon. I got one successful build off and have been unable to get ‘snapcraft’ to run at all ever since. To be continued…

did you try to wiggle the cable (SCNR) ? :slight_smile:

:smiley: I’ve done so much cable wiggling over the last couple weeks.

I’ll pick this topic (building snaps inside LXD itself) up in a day or two. Currently have the machine set up for remote access for the LXD team to debug core LXD issues.

From https://discuss.linuxcontainers.org/t/linux-for-tegra-l4t-networking-issues/7775/20 the problem is the custom 4.9.140-tegra kernel. Did you get that one from NVIDIA? :smiley:

FTR, I get the same hang when running snapcraft in lxd. Interestingly, when I run the command with snap run --strace ..., it does not block anymore - which also means I cannot find where it is blocking.

Yeah, it’s the Linux for Tegra (L4T) standard image. Networking problems seemingly can be worked around by running privileged (lxc config set <name> security.privileged true) but I still hit the hang.