Snaps do not start in PXE-booted environment with NFS-mounted / (Ubuntu 20.04)

Dear community,

my colleagues and I are currently working on a boot image for a PXE environment based on Ubuntu 20.04, which will also contain (classic) Snaps of various programs such as PyCharm and IntelliJ IDEA.

The PXE image is built by applying various Ansible playbooks to a running VM. After everything is set up correctly, the files are transferred from the VM to a directory on a PXE server via rsync (see below). The image is then booted from there and, to prevent users from modifying the system, mounted read-only via overlayroot. Changes are written to the temporary read-write filesystem provided by overlayroot.

Everything else works as expected. However, the Snaps won’t start via their buttons in the Dock, and snap run fails with the following error message:

/snap/snapd/17883/usr/lib/snapd/snap-confine: error while loading shared libraries: libudev.so.1: cannot open shared object file: No such file or directory

When we launch an app directly via its start script, e.g. PyCharm with /snap/pycharm-professional/314/bin/pycharm.sh, it starts without problems.

On the source VM the Snaps always start, regardless of whether you click the button in the Dock, use snap run or start them via their shell scripts.

At first we thought it might be an issue with the way the files are transferred. As we learned from here, there is no need to include the whole /snap directory. When we then omitted the directory, Snaps wouldn’t start because the ‘current’ symlinks were missing. Therefore we altered the deployment playbook to exclude only the directories below /snap/app-name/, so that the dynamically mounted revision directories are omitted: "--exclude=/snap/*/*/". See below for the whole rsync command. This works as expected, and a diff of find /snap between the running source VM and a PXE-booted machine shows no difference. Still, the Snaps won’t start because of the libudev error.
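
For anyone who wants to reproduce the comparison, here is a minimal sketch. The helper name list_tree and the output file names are placeholders for illustration, not the exact commands we ran.

```shell
#!/bin/sh
# Hypothetical helper: print every path below a directory, sorted, one per line.
# On the real machines the directory would be /snap.
list_tree() {
    # cd first so the paths are relative and comparable across hosts;
    # LC_ALL=C keeps the sort order identical on both machines.
    (cd "$1" && find . | LC_ALL=C sort)
}

# Usage (file names are placeholders):
#   list_tree /snap > /tmp/snap-tree.vm.txt     # on the source VM
#   list_tree /snap > /tmp/snap-tree.pxe.txt    # on the PXE-booted machine
#   diff /tmp/snap-tree.vm.txt /tmp/snap-tree.pxe.txt
```

Note that a listing like this only compares path names. It would not reveal differences in ownership, permissions, extended attributes or file contents.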

Ansible Playbook for deployment (rsync)
- name: Synchronize image files to PXE server
  ansible.posix.synchronize:
    src: /
    dest: "rsync://{{ deploy_host }}/{{ deploy_share }}"
    archive: yes
    delete: yes
    rsync_opts:
      - "-AXH"
      - "--delete-excluded"
      - "--exclude=/dev/*"
      - "--exclude=/proc/*"
      - "--exclude=/sys/*"
      - "--exclude=/tmp/*"
      - "--exclude=/var/tmp/*"
      - "--exclude=/var/log/*"
      - "--exclude=/run/*"
      - "--exclude=/mnt/*"
      - "--exclude=/media/*"
      - "--exclude=/lost+found"
      - "--exclude=/snap/*/*/"
      - "--exclude=/swapfile"
      - "--exclude=/tftpboot"
      - "--exclude=/srv/tftp"
      - "--exclude=/etc/fstab"  # fstab not needed for PXE boot
      - "--exclude=/etc/systemd/system/multi-user.target.wants/ssh.service" # Disable SSH
  delegate_to: "{{ inventory_hostname }}"

Since we only exclude temporary, dynamic directories, the two systems should behave the same. Any idea why we get this error?

Both systems show no errors in the journal of snapd. The various snap commands like list, info, services, connections etc. show no difference. The output of mount does not differ either.

Maybe the problem lies in the different storage backends and AppArmor. When booting via PXE, the files are read over an NFS connection. /var/log/syslog contains AppArmor policy violations mentioning snap-confine and the NFS server’s IP:

Dec 14 12:14:12 hostname kernel: [72140.139010] audit: type=1400 audit(1671016452.169:181): apparmor="DENIED" operation="sendmsg" profile="/snap/snapd/17883/usr/lib/snapd/snap-confine" pid=18420 comm="snap-confine" laddr=client_ip lport=757 faddr=nfs_server_ip fport=2049 family="inet" sock_type="stream" protocol=6 requested_mask="send" denied_mask="send"
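
To collect all such entries in one go, something like the following may help. It is only a sketch: the function names snap_confine_denials and denied_operations are made up for illustration, and it assumes the denials are logged to /var/log/syslog with a comm="snap-confine" field, as in the line above.

```shell
#!/bin/sh
# Hypothetical helper: print AppArmor denials caused by snap-confine
# from the given log file.
snap_confine_denials() {
    grep 'apparmor="DENIED"' "$1" | grep 'comm="snap-confine"'
}

# Hypothetical helper: count those denials per denied operation
# (e.g. sendmsg), to see which rules the profile might be missing.
denied_operations() {
    snap_confine_denials "$1" |
        sed -n 's/.*operation="\([^"]*\)".*/\1/p' | sort | uniq -c
}

# Usage:
#   snap_confine_denials /var/log/syslog
#   denied_operations /var/log/syslog
```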

Any idea how to safely allow the needed operations in AppArmor (/etc/apparmor.d/usr.lib.snapd.snap-confine.real ?) without breaking snapd?

Adding network inet and network inet6, as mentioned in "Snaps and NFS /home", didn’t work.