Cleanup of /tmp/snap-private-tmp/snap.lxd removes ability to connect to running containers

Hello!

I have 4 identical machines running as an LXD cluster. Everything works fine for a day or so, and then lxc is unable to connect to the containers. Restarting the containers resolves the issue for another day or so, until it happens again. While lxc is unable to connect, the containers are still running (I can ssh to them).

Machine configuration (4 machines):
Dell PowerEdge R6525
AMD EPYC 7282 16-Core Processor
128GB RAM
Ubuntu 22.04 LTS
lxd/lxc (snap) 5.0.1

Sample output:

cmd@cluster01:~$ lxc list
+--------------+---------+---------------------+------+-----------+-----------+-----------+
| NAME         | STATE   | IPV4                | IPV6 | TYPE      | SNAPSHOTS | LOCATION  |
+--------------+---------+---------------------+------+-----------+-----------+-----------+
| ubuntu-test  | RUNNING | 240.81.0.157 (eth0) |      | CONTAINER | 0         | cluster01 |
+--------------+---------+---------------------+------+-----------+-----------+-----------+
| ubuntu-test2 | RUNNING | 240.82.0.189 (eth0) |      | CONTAINER | 0         | cluster02 |
+--------------+---------+---------------------+------+-----------+-----------+-----------+
| ubuntu-test3 | RUNNING | 240.83.0.64 (eth0)  |      | CONTAINER | 0         | cluster03 |
+--------------+---------+---------------------+------+-----------+-----------+-----------+
| ubuntu-test4 | RUNNING | 240.84.0.229 (eth0) |      | CONTAINER | 0         | cluster04 |
+--------------+---------+---------------------+------+-----------+-----------+-----------+

cmd@cluster01:~$ lxc shell ubuntu-test
Error: Failed to retrieve PID of executing child process

cmd@cluster01:~$ lxc console ubuntu-test
To detach from the console, press: <ctrl>+a q
Error: Error opening config file: "loading config file for the container failed"
Error: write /dev/pts/ptmx: file already closed

cmd@cluster01:~$ ssh 240.81.0.157
Last login: Tue Jan 3 21:58:07 2023 from 240.81.0.1
To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.

cmd@ubuntu-test:~$ logout
Connection to 240.81.0.157 closed.

cmd@cluster01:~$ lxc restart ubuntu-test

cmd@cluster01:~$ lxc shell ubuntu-test

root@ubuntu-test:~# logout

After getting help [0], it looks like /tmp/snap-private-tmp gets cleaned up automatically [1], and that's how we ended up here.

Workaround: At the top of /usr/lib/tmpfiles.d/snapd.conf I added:

x /tmp/snap-private-tmp/snap.lxd

This excludes the snap.lxd subdirectory from being "cleaned". It is that cleanup which breaks lxc's ability to connect to containers.
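
For anyone wanting to check whether a host already carries such an exclusion, a rough sketch follows. The function name `has_lxd_exclusion` is made up for illustration, and it assumes the exclusion is written exactly as the `x` line above, in a `*.conf` file directly inside one of the directories passed in.

```shell
#!/bin/sh
# Hypothetical helper (not part of snapd or LXD): succeeds if any *.conf file
# in the given tmpfiles.d directories contains an "x" line for the LXD snap's
# private tmp directory.
has_lxd_exclusion() {
    for dir in "$@"; do
        # -s: ignore missing files, -q: no output, -E: extended regex
        if grep -sqE '^x[[:space:]]+/tmp/snap-private-tmp/snap\.lxd[[:space:]]*$' "$dir"/*.conf; then
            return 0
        fi
    done
    return 1
}

# Look in both the packaged and the local configuration directories.
if has_lxd_exclusion /usr/lib/tmpfiles.d /etc/tmpfiles.d; then
    echo "snap.lxd is excluded from tmp cleanup"
else
    echo "no exclusion found for snap.lxd"
fi
```

This only inspects the configuration files. It does not tell you whether /tmp/snap-private-tmp/snap.lxd has already been removed on a running system.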

So it looks like the default snap config should exclude snap.lxd from being removed from /tmp/snap-private-tmp/.

References:
[0] https://github.com/lxc/lxd/issues/10771#issuecomment-1212183389
[1] https://discuss.linuxcontainers.org/t/lxc-unable-to-connect-to-running-container/16123

You would be better off using /etc/tmpfiles.d; the file in /usr/lib will likely be replaced blindly on package upgrades …
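
A minimal sketch of what that could look like, assuming the same `x` line works from a separate drop-in file. The function name `write_lxd_exclusion` and the file name `snap-lxd-keep.conf` are arbitrary choices for this example, not anything shipped by snapd or LXD.

```shell
#!/bin/sh
# Hypothetical helper: write a local tmpfiles.d drop-in that excludes the LXD
# snap's private tmp directory from cleaning. The file name is an arbitrary
# choice for this sketch.
write_lxd_exclusion() {
    dir="${1:-/etc/tmpfiles.d}"   # target directory; defaults to the local config dir
    printf '%s\n' 'x /tmp/snap-private-tmp/snap.lxd' > "$dir/snap-lxd-keep.conf"
}
```

Called as root with no argument, this writes /etc/tmpfiles.d/snap-lxd-keep.conf, which a package upgrade should leave alone. Whether a separately named drop-in behaves exactly like a line at the top of snapd.conf is an assumption here and worth confirming on a test machine.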

Can the default behaviour of Ubuntu be changed so that /tmp/snap-private-tmp/snap.lxd isn’t cleaned?

Someone should surely file a bug so this can be researched… All I meant to say above was that editing files that come from a Deb package is a bad idea since they will just be replaced on package updates…

I’ve filed an internal bug SNAPDENG-24758 that relates to this issue.
