Snapcraft under LXD - Very slow Sudo - Hostname problem

I believe I have found a severe performance problem with how the LXD build environment’s hostname is configured when using SNAPCRAFT_BUILD_ENVIRONMENT=lxd. Although /etc/hostname is configured correctly (it matches the LXD container name), both the uname() system call and /proc/sys/kernel/hostname return the constant string “ubuntu” instead of the container name. As a result, any time sudo is used during the build (either by snapcraft itself or by a script in snapcraft.yaml), there is a six-second delay while sudo’s attempt to resolve the hostname “ubuntu” times out.
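As a quick way to spot the mismatch described above without running a full build, something like the following could be run inside the container. This is only a sketch: the function name check_hostname and its optional file arguments are made up for illustration and are not part of snapcraft or LXD.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the kernel's hostname with the one in /etc/hostname.
# The two optional arguments exist only so other files can be substituted.
check_hostname() {
  local kernel_file="${1:-/proc/sys/kernel/hostname}"
  local etc_file="${2:-/etc/hostname}"
  local kernel_name etc_name

  # $(<file) reads the file and strips the trailing newline.
  kernel_name=$(<"$kernel_file")
  etc_name=$(<"$etc_file")

  if [[ "$kernel_name" != "$etc_name" ]]; then
    echo "mismatch: kernel=$kernel_name etc=$etc_name"
    return 1
  fi
  echo "ok: $kernel_name"
}
```

Calling check_hostname with no arguments in a freshly created container would be expected to report a mismatch if the problem is present, and "ok" once the two names agree.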

Here is a trivial snapcraft.yaml that demonstrates the problem:

name: test-slow-sudo
summary: Quick demo of slow build problem caused by LXD hostname mis-config
description: |
  The host name returned by uname() syscall (and /proc/sys/kernel/hostname)
  is the constant string "ubuntu" which does not exist. This causes all
  invocations of "sudo" to incur a 6-second timeout penalty.
version: "1"
confinement: devmode
grade: devel
base: core20

parts:
  test:
    plugin: nil
    build-packages:
      - sudo
    override-build: |
      echo "Hostname according to uname(): $(hostname)"
      echo "Hostname according to /proc/sys/kernel/hostname: $(</proc/sys/kernel/hostname)"
      echo "Hostname according to /etc/hostname: $(</etc/hostname)"
      for ((i = 0; i < 4; i++)); do
        sudo true
        echo "Count: $i Elapsed: $SECONDS"
      done
      if (( $SECONDS > 5 )); then
        echo "FAILURE. The sudo command is very slow.  Elapsed: $SECONDS"
        exit 1
      else
        echo "SUCCESS. The sudo command is performing OK."
      fi

The problem only occurs when the snapcraft command has to create the LXD container (e.g. right after a snapcraft clean). If the container already exists, the hostnames are all correct and the problem does not occur. The problem also does not occur when using multipass.

My host machine is Ubuntu 20.04, with these snap versions:

lxd         4.13      20309  latest/stable    canonical✓  -
snapcraft   4.7.1     6466   latest/stable/…  canonical✓  classic
snapd       2.49.2    11588  latest/stable    canonical✓  snapd

Here is a transcript:

$ snapcraft clean
$ snapcraft
[ ... snip ... ]
snapd is not logged in, snap install commands will use sudo
sudo: unable to resolve host ubuntu: Temporary failure in name resolution
snap "core20" has no updates available
Pulling test 
+ snapcraftctl pull
Building test 
+ set +x
Hostname according to uname(): ubuntu
Hostname according to /proc/sys/kernel/hostname: ubuntu
Hostname according to /etc/hostname: snapcraft-test-slow-sudo
sudo: unable to resolve host ubuntu: Temporary failure in name resolution
Count: 0 Elapsed: 6
sudo: unable to resolve host ubuntu: Temporary failure in name resolution
Count: 1 Elapsed: 12
sudo: unable to resolve host ubuntu: Temporary failure in name resolution
Count: 2 Elapsed: 18
sudo: unable to resolve host ubuntu: Temporary failure in name resolution
Count: 3 Elapsed: 24
FAILURE. The sudo command is very slow.  Elapsed: 24
Failed to run 'override-build': Exit code was 1.

Notice the incorrect hostname reported by the kernel interface above.
Now run it again without destroying the container and get much better results:

$ snapcraft
Launching a container.
[ ... snip ... ]
snapd is not logged in, snap install commands will use sudo
snap "core20" has no updates available
Skipping pull test (already ran)
Building test 
+ set +x
Hostname according to uname(): snapcraft-test-slow-sudo
Hostname according to /proc/sys/kernel/hostname: snapcraft-test-slow-sudo
Hostname according to /etc/hostname: snapcraft-test-slow-sudo
Count: 0 Elapsed: 0
Count: 1 Elapsed: 0
Count: 2 Elapsed: 0
Count: 3 Elapsed: 0
SUCCESS. The sudo command is performing OK.
Staging test 
[ ... snip ... ]

Motivation: I cannot use multipass because nested virtualization is not allowed on AWS EC2, so I need to use LXD. Also, when doing automated builds, the LXD container will not already exist.

Interesting. I’ve addressed this in some other work that has yet to be merged, but I’ve never seen a delay in the resolution.

Building test 
++ hostname
+ echo 'Hostname according to uname(): ubuntu'
Hostname according to uname(): ubuntu
+ echo 'Hostname according to /proc/sys/kernel/hostname: ubuntu'
Hostname according to /proc/sys/kernel/hostname: ubuntu
+ echo 'Hostname according to /etc/hostname: snapcraft-test-slow-sudo'
Hostname according to /etc/hostname: snapcraft-test-slow-sudo
+ (( i = 0 ))
+ (( i < 4 ))
+ sudo true
sudo: unable to resolve host ubuntu: Temporary failure in name resolution
+ echo 'Count: 0 Elapsed: 0'
Count: 0 Elapsed: 0
+ (( i++ ))
+ (( i < 4 ))
+ sudo true
sudo: unable to resolve host ubuntu: Temporary failure in name resolution
+ echo 'Count: 1 Elapsed: 0'
Count: 1 Elapsed: 0
+ (( i++ ))
+ (( i < 4 ))
+ sudo true
sudo: unable to resolve host ubuntu: Temporary failure in name resolution
+ echo 'Count: 2 Elapsed: 0'
Count: 2 Elapsed: 0
+ (( i++ ))
+ (( i < 4 ))
+ sudo true
sudo: unable to resolve host ubuntu: Temporary failure in name resolution
+ echo 'Count: 3 Elapsed: 0'
Count: 3 Elapsed: 0
+ (( i++ ))
+ (( i < 4 ))
+ ((  0 > 5  ))
+ echo 'SUCCESS. The sudo command is performing OK.'
SUCCESS. The sudo command is performing OK.

I wonder if there is something different about your DNS configuration that results in the lengthy timeout. Can you run something like getent ahosts unresolvable-host-name (or use nslookup/dig/etc) and see how long it takes to resolve something that is not found?
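To put a number on that, a small wrapper around getent could report how long one lookup takes. This is a rough sketch, not a definitive tool: time_lookup is a made-up name, and it assumes bash 5 or newer for the EPOCHREALTIME variable.

```shell
#!/usr/bin/env bash
# Hypothetical helper: time one name lookup through getent, in milliseconds.
# Assumes bash >= 5 (EPOCHREALTIME); on older bash the elapsed time prints as 0.
time_lookup() {
  local name="$1" start end rc=0

  # EPOCHREALTIME looks like 1618500000.123456; dropping the decimal
  # separator leaves microseconds since the epoch.
  start=${EPOCHREALTIME/[.,]/}
  getent ahosts "$name" >/dev/null 2>&1 || rc=$?
  end=${EPOCHREALTIME/[.,]/}

  echo "$name: exit=$rc elapsed_ms=$(( (end - start) / 1000 ))"
}
```

Running time_lookup unresolvable-host-name and then time_lookup "$(hostname)" inside the container should show whether the failed lookup itself accounts for the delay seen with sudo.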

For lulz I enabled LLMNR, which should time out after 1 second, and I did get a delay.

Here is a PR to hopefully fix it: https://github.com/snapcore/snapcraft/pull/3521

It should be available in an hour or so in the edge/pr-3251 channel if you can give it a test. :smiley:

Small typo: it’s edge/pr-3521
