DNS resolution doesn't work on the first boot on core20

On the first boot (only), DNS requests from our snap fail. After a reboot, everything works just fine.

We’re using NetworkManager as the netplan renderer; here are its startup logs:

I checked whether /etc/resolv.conf has the correct content, and it looks right. This is what it contains when read from inside the snap service:

# This file is managed by man:systemd-resolved(8). Do not edit.
#
# This is a dynamic resolv.conf file for connecting local clients to the
# internal DNS stub resolver of systemd-resolved. This file lists all
# configured search domains.
#
# Run "resolvectl status" to see details about the uplink DNS servers
# currently in use.
#
# Third party programs must not access this file directly, but only through the
# symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a different way,
# replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.

nameserver 127.0.0.53
options edns0 trust-ad
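A note on reading this file: nameserver 127.0.0.53 only says that queries go to the systemd-resolved stub. It does not show whether resolved itself has any upstream DNS servers (the file's own comment points to resolvectl status for that). A minimal Python sketch of that check, assuming it runs on the device with read access to /etc/resolv.conf; the helper names are hypothetical:

```python
"""Sketch: list the nameservers in a resolv.conf-style file.

Assumption: run on the device (e.g. inside the snap). Seeing only
127.0.0.53 means "use the systemd-resolved stub"; the real upstream
servers still have to be checked separately (resolvectl status).
"""
from pathlib import Path
from typing import List  # typing.List keeps this working on Python 3.8 (core20)

STUB_RESOLVER = "127.0.0.53"


def parse_nameservers(text: str) -> List[str]:
    """Return the addresses from 'nameserver' lines, skipping comments."""
    servers = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):  # resolv.conf comment markers
            continue
        parts = line.split()
        if parts[0] == "nameserver" and len(parts) >= 2:
            servers.append(parts[1])
    return servers


def uses_resolved_stub(text: str) -> bool:
    """True if the only configured nameserver is the systemd-resolved stub."""
    return parse_nameservers(text) == [STUB_RESOLVER]


if __name__ == "__main__":
    content = Path("/etc/resolv.conf").read_text()
    print("nameservers:", parse_nameservers(content))
    print("resolved stub only:", uses_resolved_stub(content))
```

So a "correct" resolv.conf like the one above is consistent with resolved having no usable upstream server at that moment.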

However, DNS name resolution from inside the Python service running in the snap fails:

[  896.413701] python3[5950]: urllib3.connectionpool Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0xb3b812f8>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution')': /api/4504479311986688/store/
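The urllib3 message wraps a plain resolver error: "[Errno -3] Temporary failure in name resolution" is a socket.gaierror whose errno is socket.EAI_AGAIN (-3 with glibc on Linux). A minimal sketch to reproduce it without urllib3; the hostname is a placeholder and the helper names are hypothetical:

```python
"""Sketch: reproduce the urllib3 failure with a bare getaddrinfo() call.

Assumption: HOSTNAME is a placeholder, not the host the service uses.
"""
import socket

HOSTNAME = "example.com"  # hypothetical target; substitute the real API host


def classify_gaierror(exc: socket.gaierror) -> str:
    """Map the resolver error code to a short hint."""
    if exc.errno == socket.EAI_AGAIN:
        # Errno -3: temporary failure, e.g. no usable upstream DNS server yet.
        return "temporary failure in name resolution"
    if exc.errno == socket.EAI_NONAME:
        return "name does not exist or no such host"
    return f"other resolver error {exc.errno}: {exc.strerror}"


def describe_lookup(host: str, port: int = 443) -> str:
    """Try a lookup and return a human-readable result."""
    try:
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        return classify_gaierror(exc)
    addresses = sorted({info[4][0] for info in infos})  # sockaddr[0] is the IP
    return "resolved: " + ", ".join(addresses)


if __name__ == "__main__":
    print(describe_lookup(HOSTNAME))
```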

I also tried resolving the name with a D-Bus call to org.freedesktop.resolve1, and it fails like this:

[  874.864453] python3[5950]: screenly.networkprobe Cannot resolve hostname with resolved: g-io-error-quark: GDBus.Error:org.freedesktop.resolve1.NoNameServers: No appropriate name servers or networks for name found (36)
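For reference, the same lookup can be made with PyGObject's Gio, which is where a g-io-error-quark error like the one above comes from. A minimal sketch of Manager.ResolveHostname(ifindex, name, family, flags); it assumes PyGObject is installed and the caller is allowed to reach org.freedesktop.resolve1 on the system bus (inside a snap that depends on its interface connections). It is not the code of the screenly.networkprobe service, and the helper names are hypothetical:

```python
"""Sketch: ask systemd-resolved directly over D-Bus via Gio.

Assumptions: PyGObject is available, the system bus is reachable, and
"example.com" is a placeholder host.
"""
import socket
from typing import List


def format_address(family: int, raw) -> str:
    """Turn one (family, ay) pair from the reply into a printable address."""
    return socket.inet_ntop(family, bytes(raw))  # bytes() accepts bytes or a list of ints


def resolve_via_resolved(host: str) -> List[str]:
    """Call ResolveHostname(ifindex=0, name, family=AF_UNSPEC, flags=0)."""
    from gi.repository import Gio, GLib  # lazy import: format_address works without gi

    bus = Gio.bus_get_sync(Gio.BusType.SYSTEM, None)
    reply = bus.call_sync(
        "org.freedesktop.resolve1",
        "/org/freedesktop/resolve1",
        "org.freedesktop.resolve1.Manager",
        "ResolveHostname",
        GLib.Variant("(isit)", (0, host, socket.AF_UNSPEC, 0)),
        GLib.VariantType.new("(a(iiay)st)"),  # addresses, canonical name, flags
        Gio.DBusCallFlags.NONE,
        -1,  # default timeout
        None,
    )
    addresses, _canonical, _flags = reply.unpack()
    return [format_address(family, raw) for _ifindex, family, raw in addresses]


if __name__ == "__main__":
    from gi.repository import GLib

    try:
        print(resolve_via_resolved("example.com"))
    except GLib.Error as err:
        # On the failing first boot, the NoNameServers error would surface here.
        print("resolved lookup failed:", err.message)
```

The NoNameServers error means resolved had no DNS server for any link at that moment, which fits a stub-only resolv.conf that still looks correct.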

My theory is that on the first boot systemd-networkd configures the network first, and when NetworkManager takes over the network configuration after netplan apply, the DNS settings break.

As expected, plugging in the Ethernet cable only after NetworkManager has taken over the network doesn’t break DNS name resolution. So I suspect it’s some kind of interference between networkd and NetworkManager.

Hi @renat2017, I am a bit confused by the problem description. You say that it works inside the snap, but not if you do it from a shell?

Then, I’m not sure what you mean by plugging in the Ethernet cable. Is it unplugged originally, and you are using Wi-Fi at the beginning? And when you plug in the Ethernet cable, the problem is gone?

Also, could you please share the output of snap list?

Hi @abeato, that’s how it looks. Perhaps DNS doesn’t work in the shell either, but I cannot check because there is no user on our device.

What I can tell, though, is that the device can connect to an NTP server, which means that it actually can resolve domain names.

I can set up a user once the device is provisioned, but by then the problem is gone, because it only happens right after the first boot, while the device is being provisioned.

  1. I think there is a conflict between networkd and NetworkManager during the first boot. To check my theory, I will try masking systemd-networkd.service from the kernel command line with systemd.mask=systemd-networkd.service.

  2. I will also send you the output of snap list a bit later.
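To confirm that the mask actually reached the running kernel, the command line can be read back from /proc/cmdline. A minimal sketch, assuming it runs on the device; systemd.mask= is the systemd kernel command-line option (it may be given more than once), and the helper names are hypothetical:

```python
"""Sketch: check whether the running kernel command line masks a unit.

Assumption: reads /proc/cmdline on the device; the option itself would be
added through the gadget's cmdline.txt, as done later in this thread.
"""
from pathlib import Path
from typing import List


def masked_units(cmdline: str) -> List[str]:
    """Collect every unit named by a systemd.mask= option."""
    units = []
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "systemd.mask" and sep and value:  # skip an empty "systemd.mask="
            units.append(value)
    return units


def is_masked(cmdline: str, unit: str) -> bool:
    return unit in masked_units(cmdline)


if __name__ == "__main__":
    cmdline = Path("/proc/cmdline").read_text()
    print("systemd-networkd masked:", is_masked(cmdline, "systemd-networkd.service"))
```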

Hi @abeato. Here is the snap list output:

/ssh:pi@192.168.1.100: $ sudo snap list
Name                Version                          Rev    Tracking          Publisher       Notes
core20              20221212                         1781   latest/stable     canonical✓      base
core22              20230110                         488    latest/stable     canonical✓      base
network-manager     1.22.10-14                       744    20/stable         canonical✓      -
pi-kernel           5.4.0-1079.90                    577    20/stable         canonical✓      kernel
screenly-client     3.4.2-200c9c53-fix-init-network  1527   latest/candidate  screenly-brand  -
screenly-pi-gadget  1.0.5                            101    pi3/candidate     screenly-brand  gadget
snapd               2.58                             17952  latest/stable     canonical✓      snapd

As expected, disabling systemd-networkd from the kernel command line fixes everything. Still, I don’t like that solution, in case NetworkManager is used as a frontend for networkd at some point in the future.

OK, here are the facts I have so far:

  1. Provisioning the device with the Ethernet cable plugged in causes the DNS failure.
  2. After a reboot, DNS works fine, even if the reboot is done with Ethernet plugged in.
  3. Provisioning the device without the Ethernet cable, and plugging it in after provisioning finishes, doesn’t break DNS.
  4. Disabling systemd-networkd.service via cmdline.txt (in our own gadget snap) and then provisioning with the Ethernet cable plugged in doesn’t break DNS.

So I believe it’s a systemd-networkd vs. NetworkManager conflict that happens during the first boot.

Hi @renat2017, thanks for the detailed analysis. Something that caught my attention is that both the core20 and core22 bases are installed. Is this a UC20 or a UC22 system? snap model --verbose | grep base: would give us the answer.

Note that if this is a UC22 system, it is recommended to install network-manager from the 22/stable channel, since it must be kept in sync with the base: both snaps interact to handle the networking configuration.
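The consistency check suggested above can be scripted from the snap list output and the model’s base: field. A minimal sketch; the rule "coreNN base means network-manager on the NN/ track" is inferred from the 20/stable and 22/stable channels mentioned in this thread, not from snapd documentation, and the helper names are hypothetical:

```python
"""Sketch: check that network-manager tracks the channel matching the base.

Assumptions: snap_list_output is the text printed by `snap list` (header
row first), model_base is the value of the model's base: field, and the
"coreNN -> NN/ track" rule is inferred from this thread.
"""
from typing import Dict


def tracking_by_snap(snap_list_output: str) -> Dict[str, str]:
    """Map snap name -> Tracking column, skipping the header row."""
    result = {}
    for line in snap_list_output.strip().splitlines()[1:]:
        cols = line.split()
        if len(cols) >= 4:
            result[cols[0]] = cols[3]  # Name, Version, Rev, Tracking, ...
    return result


def nm_matches_base(snap_list_output: str, model_base: str) -> bool:
    """True if network-manager's track equals the base's number ("core20" -> "20")."""
    if not model_base.startswith("core"):
        raise ValueError(f"unexpected base name: {model_base!r}")
    expected_track = model_base[len("core"):]
    tracking = tracking_by_snap(snap_list_output).get("network-manager")
    return tracking is not None and tracking.split("/")[0] == expected_track
```

With the snap list output posted above, this returns True for core20 and False for core22, so the extra core22 base alone does not indicate a mismatch.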

Hi @abeato, here is the output of the command:

(⏳|screenly) /ssh:pi@192.168.1.103: $ snap model --verbose | grep base
base:          core20
    type:             base
    type:             base

We haven’t moved to core22 yet.

This probably solves the issue: https://github.com/snapcore/network-manager-snap/pull/19 You can download the artifacts from https://github.com/snapcore/network-manager-snap/actions/runs/4514211634 if you wish to test the change.