NetworkManager doesn't start anymore on a fresh Ubuntu system after installing the LXD snap

Hey everyone,

I installed a fresh Ubuntu 18.04 system today, and one of the first things I did to set up my development environment was to install the lxd snap from latest (revision 8959). That revision brings in the socket activation code for the LXD socket. On the next boot I wondered why I had no WiFi network, and the reason was:

Okt 02 17:52:50 earth systemd[1]: NetworkManager.service: Found ordering cycle on dbus.service/stop
Okt 02 17:52:50 earth systemd[1]: NetworkManager.service: Found dependency on basic.target/stop
Okt 02 17:52:50 earth systemd[1]: NetworkManager.service: Found dependency on sockets.target/stop
Okt 02 17:52:50 earth systemd[1]: NetworkManager.service: Found dependency on snap.lxd.daemon.unix.socket/stop
Okt 02 17:52:50 earth systemd[1]: NetworkManager.service: Found dependency on network-online.target/stop
Okt 02 17:52:50 earth systemd[1]: NetworkManager.service: Found dependency on network.target/stop
Okt 02 17:52:50 earth systemd[1]: NetworkManager.service: Found dependency on NetworkManager.service/stop
Okt 02 17:52:50 earth systemd[1]: NetworkManager.service: Job dbus.service/stop deleted to break ordering cycle starting with NetworkManager.service/stop
Okt 02 17:52:50 earth systemd[1]: NetworkManager.service: Found ordering cycle on basic.target/stop
Okt 02 17:52:50 earth systemd[1]: NetworkManager.service: Found dependency on sockets.target/stop
Okt 02 17:52:50 earth systemd[1]: NetworkManager.service: Found dependency on snap.lxd.daemon.unix.socket/stop
Okt 02 17:52:50 earth systemd[1]: NetworkManager.service: Found dependency on network-online.target/stop
Okt 02 17:52:50 earth systemd[1]: NetworkManager.service: Found dependency on network.target/stop
Okt 02 17:52:50 earth systemd[1]: NetworkManager.service: Found dependency on NetworkManager.service/stop
Okt 02 17:52:50 earth systemd[1]: NetworkManager.service: Job basic.target/stop deleted to break ordering cycle starting with NetworkManager.service/stop

Manually starting NetworkManager via systemctl start NetworkManager fixed the problem.
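
To see at a glance which jobs systemd dropped on a given boot, the "deleted to break ordering cycle" lines can be pulled out of the journal. This is only a minimal sketch: deleted_jobs is a made-up helper name, and it assumes the message wording matches the log excerpt above.

```shell
# Reads journal text on stdin and prints each job that systemd deleted
# to break an ordering cycle, one per line, without duplicates.
# deleted_jobs is a hypothetical helper name, not a systemd tool.
deleted_jobs() {
    sed -n 's/.*: Job \([^ ]*\) deleted to break ordering cycle.*/\1/p' | sort -u
}

# Example (on the affected machine):
#   journalctl -b | deleted_jobs
```

If NetworkManager.service or dbus.service shows up in that list, the unit was skipped for that transaction and needs to be started by hand, as above.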

After talking with @stgraber, he guessed that the network-online.target dependency introduces the cycle. I haven’t had time yet to reproduce this on another fresh system, as I am just a step away from taking vacation.

This is with:

$ snap info core | grep installed
installed:   16-2.35.2                (5548) 92MB core

Leaving this here for someone from the snapd team to have a look as it seems something broke.

Thanks!

regards,
Simon

Yeah, looks like socket-activated units generated by snapd effectively depend on network-online.target, causing the loop.

That’s unfortunately out of the lxd snap’s control, as snapd generates those units and we’re tied to socket activation (due to being pre-installed in cloud images). If this is confirmed to affect all clean desktop installs, it will need rather quick fixing in snapd.

@mvo is that something you could look into?

The naive fix would be to drop the network-online.target reference in socket units. That would be safe at least for unix sockets. For tcp/udp sockets, waiting for the network would make sense, but I’m not sure how this is usually handled in systemd.

This hit me earlier today on a system that was already operational. Snap installs stopped working, and when I rebooted to clear the issue I was presented with zero network connectivity because NetworkManager wasn’t started. I removed LXD and multipass (both snaps) because they were also in a stuck state, and then NetworkManager could be restarted with systemctl restart. I didn’t do much diagnosis into what the problem was because I needed an operational system :)

The fix has landed in the PR below; generated unit files now depend on network.target. It should be available in beta.

The downside is that if you installed a snap using snapd 2.35, its unit files will not be regenerated when updating to 2.36.

Can you post the output of systemctl list-dependencies --before snap.lxd.lxd.service and the same for the lxd socket unit file?
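
Since older unit files stay on disk, it may help to check which generated socket units still mention network-online.target. A rough sketch, assuming the units live in /etc/systemd/system and are named snap.*.socket like the lxd one; snap_sockets_mentioning is a hypothetical helper, not something snapd ships:

```shell
# Lists snap socket units in a directory that mention a given target.
# Arguments: target name, then (optionally) the unit directory.
# snap_sockets_mentioning is a hypothetical helper name.
snap_sockets_mentioning() {
    target=$1
    dir=${2:-/etc/systemd/system}
    # -l prints only matching file names, -F treats the target as a fixed string
    grep -lF -- "$target" "$dir"/snap.*.socket 2>/dev/null || true
}

# Example:
#   snap_sockets_mentioning network-online.target
```

An empty result means no snap socket unit in that directory references the target.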

I have snap removed lxd for now, pending the fix. I can reinstall and post the list-dependencies output, but it won’t necessarily be indicative of the state my system was in when I hit the issue. @morphis, is your system unmodified from when you hit this? If so, you’re probably best placed to get the output @mborzecki asked for…

Something I noticed today when booting the machine:

systemd[1]: basic.target: Found ordering cycle on sockets.target/start
systemd[1]: basic.target: Found dependency on snap.lxd.daemon.unix.socket/start
systemd[1]: basic.target: Found dependency on network.target/start
systemd[1]: basic.target: Found dependency on NetworkManager.service/start
systemd[1]: basic.target: Found dependency on dbus.service/start
systemd[1]: basic.target: Found dependency on basic.target/start
systemd[1]: basic.target: Job sockets.target/start deleted to break ordering cycle starting with basic.target/start
kernel: EXT4-fs (dm-1): re-mounted. Opts: data=ordered

I dropped the dependency on network.target, and systemd no longer complains about a dependency cycle. The snap.lxd.daemon.unix.socket unit file looks like this now:

# /etc/systemd/system/snap.lxd.daemon.unix.socket
[Unit]
# Auto-generated, DO NO EDIT
Description=Socket unix for snap application lxd.daemon
Requires=var-lib-snapd-snap-lxd-8959.mount
Wants=sockets.target
After=var-lib-snapd-snap-lxd-8959.mount
X-Snappy=yes

[Socket]
Service=snap.lxd.daemon.service
FileDescriptorName=unix
ListenStream=/var/snap/lxd/common/lxd/unix.socket
SocketMode=0660

[Install]
WantedBy=sockets.target

Edit: this is what it looks like right now. The socket is active after booting, no dependency cycle reported. I’ll look into systemd to check how it chooses a unit to delete in order to break the ordering cycle.

I’ve opened a PR to drop the dependency of socket unit files on network.target. From testing this locally, systemd reports no more cycles.
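
For anyone who wants to convince themselves why that one edge matters, the chain from the boot log can be fed to tsort, which fails when its input pairs contain a loop. This is just an illustration: the pairs below are copied in the order systemd printed them, has_ordering_cycle is a made-up helper, and the check assumes a tsort that exits non-zero on a loop (as the coreutils one does).

```shell
# Succeeds (returns 0) when the whitespace-separated pairs on stdin
# cannot be ordered, i.e. tsort reports a loop or rejects the input.
# has_ordering_cycle is a hypothetical helper name.
has_ordering_cycle() {
    if tsort >/dev/null 2>&1; then
        return 1   # tsort produced an order, so there is no cycle
    else
        return 0   # tsort failed, which it does when the pairs contain a loop
    fi
}

# Chain as printed in the basic.target log excerpt, one pair per line.
cycle_edges='basic.target sockets.target
sockets.target snap.lxd.daemon.unix.socket
snap.lxd.daemon.unix.socket network.target
network.target NetworkManager.service
NetworkManager.service dbus.service
dbus.service basic.target'

if printf '%s\n' "$cycle_edges" | has_ordering_cycle; then
    echo "cycle found"
fi
```

Removing the snap.lxd.daemon.unix.socket / network.target pair from that list leaves a chain that tsort can order, which matches what I saw after dropping the dependency.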