IP renewal fails because of snapd watchdog

On one of my Linux containers (both the host system as well as the guest system are Ubuntu 20.04), snap is running. The list of installed snaps looks like this:

    root@c1:~# snap list
    Name     Version       Rev    Tracking       Publisher     Notes
    certbot  2.6.0         3024   latest/stable  certbot-eff✓  classic
    core     16-2.59.5     15511  latest/stable  canonical✓    core
    core20   20230622      1974   latest/stable  canonical✓    base
    core22   20230725      858    latest/stable  canonical✓    base
    lxd      5.15-002fa0f  25112  latest/stable  canonical✓    -
    snapd    2.59.5        19457  latest/stable  canonical✓    snapd

Basically, it’s the default snap installation, plus certbot for the renewal of SSL certificates.

This container acts as a reverse proxy, so it’s configuration is very simple, just a small disk and an eth0 interface. It’s connected to the LXD bridge. Only nginx and certbot (as snap) are installed.

    root@srv~#lxc info c1
    Name: c1
    Status: RUNNING
    Type: container
    Architecture: x86_64
    PID: 23431
    Created: 2023/07/09 15:32 CEST
    Last Used: 2023/08/08 10:21 CEST
    
    Resources:
      Processes: 60
      Disk usage:
        root: 1.98GiB
      CPU usage:
        CPU usage (in seconds): 708
      Memory usage:
        Memory (current): 221.73MiB
        Memory (peak): 550.66MiB
      Network usage:
        eth0:
          Type: broadcast
          State: UP
          Host interface: vethcfa30550
          MAC address: 00:16:3e:8b:fc:70
          MTU: 1500
          Bytes received: 3.50GB
          Bytes sent: 438.54MB
          Packets received: 1899674
          Packets sent: 1847827
          IP addresses:
            inet:  10.142.162.214/24 (global)
            inet6: [...] (global)
            inet6: [...] (link)
        lo:
          Type: loopback
          State: UP
          MTU: 65536
          Bytes received: 3.72GB
          Bytes sent: 3.72GB
          Packets received: 1582218
          Packets sent: 1582218
          IP addresses:
            inet:  127.0.0.1/8 (local)
            inet6: ::1/128 (local)

In quite regular intervals, this container seems to lose its network connection. Simply running dhclient -r followed by dhclient is sufficient to get back online again. So, while the fix is easy, I’d prefer if it didn’t happen at all.

I guess I could just set a static IP, and most likely that’s what I’m going to do. Still, I assume there must be a way to solve it differently.

When looking at the log, this is what happens right before the container loses its connection.

Aug 15 17:24:28 c1 systemd[1]: Starting Daily apt download activities...
Aug 15 17:24:30 c1 systemd[1]: Starting Update APT News...
Aug 15 17:24:30 c1 systemd[1]: Starting Update the local ESM caches...
Aug 15 17:24:32 c1 systemd[1]: apt-news.service: Deactivated successfully.
Aug 15 17:24:32 c1 systemd[1]: Finished Update APT News.
Aug 15 17:24:33 c1 systemd[1]: esm-cache.service: Deactivated successfully.
Aug 15 17:24:33 c1 systemd[1]: Finished Update the local ESM caches.
Aug 15 17:24:40 c1 systemd[1]: apt-daily.service: Deactivated successfully.
Aug 15 17:24:40 c1 systemd[1]: Finished Daily apt download activities.
Aug 15 22:19:48 c1 systemd[1]: snapd.service: Watchdog timeout (limit 5min)!
Aug 15 22:20:50 c1 systemd[1]: snapd.service: Killing process 180 (snapd) with signal SIGABRT.
Aug 15 22:32:28 c1 snapd[180]: SIGABRT: abort
Aug 15 22:32:28 c1 snapd[180]: PC=0x563086783fc1 m=0 sigcode=0
Aug 15 22:32:28 c1 snapd[180]: goroutine 0 [idle]:
Aug 15 22:32:28 c1 snapd[180]: runtime.futex(0x5630882fc0c8, 0x80, 0x0, 0x0, 0xc000000000, 0x7ffd00000000, 0x563086757895, 0xc000064f48, 0x7ffde9c53680, 0x56308672d3b3, ...)
Aug 15 22:32:28 c1 snapd[180]:         /usr/lib/go-1.13/src/runtime/sys_linux_amd64.s:535 +0x21
Aug 15 22:32:28 c1 snapd[180]: runtime.futexsleep(0x5630882fc0c8, 0x7ffd00000000, 0xffffffffffffffff)
Aug 15 22:32:28 c1 snapd[180]:         /usr/lib/go-1.13/src/runtime/os_linux.go:44 +0x46
Aug 15 22:32:28 c1 snapd[180]: runtime.notesleep(0x5630882fc0c8)
Aug 15 22:32:28 c1 snapd[180]:         /usr/lib/go-1.13/src/runtime/lock_futex.go:151 +0xa3
Aug 15 22:32:28 c1 snapd[180]: runtime.stoplockedm()
Aug 15 22:32:28 c1 snapd[180]:         /usr/lib/go-1.13/src/runtime/proc.go:2068 +0x8c
Aug 15 22:32:28 c1 snapd[180]: runtime.schedule()
Aug 15 22:32:28 c1 snapd[180]:         /usr/lib/go-1.13/src/runtime/proc.go:2469 +0x489
Aug 15 22:32:28 c1 snapd[180]: runtime.park_m(0xc0002e2480)
Aug 15 22:32:28 c1 snapd[180]:         /usr/lib/go-1.13/src/runtime/proc.go:2610 +0xa3
Aug 15 22:32:28 c1 snapd[180]: runtime.mcall(0x1)
Aug 15 22:32:28 c1 snapd[180]:         /usr/lib/go-1.13/src/runtime/asm_amd64.s:318 +0x53
Aug 15 22:32:28 c1 snapd[180]: goroutine 1 [select, 10930 minutes]:
Aug 15 22:32:28 c1 snapd[180]: main.run(0xc000492540, 0x0, 0x0)
Aug 15 22:32:28 c1 snapd[180]:         /build/snapd/parts/snapd-deb/build/cmd/snapd/main.go:153 +0x4e5
Aug 15 22:32:28 c1 snapd[180]: main.main()
Aug 15 22:32:28 c1 snapd[180]:         /build/snapd/parts/snapd-deb/build/cmd/snapd/main.go:64 +0x10e
Aug 15 22:32:28 c1 snapd[180]: goroutine 19 [syscall, 10930 minutes]:
Aug 15 22:32:28 c1 snapd[180]: os/signal.signal_recv(0x0)
Aug 15 22:32:28 c1 snapd[180]:         /usr/lib/go-1.13/src/runtime/sigqueue.go:147 +0x9e
Aug 15 22:32:28 c1 snapd[180]: os/signal.loop()
Aug 15 22:32:28 c1 snapd[180]:         /usr/lib/go-1.13/src/os/signal/signal_unix.go:23 +0x24
Aug 15 22:32:28 c1 snapd[180]: created by os/signal.init.0
Aug 15 22:32:28 c1 snapd[180]:         /usr/lib/go-1.13/src/os/signal/signal_unix.go:29 +0x43
Aug 15 22:32:28 c1 snapd[180]: goroutine 75 [select, 10930 minutes]:
Aug 15 22:32:28 c1 snapd[180]: github.com/snapcore/snapd/overlord/ifacestate/udevmonitor.(*Monitor).Run.func1(0x1, 0xc0001a07a8)
Aug 15 22:32:28 c1 snapd[180]:         /build/snapd/parts/snapd-deb/build/overlord/ifacestate/udevmonitor/udevmon.go:147 +0x329
Aug 15 22:32:28 c1 snapd[180]: gopkg.in/tomb%2ev2.(*Tomb).run(0xc000455200, 0xc0007ea600)
Aug 15 22:32:28 c1 snapd[180]:         /build/snapd/parts/snapd-deb/build/vendor/gopkg.in/tomb.v2/tomb.go:163 +0x2d
Aug 15 22:32:28 c1 snapd[180]: created by gopkg.in/tomb%2ev2.(*Tomb).Go
Aug 15 22:32:28 c1 snapd[180]:         /build/snapd/parts/snapd-deb/build/vendor/gopkg.in/tomb.v2/tomb.go:159 +0xc9
Aug 15 22:32:28 c1 snapd[180]: goroutine 46 [select]:
Aug 15 22:32:28 c1 snapd[180]: github.com/snapcore/snapd/overlord/standby.(*StandbyOpinions).Start.func1(0xc0000cdd10)
Aug 15 22:32:28 c1 snapd[180]:         /build/snapd/parts/snapd-deb/build/overlord/standby/standby.go:105 +0x12b
Aug 15 22:32:28 c1 snapd[180]: created by github.com/snapcore/snapd/overlord/standby.(*StandbyOpinions).Start
Aug 15 22:32:22 c1 systemd[1]: snapd.service: State 'stop-watchdog' timed out. Killing.
Aug 15 22:32:39 c1 snapd[180]:         /build/snapd/parts/snapd-deb/build/overlord/standby/standby.go:96 +0xaa
Aug 15 22:32:39 c1 snapd[180]: goroutine 47 [select, 21 minutes]:
Aug 15 22:32:39 c1 snapd[180]: github.com/snapcore/snapd/overlord.(*Overlord).Loop.func1(0x0, 0x0)
Aug 15 22:32:39 c1 snapd[180]:         /build/snapd/parts/snapd-deb/build/overlord/overlord.go:477 +0x17b
Aug 15 22:32:39 c1 snapd[180]: gopkg.in/tomb%2ev2.(*Tomb).run(0xc0000cde00, 0xc0004e9ea0)
Aug 15 22:32:39 c1 snapd[180]:         /build/snapd/parts/snapd-deb/build/vendor/gopkg.in/tomb.v2/tomb.go:163 +0x2d
Aug 15 22:32:39 c1 snapd[180]: created by gopkg.in/tomb%2ev2.(*Tomb).Go
Aug 15 22:32:39 c1 snapd[180]:         /build/snapd/parts/snapd-deb/build/vendor/gopkg.in/tomb.v2/tomb.go:159 +0xc9
Aug 15 22:32:39 c1 snapd[180]: goroutine 48 [IO wait, 623 minutes]:
Aug 15 22:32:39 c1 snapd[180]: internal/poll.runtime_pollWait(0x7f5d38487d68, 0x72, 0x0)
Aug 15 22:32:39 c1 snapd[180]:         /usr/lib/go-1.13/src/runtime/netpoll.go:184 +0x57
Aug 15 22:32:39 c1 snapd[180]: internal/poll.(*pollDesc).wait(0xc00010e398, 0x72, 0x0, 0x0, 0x563087203b91)
Aug 15 22:32:39 c1 snapd[180]:         /usr/lib/go-1.13/src/internal/poll/fd_poll_runtime.go:87 +0x47
Aug 15 22:32:39 c1 snapd[180]: internal/poll.(*pollDesc).waitRead(...)
Aug 15 22:32:39 c1 snapd[180]: internal/poll.(*pollDesc).waitRead(...)
Aug 15 22:32:39 c1 snapd[180]:         /usr/lib/go-1.13/src/internal/poll/fd_poll_runtime.go:92
Aug 15 22:32:39 c1 snapd[180]: internal/poll.(*FD).Accept(0xc00010e380, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
Aug 15 22:32:39 c1 snapd[180]:         /usr/lib/go-1.13/src/internal/poll/fd_unix.go:384 +0x1fa
Aug 15 22:32:39 c1 snapd[180]: net.(*netFD).accept(0xc00010e380, 0x203000, 0xc0004f3cb0, 0x563086751123)
Aug 15 22:32:39 c1 snapd[180]:         /usr/lib/go-1.13/src/net/fd_unix.go:238 +0x44
Aug 15 22:32:39 c1 snapd[180]: net.(*UnixListener).accept(0xc000170120, 0x563087b49468, 0xc0004f3ce8, 0xc0004f3cc0)
Aug 15 22:32:39 c1 snapd[180]:         /usr/lib/go-1.13/src/net/unixsock_posix.go:162 +0x34
Aug 15 22:32:39 c1 snapd[180]: net.(*UnixListener).Accept(0xc000170120, 0xc000168118, 0x8, 0xc0004f3cc0, 0x5630871ea235)
Aug 15 22:32:39 c1 snapd[180]:         /usr/lib/go-1.13/src/net/unixsock.go:260 +0x49
Aug 15 22:32:39 c1 snapd[180]: github.com/snapcore/snapd/daemon.(*ucrednetListener).Accept(0xc0001702a0, 0xc0004f3da8, 0x18, 0xc0001bd500, 0x563086a1a586)
Aug 15 22:32:39 c1 snapd[180]:         /build/snapd/parts/snapd-deb/build/daemon/ucrednet.go:112 +0x4a
Aug 15 22:32:39 c1 snapd[180]: net/http.(*Server).Serve(0xc00012e1c0, 0x563087b862e0, 0xc0001702a0, 0x0, 0x0)
Aug 15 22:32:39 c1 snapd[180]:         /usr/lib/go-1.13/src/net/http/server.go:2896 +0x282
Aug 15 22:32:39 c1 snapd[180]: github.com/snapcore/snapd/daemon.(*Daemon).Start.func1(0x0, 0x0)
Aug 15 22:32:39 c1 snapd[180]:         /build/snapd/parts/snapd-deb/build/daemon/daemon.go:378 +0x56
Aug 15 22:32:39 c1 snapd[180]: gopkg.in/tomb%2ev2.(*Tomb).run(0xc0004d4050, 0xc00019d440)
Aug 15 22:32:39 c1 snapd[180]:         /build/snapd/parts/snapd-deb/build/vendor/gopkg.in/tomb.v2/tomb.go:163 +0x2d
Aug 15 22:32:39 c1 snapd[180]: created by gopkg.in/tomb%2ev2.(*Tomb).Go
Aug 15 22:32:23 c1 systemd[1]: snapd.service: Killing process 180 (snapd) with signal SIGKILL.
Aug 15 22:32:39 c1 snapd[180]:         /build/snapd/parts/snapd-deb/build/vendor/gopkg.in/tomb.v2/tomb.go:159 +0xc9
Aug 15 22:32:39 c1 snapd[180]: goroutine 49 [select]:
Aug 15 22:32:39 c1 snapd[180]: main.runWatchdog.func1(0xc0000cde50, 0xc0004d4000)
Aug 15 22:32:39 c1 snapd[180]:         /build/snapd/parts/snapd-deb/build/cmd/snapd/main.go:93 +0xe5
Aug 15 22:32:39 c1 snapd[180]: created by main.runWatchdog
Aug 15 22:32:39 c1 snapd[180]:         /build/snapd/parts/snapd-deb/build/cmd/snapd/main.go:91 +0x1e2
Aug 15 22:32:39 c1 snapd[180]: goroutine 98 [IO wait, 10930 minutes]:
Aug 15 22:32:39 c1 snapd[180]: internal/poll.runtime_pollWait(0x7f5d38487e38, 0x72, 0x0)
Aug 15 22:32:23 c1 systemd-networkd[157]: eth0: Could not set DHCPv4 address: Connection timed out
Aug 15 22:32:40 c1 snapd[180]:         /usr/lib/go-1.13/src/runtime/netpoll.go:
Aug 15 22:32:23 c1 systemd-networkd[157]: eth0: Failed
Aug 15 22:32:32 c1 systemd[1]: snapd.service: Main process exited, code=killed, status=9/KILL
Aug 15 22:32:32 c1 systemd[1]: snapd.service: Failed with result 'watchdog'.
Aug 15 22:32:47 c1 systemd[1]: snapd.service: Scheduled restart job, restart counter is at 1.
Aug 15 22:32:48 c1 systemd[1]: Stopped Snap Daemon.
Aug 15 22:32:53 c1 systemd[1]: Starting Snap Daemon...
Aug 15 22:33:23 c1 snapd[5434]: overlord.go:272: Acquiring state lock file
Aug 15 22:33:23 c1 snapd[5434]: overlord.go:277: Acquired state lock file
Aug 15 22:33:24 c1 snapd[5434]: daemon.go:247: started snapd/2.59.5 (series 16; classic) ubuntu/22.04 (amd64) linux/4.15.0-213-generic.
Aug 15 22:33:24 c1 snapd[5434]: daemon.go:340: adjusting startup timeout by 1m0s (pessimistic estimate of 30s plus 5s per snap)
Aug 15 22:33:25 c1 systemd[1]: tmp-syscheck\x2dmountpoint\x2d848018183.mount: Deactivated successfully.
Aug 15 22:33:25 c1 snapd[5434]: backends.go:58: AppArmor status: apparmor is enabled and all features are available
Aug 15 22:33:26 c1 systemd[1]: Started Snap Daemon.
Aug 15 22:33:26 c1 dbus-daemon[173]: [system] Activating via systemd: service name='org.freedesktop.timedate1' unit='dbus-org.freedesktop.timedate1.service' requested by ':1.44' (uid=0 pid=5434 comm="/usr/lib/snapd/sna>
Aug 15 22:33:26 c1 systemd[1]: Starting Time & Date Service...
Aug 15 22:33:27 c1 dbus-daemon[173]: [system] Successfully activated service 'org.freedesktop.timedate1'
Aug 15 22:33:27 c1 systemd[1]: Started Time & Date Service.
Aug 15 22:33:28 c1 snapd[5434]: storehelpers.go:769: cannot refresh: snap has no updates available: "certbot", "core", "core20", "core22", "lxd", "snapd"
Aug 15 22:33:28 c1 systemd[1]: Reloading.
Aug 15 22:33:30 c1 systemd[1]: Starting Service for snap application certbot.renew...
Aug 15 22:33:30 c1 systemd[1]: Starting Check to see whether there is a new version of Ubuntu available...
Aug 15 22:33:30 c1 systemd[1]: update-notifier-motd.service: Deactivated successfully.
Aug 15 22:33:30 c1 systemd[1]: Finished Check to see whether there is a new version of Ubuntu available.
Aug 15 22:33:36 c1 systemd[1]: snap.certbot.renew.service: Deactivated successfully.
Aug 15 22:33:36 c1 systemd[1]: Finished Service for snap application certbot.renew.
Aug 15 22:33:57 c1 systemd[1]: systemd-timedated.service: Deactivated successfully.
Aug 16 00:00:28 c1 systemd[1]: Starting Daily dpkg database backup service...
Aug 16 00:00:28 c1 systemd[1]: dpkg-db-backup.service: Deactivated successfully.
Aug 16 00:00:28 c1 systemd[1]: Finished Daily dpkg database backup service.
Aug 16 03:15:28 c1 systemd[1]: Starting Daily apt download activities...
Aug 16 03:15:59 c1 systemd-networkd-wait-online[5579]: Timeout occurred while waiting for network connectivity.
Aug 16 03:15:59 c1 apt-helper[5577]: E: Sub-process /lib/systemd/systemd-networkd-wait-online returned an error code (1)
Aug 16 03:16:01 c1 systemd[1]: Starting Update APT News...
Aug 16 03:16:01 c1 systemd[1]: Starting Update the local ESM caches...
Aug 16 03:16:03 c1 systemd[1]: apt-news.service: Deactivated successfully.
Aug 16 03:16:03 c1 systemd[1]: Finished Update APT News.

I’m really no expert at reading logs but I do find it interesting that the following line seems not correctly placed as the timestamp is earlier than the lines above it.

Aug 15 22:32:23 c1 systemd-networkd[157]: eth0: Could not set DHCPv4 address: Connection timed out

So, if I read this correctly, the system sends a SIGABRT message to the snapd service. This doesn’t seem to work for some reason and fails 12 minutes later, generating a stack trace before then sending a KILL signal to snapd.

Within these 12 minutes, the system tries to renew its ip address but fails, likely because the SIGABRT has brought the system to a complete hang.

I’m hopeful for recommendations what to do about this.