Snap service killed by watchdog-timeout

dhoomakethu · November 15, 2018, 4:07am

We recently encountered this problem on one of our machines and decided to enable watchdog-timeout on the snap service, so that the service would be killed instead of triggering the system-level watchdog timer, thereby avoiding a device reboot. The problem now is that the service stays alive only for the duration of watchdog-timeout and is killed when the timeout expires, which was unexpected.

The question is: do we have to implement something in our app to call sd_notify regularly, as mentioned here, or is this something the daemon-notify interface handles when we connect to it?

Journalctl logs

Nov 15 09:17:18 systemd[1]: Started Service for snap application my-snap.myser
Nov 15 09:18:19 systemd[1]: snap.my-snap.myser.service: Watchdog timeout (limit 1min)!
Nov 15 09:18:19 systemd[1]: snap.my-snap.myser.service: Main process exited, code=killed, status=6/ABRT
Nov 15 09:18:19 systemd[1]: snap.my-snap.myser.service: Unit entered failed state.
Nov 15 09:18:19 systemd[1]: snap.my-snap.myser.service: Failed with result 'signal'.

Here is the relevant portion of the snapcraft.yaml

restart-condition: never
passthrough:
  watchdog-timeout: 10m
  restart-delay: 5s

Other info

$ systemd --version
systemd 229
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ -LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN

$ snap version
snap    2.35.5
snapd   2.35.5
series  16
kernel  4.4.0-122-generic

mvo · November 15, 2018, 7:35am

Thanks for your message - yes, your app needs to ping systemd periodically via sd_notify("WATCHDOG=1") to tell systemd that it is still alive.
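
For anyone landing here later, a minimal sketch of what such a periodic ping could look like in Python, without linking against libsystemd. It assumes the service is started by systemd with a watchdog configured, so that NOTIFY_SOCKET and WATCHDOG_USEC are set in the environment, and that the confined app can reach the notify socket (presumably what the daemon-notify interface is for). The names sd_notify, watchdog_interval and ping_forever below are illustrative helpers, not part of snapd or any systemd binding.

```python
import os
import socket
import time


def sd_notify(state: str) -> bool:
    """Send a state string such as "WATCHDOG=1" to systemd's notify socket.

    Returns False when NOTIFY_SOCKET is unset (not supervised by systemd).
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading "@" denotes a Linux abstract-namespace socket.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), path)
    return True


def watchdog_interval(default: float = 30.0) -> float:
    """Return half the watchdog timeout in seconds.

    systemd exports the timeout as WATCHDOG_USEC (microseconds); pinging at
    half that period is the usual recommendation. The default is an
    arbitrary fallback chosen for this sketch.
    """
    usec = os.environ.get("WATCHDOG_USEC")
    return int(usec) / 2 / 1_000_000 if usec else default


def ping_forever(work) -> None:
    """Run work() repeatedly and ping the watchdog after each round.

    work is a hypothetical callable standing in for one unit of the
    application's real work. If it hangs, the pings stop and systemd
    kills the service once the timeout expires.
    """
    interval = watchdog_interval()
    while True:
        work()
        sd_notify("WATCHDOG=1")
        time.sleep(interval)
```

The ping is deliberately tied to the work loop rather than sent from a separate timer thread, so that a hung main loop actually stops the pings. With the 1min limit shown in the journal above, watchdog_interval() would return 30.0 seconds.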