Stop-command is not executed if some interfaces are connected

c.schulz · February 6, 2024, 2:51pm

I am currently facing the following issue:

I have a snap with a daemon that spawns a few processes. I want to add a “stop-command” that runs another command before all these processes are terminated whenever the daemon is stopped. However, I found that the “stop-command” is not executed as long as either the “raw-usb” or the “kubernetes-support” plug is connected. I can find a workaround for “kubernetes-support”, but not for “raw-usb”.

I reproduced it with a minimal example: a snap that runs a daemon which spawns and kills a few dummy processes and logs this to a file in $SNAP_COMMON. Minimal snapcraft.yaml:

name: test
base: core22
summary: Start/Stop Test
description: |
  This snap tests starting/stopping services

confinement: strict
adopt-info: part-with-metadata
grade: devel
version: 0.0.1

parts:
  scripts:
    plugin: dump
    source: ./dump/
    source-type: local

apps:
  daemon:
    daemon: forking
    command: test start
    stop-command: test stop
    reload-command: test restart
    plugs:
      - kubernetes-support
      - raw-usb
      - system-observe

The dump folder contains a script named test that provides the start and stop commands:

#!/bin/bash

LOGFILE=$SNAP_COMMON/Test.log

echo "===========================================" >> $LOGFILE
echo "Execute ${0} ${@}" >> $LOGFILE


start_test() {
  echo "Start Test Service" >> $LOGFILE
  rm -f "$SNAP_COMMON"/*.pid
  for i in {1..10}; do
    sleep infinity &
    PID=$(jobs -p | tail -n 1)
    echo "PID $i: $PID" >> $LOGFILE
    echo "$PID" > $SNAP_COMMON/Test_$i.pid
  done
  echo "Start Finished" >> $LOGFILE
}

stop_test(){
  echo "Stop Test Service" >> $LOGFILE
  for i in {1..10}; do
    PID=$(cat $SNAP_COMMON/Test_$i.pid)
    echo "PID $i: $PID" >> $LOGFILE
    kill -s SIGTERM $PID
  done
  echo "Stop Finished" >> $LOGFILE
}

# Actions
case "$1" in
  start)
    start_test
    ;;
  stop)
    stop_test
    ;;
  restart)
    stop_test
    start_test
    ;;
esac

echo "===========================================" >> $LOGFILE
exit 0

I tested this by switching between “sudo snap start test.daemon” and “sudo snap stop test.daemon”. Whenever either “raw-usb” or “kubernetes-support” is connected, I do not get any output in the log file.

I think it is related to the udev rules that are created in /etc/udev/rules.d/70-snap.test.rules. If I comment out a few lines as follows, the “stop-command” is executed:

# This file is automatically generated.
# kubernetes-support
#KERNEL=="kmsg", TAG+="snap_test_daemon"
# raw-usb
SUBSYSTEM=="tty", ENV{ID_BUS}=="usb", TAG+="snap_test_daemon"
# raw-usb
#SUBSYSTEM=="usb", TAG+="snap_test_daemon"
# raw-usb
#SUBSYSTEM=="usbmisc", TAG+="snap_test_daemon"
TAG=="snap_test_daemon", SUBSYSTEM!="module", SUBSYSTEM!="subsystem", RUN+="/usr/lib/snapd/snap-device-helper $env{ACTION} snap_test_daemon $devpath $major:$minor"

Further, if I disconnect the “raw-usb” interface but re-add the udev rules above by hand (without commenting them out), the “stop-command” is also not executed.

Is there any possibility I can have a “raw-usb” interface connected and the “stop-command” is executed? Many thanks in advance.

zyga · February 7, 2024, 8:19am

This is quite unexpected.

My initial suspicion was the Delegate=true setting that may be generated because of kubernetes-support, but that is done on the permanent side of the plug, so it does not change depending on whether the interface is connected or disconnected.

The raw-usb interface is even more puzzling, since there’s literally nothing special about it. At most it does some udev tagging.

One thing that struck me about the code you’ve posted is the use of daemon: forking. If your daemon really forks, then perhaps systemd misidentifies the main process and thinks the service is not running, making the stop command ineffective (because the service already appears to be stopped). Do you really use a forking daemon? If you are unsure, or the service doesn’t really fork, this could be the source of the problem.

c.schulz · February 7, 2024, 8:54am

Thanks for your reply.

Your suspicion about kubernetes-support seems to be right. I removed that interface from snapcraft.yaml, and now the “stop-command” is executed regardless of the connection state of raw-usb or any udev rules. To summarize the problem: if kubernetes-support is declared as a plug, and either kubernetes-support or raw-usb is connected, the “stop-command” is not executed. I don’t really understand why (yet), but that solves my problem. Many thanks!

I also tested with daemon: oneshot, and the behavior seems to be the same. In “start”, my daemon runs a script that fires up some background processes and then exits. In “stop”, I need to do something with these background processes before they are terminated, which also makes them terminate on their own. So I think either forking or oneshot is correct for this daemon. I’m not entirely sure, however, which of them I should use, because I don’t fully understand the difference. But even if the daemon mode causes systemd to misidentify something, that would not explain why the behavior changes when the interfaces are connected or disconnected.

zyga · February 7, 2024, 9:14am

I’m pretty sure the real problem is still a misunderstanding of what daemon: ... does.

In short, systemd needs to track the service process. There are several ways in which that happens. The simplest is daemon: simple: the service process runs without any special handling. Any child processes are tracked as well and contribute to the set of processes regarded as part of the service.
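The tracking of child processes works because a child inherits its parent's control group when it is forked, so everything a service spawns stays inside the service's cgroup. A minimal sketch to observe this, assuming a Linux system with /proc mounted; `cgroup_of` and `child_shares_cgroup` are hypothetical helper names. On a real snap, `systemctl status snap.test.daemon.service` shows the same membership as the unit's CGroup tree.

```shell
#!/bin/sh
# Sketch: show that a background child ends up in the same cgroup(s) as the
# shell that started it. Assumes Linux with /proc mounted.

# cgroup_of PID: print the cgroup membership lines of process PID.
cgroup_of() {
  cat "/proc/$1/cgroup"
}

# child_shares_cgroup: start a short-lived background child and succeed if
# its cgroup membership matches that of the current shell.
child_shares_cgroup() {
  sleep 5 &
  child=$!
  parent_cg=$(cgroup_of "$$")
  child_cg=$(cgroup_of "$child")
  # Clean up the child before comparing.
  kill "$child" 2>/dev/null
  wait "$child" 2>/dev/null || true
  [ "$parent_cg" = "$child_cg" ]
}
```

Usage: `child_shares_cgroup && echo "child is in the parent's cgroup"`.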

Forking is for old-style code that predates service managers and should almost never be used today. When it is misused, systemd can misidentify the correct process.

Oneshot is for “task”-like things that don’t run continuously but instead run to completion. The key distinction is whether, once the service is done running (the process exits), the service is still considered “running”. This matters specifically for stop actions (you cannot stop something that is not running). Here, remain-after-exit is a separate toggle that you can use to control whether the service should be considered running after a oneshot process terminates.

IMO, unless you know better, use daemon: simple. Read the systemd man pages for details on how these things interact.
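A minimal sketch of how the start side of the example could look under daemon: simple, assuming the background workers are ordinary commands such as `sleep infinity`. The script launches the workers, records each PID with `$!` (no `jobs` needed), and then blocks in `wait`, so the service's main process stays alive and the stop-command has a running service to act on. The function names and the `worker_N.pid` layout are hypothetical.

```shell
#!/bin/sh
# start_workers COUNT PIDDIR CMD [ARGS...]
# Launch COUNT background copies of CMD and record each PID in
# PIDDIR/worker_N.pid (hypothetical layout).
start_workers() {
  count=$1
  piddir=$2
  shift 2
  mkdir -p "$piddir"
  rm -f "$piddir"/worker_*.pid
  i=1
  while [ "$i" -le "$count" ]; do
    "$@" &
    # $! is the PID of the command just started in the background.
    echo "$!" > "$piddir/worker_$i.pid"
    i=$((i + 1))
  done
}

# run_in_foreground COUNT PIDDIR CMD [ARGS...]
# Start the workers, then block until all of them have exited. Under
# daemon: simple this keeps the service's main process alive.
run_in_foreground() {
  start_workers "$@"
  wait
}
```

In the snap, the daemon's command could then end with something like `run_in_foreground 10 "$SNAP_COMMON" sleep infinity`. Staying in the foreground is the design choice that matters here: systemd sees one long-lived main process instead of having to guess which forked child is the service.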

c.schulz · February 7, 2024, 9:29am

Thanks for your explanation. Did I understand correctly that I can specify remain-after-exit manually for my snap daemon? I didn’t find anything in the yaml reference or in the page about daemons.

zyga · February 7, 2024, 11:10am

I was mistaken in assuming it is exposed directly. This is what snapd does:

	var remain string
	if appInfo.Daemon == "oneshot" {
		// (unrelated code removed)
		// If StopExec is present for a oneshot service then we also need
		// RemainAfterExit=yes
		if appInfo.StopCommand != "" {
			remain = "yes"
		}
	}

So, in short: for a oneshot daemon that has a stop-command, we inject RemainAfterExit=yes automatically so that the stop command can execute.
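To make the condition explicit, here is a rough shell transcription of the snapd logic quoted above. It is only an illustration, not snapd code: `service_type_lines` is a hypothetical helper, and a real generated unit contains many more settings. It prints a `Type=` line and adds `RemainAfterExit=yes` only when the daemon is oneshot and a stop-command is set.

```shell
#!/bin/sh
# service_type_lines DAEMON STOP_COMMAND
# Hypothetical helper: print the unit lines implied by the snapd snippet
# for a given daemon type and stop-command (empty string = no stop-command).
service_type_lines() {
  daemon=$1
  stop_command=$2
  echo "Type=$daemon"
  # Only a oneshot service with a stop-command stays "active" after its start
  # process exits, so that the stop-command still has something to stop.
  if [ "$daemon" = "oneshot" ] && [ -n "$stop_command" ]; then
    echo "RemainAfterExit=yes"
  fi
}
```

For the snap in the original post, `service_type_lines forking "test stop"` prints only `Type=forking`: the RemainAfterExit injection does not apply to forking daemons.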

c.schulz · February 7, 2024, 11:18am

Okay, so what would be the correct settings for my daemon? daemon: oneshot with stop-command defined?

zyga · February 7, 2024, 11:22am

I don’t know, it is your daemon. Can you tell me this:

Does it run until explicitly stopped?
Does it fork?
Does it have a startup script?

c.schulz · February 7, 2024, 11:36am

Yes, it has a startup script similar to the one I added to the example in my original post. I think that example is precise enough to represent my daemon.

The daemon creates other processes and then exits (in my example, via sleep infinity &), so in my understanding, yes, it is forking.

And these processes need to run until the daemon is stopped, so also yes to the first question.

ogra · February 7, 2024, 11:54am

By explicitly calling the fork() syscall? If not, you want daemon: simple.

zyga · February 7, 2024, 11:57am

I think the problem with a forking daemon is that you need to keep a long-running process alive, and the PID of that process must be what systemd tracks and binds to the lifetime of the service unit. IMO the design you use is somewhat fragile. It is a lot better to drop it and use individual services, each tracked as a separate entity, with daemon: simple.

From https://www.freedesktop.org/software/systemd/man/latest/systemd.service.html

If set to forking, the manager will consider the unit started immediately after the binary that forked off by the manager exits. *The use of this type is discouraged, use notify, notify-reload, or dbus instead.* It is expected that the process configured with ExecStart= will call fork() as part of its start-up. The parent process is expected to exit when start-up is complete and all communication channels are set up. The child continues to run as the main service process, and the service manager will consider the unit started when the parent process exits. This is the behavior of traditional UNIX services. If this setting is used, it is recommended to also use the PIDFile= option, so that systemd can reliably identify the main process of the service. The manager will proceed with starting follow-up units after the parent process exits.

EDIT: this is also relevant. Note that snapd does not expose this setting, so you run with ExitType=main, which IMO causes the problems.

ExitType=

Specifies when the manager should consider the service to be finished. One of main or cgroup:

If set to main (the default), the service manager will consider the unit stopped when the main process, which is determined according to the Type=, exits. Consequently, it cannot be used with Type=oneshot.

If set to cgroup, the service will be considered running as long as at least one process in the cgroup has not exited.

It is generally recommended to use ExitType=main when a service has a known forking model and a main process can reliably be determined. ExitType=cgroup is meant for applications whose forking model is not known ahead of time and which might not have a specific main process. It is well suited for transient or automatically generated services, such as graphical applications inside of a desktop environment.

Added in version 250.

Separately, writing this sort of code in shell is very fragile, as one must understand the consequences of what the shell does and how it handles process management.

My advice:

remove the startup and stop scripts
use separate apps, each with daemon: simple
do not use shell job control if you still have scripts left
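For any scripts that remain, a minimal sketch of a stop routine that avoids shell job control: it reads PIDs from files rather than from `jobs`, uses `kill -0` to check that each process still exists before signalling it, and removes the PID files afterwards. The `stop_workers` name and the `worker_*.pid` layout are hypothetical. Keep in mind that, by default, systemd terminates whatever is still left in the service's cgroup once the stop-command returns, so the script only needs to do the work that must happen before that.

```shell
#!/bin/sh
# stop_workers PIDDIR
# Send SIGTERM to every process listed in PIDDIR/worker_*.pid (hypothetical
# layout) that is still running, then remove the PID files.
stop_workers() {
  piddir=$1
  for pidfile in "$piddir"/worker_*.pid; do
    # Skip the literal pattern when no PID files exist.
    [ -e "$pidfile" ] || continue
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only checks that the process exists.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
      kill -s TERM "$pid"
    fi
    rm -f "$pidfile"
  done
}
```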

baldeuniversel · February 7, 2024, 3:11pm

Hi @c.schulz. I made some changes to your script. This updated version works very well.


#!/bin/bash


#
set -uo pipefail


#LOGFILE="./Test.log"
LOGFILE="$SNAP_COMMON/Test.log"

# File that records the daemon state (`start` or `stop`)
#flagDaemonRun="flagDaemonFile.log"
flagDaemonRun="$SNAP_COMMON/flagDaemonFile.log"



# Create the state file with state 0 (stopped) if it does not exist yet
if [[ ! ( -e "$flagDaemonRun" ) ]]
then
    # The state is "stopped"
    echo "0" > "$flagDaemonRun"
fi


echo -e "\n\n=======================================================================" >> $LOGFILE
echo -e "Execute ${0} ${@}" >> $LOGFILE



start_test() 
{
    # Local variable
    local flagStart=0



    echo -e "\n~" >> $LOGFILE
    echo -e "Start Test Service ~ date : ` date +'%Y-%m-%d %H:%M:%S' `" >> $LOGFILE


    # Check that the state file `$SNAP_COMMON/flagDaemonFile.log` created above exists, then ...
    if [[ -e "$flagDaemonRun" ]]
    then
        # Only start the loop below if the daemon is stopped (state 0), so that it is not
        # started a second time without an intervening `stop action`
        if [[ ` cat "$flagDaemonRun" ` -eq 0  ]]
        then
            # Set the value in the file `$SNAP_COMMON/flagDaemonFile.log` to `1` to indicate that 
            # a `start action` is in progress
            echo "1" > "$flagDaemonRun"
           

            # Background subshell that keeps running until a `stop action` is performed
            (
                # Set the variable `$flagStart` to 1 to start the loop below
                flagStart=1
    
    
                # Make the loop run indefinitely until a `stop action` is performed
                while [[ $flagStart -eq 1 ]]
                do
                    # If this test succeeds, it would mean a `stop action` had been carried out, or 
                    # is carried out at this very moment
                    if [[  ` cat "$flagDaemonRun" ` -eq 0  ]]
                    then
                        # Set the variable `$flagStart` to 0
                        flagStart=0
                    fi
    
                    # A timer to delay the loop
                    sleep 0.05
                done 
            ) &


            #
            echo -e "\nStart Finished ~ date : ` date +'%Y-%m-%d %H:%M:%S' `" >> $LOGFILE

        # A test to indicate a `start action` is already in progress
        elif [[ ` cat "$flagDaemonRun" ` -eq 1 ]]
        then
            #
            echo -e "\nThe start action is already in progress ~ date : ` date +'%Y-%m-%d %H:%M:%S' `" >> $LOGFILE
        fi
    else
        #
        echo -e "\n\e[1;031mError\e[0m , missing file ~ date : ` date +'%Y-%m-%d %H:%M:%S' `" >> $LOGFILE

        exit 1
    fi

}

stop_test()
{
    # Check that the state file `$SNAP_COMMON/flagDaemonFile.log` created above exists, then ...
    if [[ -e "$flagDaemonRun" ]]
    then
        # Check whether a `start action` is in progress, then ...
        if [[ ` cat "$flagDaemonRun" ` -eq 1 ]]
        then
            # Set the value in `$SNAP_COMMON/flagDaemonFile.log` to `0` to stop the loop
            # started by the `start action`
            echo "0" > "$flagDaemonRun"
            
            #
            echo -e "\nStop Finished ~ date : ` date +'%Y-%m-%d %H:%M:%S' `" >> $LOGFILE

        # A test to indicate a `start action` is not in progress
        elif [[  ` cat "$flagDaemonRun" ` -eq 0  ]]
        then
            #
            echo -e "\nThe start action is not in progress ~ date : ` date +'%Y-%m-%d %H:%M:%S' `" >> $LOGFILE
        fi
    else
        #
        echo -e "\n\e[1;031mError\e[0m , missing file" >> $LOGFILE

        exit 1
    fi
}

# Actions (`${1:-}` avoids an "unbound variable" abort under `set -u` when no argument is given)
case "${1:-}" in
  start)
    start_test
    ;;
  stop)
    stop_test
    ;;
  restart)
    stop_test
    start_test
    ;;
esac

echo -e "=======================================================================" >> $LOGFILE
exit 0