Waiting for daemon in configure hook and debugging

Hey,
I recently published the FidusWriter snap to the store. It works fine for me locally, but installing it on a VPS, I found two issues:

A) The configure hook needs the included mysql server in place for it to work. I used the same trick that the nextcloud snap uses: Let the hook wait for a file to be present that is being created by the mysql part:

import os
from time import sleep

...
timer = 0
# We wait for the password file to be created
while timer < 10 and not os.path.isfile(PASSWORD_PATH):
    timer += 1
    sleep(1)
...

This should add a wait of at most 10 seconds, but somehow it seems to take much longer than that. Is there any way I can debug the installation process and get more verbose output? Also, it seems not to work the first time I attempt to install the snap, and I cannot tell why. The installation doesn’t succeed, and this part only works the second time. Is there a way to reset it so that it behaves the same way it did the first time?
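One way to check whether the wait itself is what takes so long is to measure it. The following is only a sketch: the function name wait_for_file and its parameters are made up for illustration, and it assumes the hook's stderr ends up somewhere you can read it. It uses a monotonic clock for the deadline and reports how long the wait actually lasted.

```python
import os
import sys
import time


def wait_for_file(path, timeout=10.0, poll_interval=1.0):
    """Wait until `path` exists or `timeout` seconds pass.

    Returns a (found, elapsed_seconds) tuple. The name and parameters
    are hypothetical; they are not part of any snapd API.
    """
    start = time.monotonic()
    deadline = start + timeout
    while not os.path.isfile(path):
        if time.monotonic() >= deadline:
            break
        time.sleep(poll_interval)
    elapsed = time.monotonic() - start
    found = os.path.isfile(path)
    # Report the real wait time so a slow install can be attributed
    # (or not) to this loop.
    print(f"wait_for_file: found={found} after {elapsed:.1f}s", file=sys.stderr)
    return found, elapsed
```

If the reported time is close to the timeout but the hook as a whole runs much longer, the delay is probably elsewhere in the hook.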

B) The second installation attempt seems to work, but the configure hook is killed after 5 minutes when it is still not done, and so the installation is halted. Is there a way for me to increase that limit? If not, is there a way to move some of the things I do during the configure hook, which are needed to get the system up and running, to another hook?

What seems to work is to move the build of certain cache files from the configure hook to the start of the daemon. But I then somehow need to communicate to the user that they need to wait 5-10 minutes after installing the package until it’s really up and running. Is there any way to do that?

Have you tried putting the code that waits for the file into a daemon, and ordering that daemon to run before the main daemon? Then you can make the file-waiting daemon a oneshot daemon, and I think the snap install will block waiting for that first daemon to finish before starting the second one (and after the second daemon is started, the snap install finishes). Typically hooks should be written to be as quick as possible, leaving long-running tasks to run elsewhere.

Hey @ijohnson, do you know of any examples of creating such a first simple daemon that does something, causes a second daemon to start, and then exits? I couldn’t find anything on that here: https://snapcraft.io/docs/services-and-daemons . The main simple daemon that I’m running right now doesn’t seem to influence when the installation process is finished.

Well it’s a quite complicated example, but there’s the edgexfoundry snap: https://github.com/edgexfoundry/edgex-go/blob/master/snap/snapcraft.yaml#L68-L340

To be clear, all I think you would need to do is something like this:

apps:
  wait-for-file:
    command: wait-for-file.sh
    daemon: oneshot
  do-thing-with-file:
    command: do-thing-with-file.sh
    daemon: simple
    after: 
      - wait-for-file

(Also, I originally had a typo and said that the wait-for-file daemon should be a simple daemon; that was wrong, I meant a oneshot daemon.)

@ijohnson Thanks. This seems to work the first time the snap is installed. But when it is “refreshed” the oneshot daemon is not run again. My configure script restarts the regular daemon which then takes care of building files for a few minutes, so for the user the service just seems to disappear for a few minutes without any explanation.

@ijohnson Hey again, I have refreshed my own snap and sometimes it doesn’t update correctly. I am trying to understand what goes wrong, but for that I’d first need to understand how it is supposed to work. Are oneshot daemons set up as you described above meant to also run when you do a refresh of the snap? I see it was mentioned earlier in this forum that they can fail if they don’t complete within a “reasonable amount of time”, but what does that concretely mean? Should I put things that last 30 seconds in there? 5 minutes? 20 minutes?

Right now I am trying to understand if I should use a post-refresh hook to recompile JS files, or whether I should have a oneshot daemon do that both for install and refresh.

In the end it turned out that the oneshot daemon had a timeout of 1m1s. That was not enough, so I have had to remove it again and just make the main daemon be responsible for first compiling and then serving the web page. I’ll have to find another way around the problem of the web server not being available at times. Right now I’m thinking I may bundle nginx with it and set it up to serve a backup page saying “We are setting up, please wait” while the main server is down.
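To illustrate the backup-page idea without nginx, here is a rough sketch using only Python's standard http.server module. It is a stand-in for the nginx setup, not a replacement for it: the handler and function names are made up, and how to hand the port over to the main server once it is ready is left open. It answers every GET with a 503 status and the "please wait" text.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

MESSAGE = b"We are setting up, please wait"


class PlaceholderHandler(BaseHTTPRequestHandler):
    """Answer every GET with 503 and a short notice (hypothetical helper)."""

    def do_GET(self):
        self.send_response(503)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(MESSAGE)))
        # Hint to clients when to retry; 60 seconds is an arbitrary choice.
        self.send_header("Retry-After", "60")
        self.end_headers()
        self.wfile.write(MESSAGE)

    def log_message(self, format, *args):
        # Keep the placeholder quiet; drop per-request log lines.
        pass


def make_placeholder_server(host="127.0.0.1", port=0):
    """Create (but do not start) the placeholder server; port 0 picks a free port."""
    return HTTPServer((host, port), PlaceholderHandler)
```

Calling make_placeholder_server(port=8000).serve_forever() would serve the notice on port 8000 (an arbitrary example port) until the process is stopped.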