Developer sprint Sep 17th, 2018

niemeyer · September 17, 2018, 11:52am

Health checks

Hook named “check-health” (or similar) is called by snapd to check for health
Hook needs to be idempotent so snapd can call it at any time
Hook might be called every 5 minutes to begin with, maybe? We can always increase the interval later (but we can’t easily reduce it without risking breaking applications that expect to have more time for it)
Have special status “unknown” if the snap hasn’t bothered to define it
If the check-health hook is called and it doesn’t update the status, it’s set to “unknown”
Hook can call snapctl set-status --code=[<error code>] <status> [<message>] (or set-health?)
Message is a free-form sentence, hopefully capitalized and readable
Message is forbidden for active status
Reserve “snapd-*” error code namespace for snapd
Error code is a custom string [a-z]+(-?[a-z0-9]){3,} (dashes in the middle only)
Reuse statuses from juju (maybe not all of them)
- active / ~~maintenance~~ / waiting / blocked / error
Still need to consider whether to use “active”, due to conflict between “active daemon” and “health=active”
Client reports to server the status in two situations:
- When the health checks are run during a change, the pre/post status is immediately reported afterwards
- When the health checks are run “at rest”, the pre/post status is aggregated and reported on the next exchange
Should also report a green→green situation so the server can differentiate between “bricked” and “all good”
When do we actually revert a refresh automatically:
- Status going from active→error across a refresh
- Status going from active→blocked across a refresh, requesting a manual refresh instead? Discuss this further, probably not for first release.
Reverts should never be triggered at rest, because we don’t know what caused the error
Naming the health hook check-health might be more indicative of the desired behavior than update-status
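
A few of the rules above (error code format, reserved “snapd-*” namespace, no message for active, revert only across a refresh) can be sketched in Python to make them concrete. This is only an illustration of the notes, not snapd code: the names `SETTABLE_STATUSES`, `validate_status`, `should_revert` and the `from_snapd` flag are hypothetical, and treating “unknown” as something a hook cannot set itself is an assumption.

```python
import re

# Statuses a hook may set, per the notes (juju list minus "maintenance").
# Assumption: "unknown" is assigned by snapd only, never set by the hook.
SETTABLE_STATUSES = {"active", "waiting", "blocked", "error"}

# Error code pattern from the notes: starts with a letter,
# dashes in the middle only, at least four characters overall.
ERROR_CODE_RE = re.compile(r"[a-z]+(-?[a-z0-9]){3,}")


def validate_status(status, message=None, code=None, from_snapd=False):
    """Return a list of problems with a proposed status update (empty if valid)."""
    problems = []
    if status not in SETTABLE_STATUSES:
        problems.append(f"unsupported status: {status!r}")
    if status == "active" and message:
        problems.append("message is forbidden for active status")
    if code is not None:
        if not ERROR_CODE_RE.fullmatch(code):
            problems.append(f"malformed error code: {code!r}")
        elif code.startswith("snapd-") and not from_snapd:
            problems.append("the snapd-* namespace is reserved for snapd")
    return problems


def should_revert(pre, post, during_refresh):
    """Automatic revert only for active→error across a refresh, never at rest."""
    if not during_refresh:
        return False
    return pre == "active" and post == "error"
```

The active→blocked case is deliberately left out of `should_revert`, since the notes defer that decision.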

Avoid issues in juju status hook:

Frequent updates cause overly frequent writes
Do not require a message for the active status