Feeding cloud-init data to snapd

pedronis · January 10, 2018, 4:28pm

We want a mechanism in cloud images to pass information such as the cloud name, region and availability zone to snapd. Our base case is an image with cloud-init. Ideally the information should be available (already passed in) before snapd attempts its first refresh.

The motivation for having cloud placement information in snapd is to be able to choose an in-cloud proxy/cache together with the general store.

As I learned, in a systemd-based cloud image most of cloud-init runs before snapd starts.
This is because snapd.service doesn't specify DefaultDependencies=no, so it gets an implicit After=sysinit.target, while cloud-init.service itself is configured with Type=oneshot, Before=sysinit.target and DefaultDependencies=no. Because cloud-init collects the kind of cloud placement information we need very early in its cloud-init.service run, the information should be available by the time snapd runs or is invoked by cloud-init itself. Moreover, cloud-init writes this information into /run/cloud-init/instance-data.json under the "v1" mapping.
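Since the file lives at a fixed path with a "v1" mapping, reading it from snapd would be small. A minimal Go sketch, assuming the v1 keys are spelled `cloud-name`, `region` and `availability-zone` as in cloud-init output of this period; `cloudPlacement`, `parsePlacement` and `readPlacement` are hypothetical names, not snapd code:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// cloudPlacement holds the "v1" fields of cloud-init's instance data that
// matter for choosing an in-cloud proxy/cache. The dashed JSON key names are
// an assumption based on cloud-init output of this period.
type cloudPlacement struct {
	CloudName        string `json:"cloud-name"`
	Region           string `json:"region"`
	AvailabilityZone string `json:"availability-zone"`
}

// instanceData mirrors only the top-level "v1" mapping; other keys are ignored.
type instanceData struct {
	V1 cloudPlacement `json:"v1"`
}

// parsePlacement decodes the placement fields from raw instance-data JSON.
func parsePlacement(raw []byte) (cloudPlacement, error) {
	var data instanceData
	if err := json.Unmarshal(raw, &data); err != nil {
		return cloudPlacement{}, err
	}
	return data.V1, nil
}

// readPlacement reads path if it exists. A missing file (e.g. not a cloud
// image, or no cloud-init) yields ok == false and no error.
func readPlacement(path string) (p cloudPlacement, ok bool, err error) {
	raw, err := os.ReadFile(path)
	if os.IsNotExist(err) {
		return cloudPlacement{}, false, nil
	}
	if err != nil {
		return cloudPlacement{}, false, err
	}
	if p, err = parsePlacement(raw); err != nil {
		return cloudPlacement{}, false, err
	}
	return p, true, nil
}

func main() {
	p, ok, err := readPlacement("/run/cloud-init/instance-data.json")
	switch {
	case err != nil:
		fmt.Println("error:", err)
	case !ok:
		fmt.Println("no cloud-init instance data")
	default:
		fmt.Printf("cloud=%s region=%s az=%s\n", p.CloudName, p.Region, p.AvailabilityZone)
	}
}
```

Treating a missing file as "not in a cloud" rather than as an error keeps the same code path usable on non-cloud systems.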

I think we have roughly this spectrum of options for snapd to get this information, some specific to cloud-init and some more general:

1. snapd could simply check for /run/cloud-init/instance-data.json when it starts and get the information from there. This hard-codes cloud-init details into snapd.
2. We could instead define a snapd-specific file under /run or /var/snapd/run ([snapd-]cloud.yaml?) with the information structured the way snapd wants.
   More generally, we could define a mechanism of files (or a single file) to convey runtime information to snapd before it starts. This ought to be limited to information that is acceptable to have written by root without a brand signature. Cloud placement information would then use this mechanism. Would such information be a subset of core config, or be reflected as core config while snapd runs, to anchor the concept rather than multiply concepts here?
3. A cloud-init plugin could write /etc/systemd/system/snapd.service.d/cloud.conf, setting environment variable(s) to pass the information.
4. A cloud-init plugin could use snap set core to set the information under a (strawman) proxy.cloud-placement namespace. Here we have a race with the first refresh, which could potentially happen before the config is committed, so snapd would need to know by other means that this information is forthcoming. A strawman could be a default config value from the gadget, like proxy.cloud-placement=wait. This requires the presence of a gadget, or some other bit of information set by ubuntu-image when building a cloud image.
5. A gadget hook (called when? in which situations?) could instead call snapctl set on its config to transmit the information. Again snapd might need to know to wait (until a timeout) before the first refresh for this forthcoming information, and again a gadget hook implies the presence of a gadget.

Question: are there image migration scenarios where it is important for snapd to pick up a change in this information? Is this supported? Would the next boot be enough? This could also influence the preferred mechanism.

niemeyer · January 10, 2018, 4:48pm

Thanks for putting those notes together.

It would be nice to not special case the functionality to cloud-init, but instead use or implement something general that could be leveraged for similar problems in the future. Some of those suggestions already go into this direction, so let’s just see which would work best for these needs.

On option 5, what’s the behavior of prepare-device today in terms of refreshes?

pedronis · January 10, 2018, 6:46pm

So prepare-device is part of the device registration flow. Conceptually it is executed once per system, modulo retries of device registration itself. That's probably important in relation to the open question at the end of my post: it is not run at each boot or restart of snapd, etc.

The current rule is that we delay refreshes until the device is registered (respecting lazy registration on classic) or until we have failed to register for a while.

So today prepare-device has run (at least once) by the time we try to refresh for the first time.

But we have also discussed that, if a preinstalled gadget/core or bases become more common, we might change the rules about lazy registration on classic. In that case we would have to decide what to do about the first refresh: do we try to refresh just core and the gadget before registration or not? In that scenario, if we keep prepare-device tied to registration as it is, prepare-device could sometimes run after the first refresh.

niemeyer · January 10, 2018, 7:48pm

It feels a bit error-prone to refresh before we even attempt to set up the device the first time, instead of establishing a common path that is always the one tested. We cannot avoid supporting the other path, in which the device first sets up and then refreshes, because that can always happen if the refresh doesn't work for whatever reason (e.g. it is not fast enough to take precedence, or the device is offline).

Not sure lazy classic registration plays a role here. If the device has a gadget, it has snaps in use and should attempt to refresh, so it’s not lazy anymore, right? Or has that changed?

In either case, these questions are a hint that perhaps this is getting too complex for such a simple problem. Your suggestion 2 is also nice, and simpler. We might support a YAML configuration file in a well-known place that is loaded when snapd starts, assuming it's owned by root and has sufficiently restrictive permission bits.

We might either do it at seed time only, or, if we want to make it more generic, we can ask for the file to be written in a location controlled by snapd and remove the file after the options are loaded. That keeps the authoritative copy inside snapd proper and prevents user options from being overwritten if they are changed via the usual means.

pedronis · January 11, 2018, 8:55am

no, that has not changed: if there's a gadget, it is not lazy even on classic. But as I mentioned, I remember discussions about possibly changing that and keeping registration lazy on classic until the user installs other snaps, if the gadget/core came preinstalled.

But yes, I agree that 2. might be a simpler option: there are too many behaviours tied to the hook we have, and things we might change about it, and I think the use case is too narrow to add a new hook.

pedronis · January 11, 2018, 4:07pm

Had a good chat with @niemeyer. The plan is to have a general mechanism for the root user on the system to have a say in configuration defaults for snaps, before seeding (the first time snapd starts at all) or before the snap is installed. This is similar to how the defaults stanza in the gadget works for the brand.

The plan is to support files under /var/lib/snapd/seed/config/*.yaml. They would need to be non-world-writable, and each file would have the format:

defaults:
    <snap id>:
        <key>: <value>

The default configuration used at seeding or install for a snap would be equivalent to applying the stanzas for the snap (matching by snap id), first from gadget.yaml and then from the files under /var/lib/snapd/seed/config/, processed in lexicographical order by file name.

This is quite consistent with the preexisting mechanism. Concretely, for the original use case, cloud-init would make sure to write, before snapd starts, a file /var/lib/snapd/seed/config/cloud-init.yaml with content like:

defaults:
  99T7MUlRhtI3U0QFgl5mXXESAiSwt776:  # snap-id of core
    proxy.cloud-placement:
       cloud-name: CLOUD-NAME
       region: REGION
       availability-zone: AZ

niemeyer · January 11, 2018, 4:15pm

@pedronis Thanks for the summary. Looking good!

The only thing worth talking about further is the position for these specific configuration options. “proxy” doesn’t feel like a good root for information such as cloud-name, region, etc.

pedronis · January 11, 2018, 4:16pm

Yes, that’s why I wrote “like”, current key is a strawman.

smoser · January 11, 2018, 5:23pm

@niemeyer makes the statement “It would be nice to not special case the functionality to cloud-init” above. In the same way that snappy does not want to have specific knowledge of cloud-init, cloud-init would prefer to not have specific knowledge of snappy.

Specifically, from cloud-init's perspective, it writes a JSON-formatted file in /run/cloud-init/ that any consumer can read. I did suggest that we're not entirely opposed to specifically handling this bit of information for snappy, but that gives cloud-init a maintenance burden in the future should snappy ever change its interfaces. Cloud-init would also probably provide some way for the user to disable this behavior, or provide their own static information…

I’m not entirely opposed to all this, but it honestly seems simple enough for snappy to consume a json formatted file in /run if it is present. Cloud-init promises to keep the format of the file. That puts snappy entirely in control of its own future, rather than potentially requiring changes in another piece of software at some point in the future.

pedronis · January 11, 2018, 5:38pm

As I wrote elsewhere, afaict the writing could be done, after a quick check, from snap_config in cloud-init, which is already snapd-specific afaiu (though it would need to run on classic too).

If we introduce in snapd both the general mechanism (to be clear, the general mechanism is equivalent to a bunch of snap set calls and is not tied to just cloud-init or just this kind of information) and the cloud-specific keys, we would also need to promise to keep them working.

I'll let @niemeyer chime in further.

smoser · January 11, 2018, 6:11pm

would/do ‘snap set’ calls work before snapd is started?

pedronis · January 11, 2018, 6:32pm

no, in the sense that they would start snapd, and then there's a potential race with the first refresh (I touched on that in point 4 of the original post).

niemeyer · January 11, 2018, 7:16pm

@pedronis It might be easier to do the whole thing on our end instead of coordinating, as @smoser points out, and put the plan of the pre-seeding configuration on the shelf until we need it for something else (it’s still a good idea).

For cloud-init, we could just read its status (or result?) file a single time early during the seeding routine, and translate it into the relevant configuration options for core.

@smoser Where can we find complete documentation about these files? I couldn’t find references to which provider and region the machine is in, etc.

pedronis · January 11, 2018, 7:33pm

I just noticed that snap_config is not part of cloud-init init, so it happens too late unless it were split or moved earlier.

smoser · January 11, 2018, 8:43pm

@pedronis, you're right, it would be too late. snap_config is probably better suited to run later in boot, as it will do snap installs and such; it is better to have that happen once more of the system is up. At least that is what we've ended up doing for apt.

Note that we're expecting to improve/replace the snap config module sometime soon.

smoser · January 11, 2018, 8:44pm

@niemeyer the data there is new and will get documented more as we go.
That function will make its way back to 16.04 in a cloud-init SRU in the next month or so.
I’ll update this post with link to the MP for documentation as it comes.

niemeyer · January 11, 2018, 8:49pm

We can’t wait for the system to be up to decide whether we are in a cloud or not, because snapd will start to do actions that need to take into account where it is (local mirrors, etc).

About the details, without knowing exactly how it looks and what data is available, it’s a bit hard to take the conversation forward. We don’t even know if it will satisfy the needs at stake or if we’ll need to come up with something else.

pedronis · January 11, 2018, 9:06pm

From what I understood looking at cloud-init, the data details vary by cloud (the basic terminology is shared, though, in the Datasource interface in cloud-init), but it is the same data (region, availability-zone) that is used to pick in-cloud apt mirrors, so I think it should satisfy the needs. What is recent is the JSON file on disk capturing this info.

blackboxsw · January 11, 2018, 9:47pm

Here are a couple of examples of what will show up on various clouds
EC2: https://paste.ubuntu.com/26368359/
GCE: https://paste.ubuntu.com/26165085/
Azure: https://pastebin.ubuntu.com/26075842/
OpenStack: https://pastebin.ubuntu.com/26084548/
lxc: http://paste.ubuntu.com/26368388/

pedronis · January 12, 2018, 2:12pm

Chatted again with @niemeyer,

in the end, snapd will read /run/cloud-init/instance-data.json, if present, at seeding (the first time it starts) and will reflect the relevant data from v1 under a cloud namespace in the core configuration.
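The agreed step, translating the v1 data into core configuration once at seeding, can be sketched in Go. The `cloud.name`, `cloud.region` and `cloud.availability-zone` keys are a hypothetical layout for the "cloud namespace", and `cloudInfo`, `cloudConfig` and `configAtSeeding` are illustrative names, not snapd code:

```go
package main

import "fmt"

// cloudInfo is a hypothetical holder for the v1 fields read from
// /run/cloud-init/instance-data.json.
type cloudInfo struct {
	Name, Region, AvailabilityZone string
}

// cloudConfig maps the fields to core configuration keys under a "cloud"
// namespace. The key names are an illustrative assumption; empty fields
// are left unset.
func cloudConfig(info cloudInfo) map[string]string {
	conf := map[string]string{}
	for key, val := range map[string]string{
		"cloud.name":              info.Name,
		"cloud.region":            info.Region,
		"cloud.availability-zone": info.AvailabilityZone,
	} {
		if val != "" {
			conf[key] = val
		}
	}
	return conf
}

// configAtSeeding returns the changes to apply only on the first start
// (seeding) and only if cloud-init data was present; afterwards the stored
// core configuration stays authoritative.
func configAtSeeding(seeded bool, info *cloudInfo) map[string]string {
	if seeded || info == nil {
		return nil
	}
	return cloudConfig(*info)
}

func main() {
	changes := configAtSeeding(false, &cloudInfo{Name: "aws", Region: "us-east-1", AvailabilityZone: "us-east-1b"})
	fmt.Println(changes)
}
```

Doing this only at seeding means a later change in placement (the image migration question raised at the start of the thread) would not be picked up by this sketch.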

A variant of option 1.