Tracking of plugin assets

Hello!

A few months ago we started work on tracking assets, which will then be recorded in a manifest file saved inside the snap for auditing.

We have some common assets that we are already tracking, like build packages and stage packages. Now we need to start tracking things that are specific to each plugin. As the first example to get this bootstrapped, think about the python packages installed inside the snap during the build step.

We have discussed this, and it’s not complex to achieve. But we have two options for how the plugin reports the assets it wants to track:

  • the assets will be returned by the plugin step methods
  • the plugin handler will call a specific plugin method that returns the assets

I lean slightly toward the first option. This way, it seems to me, the steps would return a summary of the important things they did, in the form of assets. In the python example, the method for the build step installs the python packages, so it returns a dictionary like: {'python-packages': ['package1=version1', 'package2=version2']}.

The code would look like this:
https://github.com/snapcore/snapcraft/pull/1395/files
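Independently of the linked PR, here is a minimal sketch of the first option. The class name, the constructor argument and the fake "1.0" versions are all hypothetical, made up only to show the shape of the return values; this is not the real snapcraft plugin API.

```python
class PythonPluginSketch:
    """Hypothetical stand-in for a plugin; not the real snapcraft class."""

    def __init__(self, requested_packages):
        self.requested_packages = requested_packages

    def pull(self):
        # Assumption: this simplified pull step has nothing to record.
        return {}

    def build(self):
        # Assumption: pretend every requested package was installed and
        # resolved to version "1.0". A real plugin would ask the installer.
        installed = {name: "1.0" for name in self.requested_packages}
        # The step returns a summary of what it did, in the form of assets.
        return {
            "python-packages": [
                "{}={}".format(name, version)
                for name, version in sorted(installed.items())
            ]
        }


plugin = PythonPluginSketch(["package2", "package1"])
build_assets = plugin.build()
```

The caller (the plugin handler) would be the one deciding where to store whatever each step returns.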

After our discussion this morning, @kyrofa seems a little more inclined to the other option. He can expand more in the comments, but the idea would be that after the plugin handler finishes the build step, it will call a method like get_assets that each plugin needs to implement.
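A rough sketch of how that second option could look. Only the get_assets name comes from the discussion; the two classes, their attributes and the hardcoded packages are hypothetical placeholders, not the real plugin handler.

```python
class PythonPluginWithGetAssets:
    """Hypothetical plugin that collects its own assets."""

    def __init__(self):
        self._installed = {}

    def build(self):
        # Assumption: the build step remembers what it installed.
        self._installed = {"package1": "version1", "package2": "version2"}

    def get_assets(self):
        # The plugin decides how the assets should look in the manifest.
        return {
            "python-packages": [
                "{}={}".format(name, version)
                for name, version in sorted(self._installed.items())
            ]
        }


class PluginHandlerSketch:
    """Hypothetical handler: runs the step, then asks for the assets."""

    def __init__(self, plugin):
        self.plugin = plugin

    def build(self):
        self.plugin.build()
        # Called only after the build step has finished.
        return self.plugin.get_assets()


assets = PluginHandlerSketch(PythonPluginWithGetAssets()).build()
```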

Both seem easily doable, with just some details to solve along the way. For example, if the assets are return values of the steps, then we would need a way to merge the assets coming from pull with the assets coming from build for the final manifest.
If there is a get_assets method, then the plugin will take care of collecting and returning the assets as it wants them recorded in the manifest.
And one additional detail to take into account is that the assets are tracked in the per-part per-step state files, and then copied into the manifest. With the first option, it’s clear that the plugin assets can be stored in these files. With the second option, we need to figure out where to save the assets. Maybe we just save them in the build state, or maybe we add a new state file. We could even remove the state files as intermediaries and keep the assets just in memory.
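To make the state-file idea concrete, here is a small sketch where an in-memory dictionary stands in for the per-part per-step state files. The function names, the part name and the step order are assumptions for illustration only.

```python
import copy


def record_step_assets(state, part, step, assets):
    """Store the assets of one step of one part (stand-in for a state file)."""
    state.setdefault(part, {})[step] = copy.deepcopy(assets)


def build_manifest(state, step_order=("pull", "build")):
    """Copy the recorded assets of every part into a single manifest dict."""
    manifest = {}
    for part, steps in state.items():
        part_assets = {}
        for step in step_order:
            # Later steps overwrite earlier ones if they reuse a key.
            part_assets.update(steps.get(step, {}))
        manifest[part] = part_assets
    return manifest


state = {}
record_step_assets(state, "my-part", "pull", {"python-version": "python3"})
record_step_assets(
    state, "my-part", "build", {"python-packages": ["package1=version1"]}
)
manifest = build_manifest(state)
```

Note that this version silently lets build win over pull when both record the same key, which is one of the details we would have to decide on.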

As I said at the start, this is not a big deal. We just need to agree on the form, and start implementing it for all our plugins.

pura vida

Yeah let me expand on my thoughts a bit.

So far we’ve focused on simply recording everything we can. This isn’t a problem, but it only covers one use-case: getting an idea of what CVEs might apply to a given snap. Another thing we’re wanting to accomplish with this is the ability to rebuild the snap from the manifest.

Having a plugin simply return all assets it wants to record for a given step works only for recording: it doesn’t give us a way to hand that information back to the plugin in order to rebuild the snap from the manifest. If we instead add asset save/restore functionality to the plugin API, we get both.

Having a save/restore method in each plugin will certainly make it more flexible. However, my opinion is that we should avoid that flexibility (and complexity). Instead, we should strive very hard to make sure that all the asset recording makes up a valid snapcraft.yaml.

In the python proof of concept linked above, for example, the recorded field is python-packages, which is a valid keyword. And the plugin makes sure to return it in a format that’s also valid. This makes a restore method unnecessary, because we can just build a new snap using the annotated snapcraft.yaml, and it will install all the python-packages recorded from the first install with no extra work.
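As a sketch of what "annotated" could mean here: the recorded assets simply replace the matching keywords of the part, and the result is still a valid part definition. The annotate_part helper and the sample part are hypothetical, and the part is shown as a plain dictionary rather than parsed YAML.

```python
import copy


def annotate_part(part_definition, recorded_assets):
    """Return a copy of the part with recorded assets overriding keywords."""
    annotated = copy.deepcopy(part_definition)
    # Assumption: every recorded key is already a valid plugin keyword,
    # so overriding the original value keeps the part valid.
    annotated.update(copy.deepcopy(recorded_assets))
    return annotated


original_part = {"plugin": "python", "python-packages": ["package1"]}
recorded = {"python-packages": ["package1=version1"]}
annotated_part = annotate_part(original_part, recorded)
```

Building again from the annotated part would then need no restore method, because the part itself carries the exact packages.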

That’s a fair point, and I currently don’t have an example of why limiting ourselves to the plugin schema would cause issues, but we should be careful. We might find ourselves adding items to the schema just to support this.

Assuming we do go down this path, how would you reconcile conflicts between the assets returned by the pull and build steps? What if they both return a set of python-packages, for example?

I think python-packages should only be returned in the step that installs them, so they would be returned by the build method.
An attribute returned by pull would be the python version that was installed, for example. Because that’s needed before build can happen.
I can’t think of an attribute that should be returned in both steps. Please help me here in case I’m wrong.
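If we wanted the tooling to enforce that rule instead of trusting each plugin, a strict merge could look something like this sketch. The function name and the choice to raise ValueError are assumptions, not something we have agreed on.

```python
def merge_step_assets(pull_assets, build_assets):
    """Merge the assets of both steps, refusing conflicting values."""
    shared_keys = set(pull_assets) & set(build_assets)
    conflicts = sorted(
        key for key in shared_keys if pull_assets[key] != build_assets[key]
    )
    if conflicts:
        raise ValueError(
            "assets recorded by both pull and build: " + ", ".join(conflicts)
        )
    merged = dict(pull_assets)
    merged.update(build_assets)
    return merged


merged = merge_step_assets(
    {"python-version": "python3"},
    {"python-packages": ["package1=version1"]},
)
```

With this, a plugin that returned different python-packages from both pull and build would fail loudly instead of producing an ambiguous manifest.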

And this brings an interesting point. We currently can only select python2 or python3. That doesn’t tell us if it was python 3.5 or 3.6. IMO, this means that our python plugin is missing one attribute, or that the python-version attribute needs to accept more values. Instead of hiding this in a load/restore that accepts anything that happened, we should improve the plugin to let people be precise on what they require.

We decided to take the approach supported by @kyrofa. I will drop my experimental PR and start another group of PRs to record python details.

In this post we will discuss the information to record for python snaps: