Asset recording for a built snap

A frequent request is a way to rebuild a snap once it is in the store: for open source snaps, there is currently no easy way to do so. In several conversations with the security team, it was also mentioned that it is not easy to track the CVEs that concern the owners of such snaps.

For a while now we’ve been working on the plumbing layers for richer syntax for stage-packages, build-packages and source entries.

Additionally, while implementing the hooks workflow in snapcraft, we agreed that snapcraft would own a directory called snap within the snap, and that snapcraft projects would load their configuration from snap/snapcraft.yaml by default.

Add all these things up and the proposal is to:

  • record all assets used in every part
  • recreate the snapcraft.yaml with annotated assets in snap/snapcraft.yaml

This allows anyone who downloads the snap to:

  • extract the snap directory from the snap and rebuild it
  • track the versions of the assets used (for whatever reason) over the lifetime of the snap

Why use the snapcraft.yaml format for this recording? Because anyone already working with snapcraft understands it, and (given the right environment) it is also a way to rebuild the snap.

So how does this look? Here is a first pass (produced by an existing implementation):

name: godd
version: 1.0
summary: Simple dd like tool
description: Written in go with support for device auto-detection via libgudev, you
  would need to use hw-assign to access devices.
grade: stable
architectures: [amd64]
confinement: strict
apps:
  godd:
    command: command-godd.wrapper
    plugs: [mount-observe]
parts:
  godd:
    build-packages: ['gcc=4:6.3.0-2ubuntu1', 'libgudev-1.0-dev=1:230-3']
    go-importpath: github.com/mvo5/godd
    plugin: go
    prime: []
    snap: []
    source: https://github.com/mvo5/godd
    source-branch: ''
    source-checksum: ''
    source-commit: 5c75ebcdca9d17cec693af13a7adebdccb8ae639
    source-source: https://github.com/mvo5/godd
    source-tag: ''
    source-type: git
    stage: []
    stage-packages: []

There is some cleanup work to do:

  • move to OrderedDict so the keys in the YAML come out in a sensible order.
  • (maybe) remove empty entries; they are mostly noise.
  • fix the command entry to be transparent (so it does not expose the generated command-godd.wrapper).
  • have a keyword to identify the remote when using local sources.
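A minimal sketch of the first two cleanup items, in Python (the language snapcraft is written in). The function names and the preferred key order are hypothetical, not snapcraft's code:

```python
from collections import OrderedDict
from typing import Any, Dict, Iterable

# Hypothetical preferred key order for a recorded part (not defined by snapcraft).
PREFERRED_PART_ORDER = (
    "plugin", "source", "source-type", "source-commit", "source-tag",
    "source-branch", "go-importpath", "build-packages", "stage-packages",
)


def is_empty(value: Any) -> bool:
    """Empty strings, lists and dicts (and None) are treated as noise."""
    return value is None or (isinstance(value, (str, list, dict)) and not value)


def clean_part(
    part: Dict[str, Any], order: Iterable[str] = PREFERRED_PART_ORDER
) -> "OrderedDict[str, Any]":
    """Drop empty entries, then emit known keys first in a fixed order."""
    kept = {key: value for key, value in part.items() if not is_empty(value)}
    cleaned: "OrderedDict[str, Any]" = OrderedDict()
    for key in order:
        if key in kept:
            cleaned[key] = kept.pop(key)
    # Any keys we did not anticipate follow, sorted so the output is stable.
    for key in sorted(kept):
        cleaned[key] = kept[key]
    return cleaned
```

With PyYAML, dumping an `OrderedDict` in order typically needs a custom representer; that part is left out here.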

With that said, in a final implementation, starting from something like:

name: godd
version: 1.0
summary: Simple dd like tool
description: |
 Written in go with support for device auto-detection via libgudev,
 you would need to use hw-assign to access devices.
confinement: strict

apps:
  godd:
    command: bin/godd
    plugs: [mount-observe]

parts:
  godd:
    source: https://github.com/mvo5/godd
    source-type: git
    plugin: go
    go-importpath: github.com/mvo5/godd
    build-packages: [gcc, libgudev-1.0-dev]

You would get:

name: godd
architectures: [amd64]
version: 1.0
summary: Simple dd like tool
description: |
  Written in go with support for device auto-detection via libgudev, you
  would need to use hw-assign to access devices.
grade: stable
confinement: strict
apps:
  godd:
    command: godd
    plugs: [mount-observe]
parts:
  godd:
    source: https://github.com/mvo5/godd
    source-type: git
    source-commit: 5c75ebcdca9d17cec693af13a7adebdccb8ae639
    plugin: go 
    go-importpath: github.com/mvo5/godd
    build-packages: ['gcc=4:6.3.0-2ubuntu1', 'libgudev-1.0-dev=1:230-3']
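The step from the declared part to the recorded one is essentially an overlay of the recorded pull state onto the declared values. A sketch, assuming (hypothetically) that the pull state for a part is a flat dict keyed like the part itself; the helper names are illustrative:

```python
import copy
from typing import Any, Dict, List

# Keys whose entries are pinned as "name=version" once recorded.
PACKAGE_KEYS = ("build-packages", "stage-packages")


def _package_name(entry: str) -> str:
    return entry.split("=", 1)[0]


def annotate_part(declared: Dict[str, Any], recorded: Dict[str, Any]) -> Dict[str, Any]:
    """Overlay recorded pull state on a declared part; recorded values win."""
    annotated = copy.deepcopy(declared)
    for key, value in recorded.items():
        if key in PACKAGE_KEYS:
            # Every declared package should have a recorded, pinned counterpart.
            pinned = {_package_name(entry) for entry in value}
            missing: List[str] = [
                p for p in declared.get(key, []) if _package_name(p) not in pinned
            ]
            if missing:
                raise ValueError(f"{key}: no recorded version for {missing}")
        annotated[key] = copy.deepcopy(value)
    return annotated
```

New keys such as `source-commit` end up at the end of the part; placing them next to `source-type`, as in the example above, is a job for the key-ordering cleanup.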

Here’s a simple PR that takes the snapcraft.yaml values used for the build and writes them to prime/snap/snapcraft.yaml. This means the result is not annotated yet.
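A sketch of what this un-annotated step amounts to: write the config that was used to `prime/snap/snapcraft.yaml`. The helper name is hypothetical, and it writes JSON only to avoid a PyYAML dependency; JSON is valid YAML 1.2, so the file still loads as YAML, but a real implementation would more likely use `yaml.dump` for readable output.

```python
import json
import os
from typing import Any, Dict


def record_snapcraft_yaml(config: Dict[str, Any], prime_dir: str) -> str:
    """Write the config used for the build to <prime>/snap/snapcraft.yaml."""
    snap_dir = os.path.join(prime_dir, "snap")
    os.makedirs(snap_dir, exist_ok=True)
    path = os.path.join(snap_dir, "snapcraft.yaml")
    with open(path, "w", encoding="utf-8") as f:
        # JSON output is also valid YAML 1.2, so YAML loaders can read it back.
        json.dump(config, f, indent=2, ensure_ascii=False)
        f.write("\n")
    return path
```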

https://github.com/snapcore/snapcraft/pull/1278

I started from Joe’s code, but instead of putting the responsibility for recording the snapcraft.yaml in meta.py, I’ve left it in lifecycle.py for now. Later we can move it to recording.py or something like that.

I prefer to add annotations in small branches, one per step.

I added some comments on the PR. The most general one, worth pointing out here since you brought it up :wink:, is that we already have code in meta to deal with snapcraft.yaml.

We used to have a directory called snap. Then we renamed it to prime but the variable names were not updated. This makes the code really confusing now that we want to introduce the prime/snap directory. This cleans things up a little:

https://github.com/snapcore/snapcraft/pull/1279


One more little refactor, to clean up the integration tests that use stage and build packages, with and without versions.

https://github.com/snapcore/snapcraft/pull/1290

I made a mistake in two tests by assuming that the architecture was amd64. This one should let the tests run on any architecture:
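A sketch of the usual fix, assuming a hand-written (hypothetical and partial) mapping from `platform.machine()` names to Debian architecture names; on Debian-based hosts, `dpkg --print-architecture` answers the same question:

```python
import platform
from typing import Optional

# Illustrative, partial mapping from kernel machine names to Debian arch names.
_MACHINE_TO_DEB_ARCH = {
    "x86_64": "amd64",
    "i686": "i386",
    "aarch64": "arm64",
    "armv7l": "armhf",
    "ppc64le": "ppc64el",
    "s390x": "s390x",
}


def host_deb_arch(machine: Optional[str] = None) -> str:
    """Debian architecture name for the given (or current) machine."""
    machine = machine or platform.machine()
    try:
        return _MACHINE_TO_DEB_ARCH[machine]
    except KeyError:
        raise ValueError(f"unsupported machine: {machine!r}") from None
```

Tests can then build their expected values from `host_deb_arch()` instead of a hard-coded `amd64`.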

https://github.com/snapcore/snapcraft/pull/1292

If you’re going for full reproducibility, each plugin will need the opportunity to record extra asset records, including:

  • for Go, extra packages pulled in by “go get”
  • for Python, any packages brought in by pip

In both cases, dependencies can be expanded recursively.
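Whatever the plugin, the recursive expansion itself is a plain graph traversal. A sketch, where the dependency map is hypothetical and would have to come from the plugin (for example, from pip package metadata or from Go imports):

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def expand_dependencies(roots: Iterable[str], depends: Dict[str, List[str]]) -> Set[str]:
    """Return the roots plus everything they depend on, directly or indirectly."""
    seen: Set[str] = set()
    queue = deque(roots)
    while queue:
        name = queue.popleft()
        if name in seen:
            continue  # already expanded; this also makes cycles harmless
        seen.add(name)
        queue.extend(depends.get(name, []))
    return seen
```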

And when system libraries are copied into the snap from the build system, it would be good to record which package+version they belonged to. There’s no guarantee that they come from something listed in build-packages.
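On a Debian-based build host, one way to find the owning package is `dpkg -S <path>`, followed by `dpkg-query -W --showformat='${Version}' <package>`. A sketch; the helper names are hypothetical. A file not owned by any package makes `dpkg -S` fail, which raises `CalledProcessError` here, and with merged `/usr` the path dpkg knows may differ from the one on disk, which this sketch does not handle.

```python
import subprocess
from typing import Dict, List


def parse_dpkg_search(output: str) -> Dict[str, List[str]]:
    """Map each path in `dpkg -S` output to the packages that own it."""
    owners: Dict[str, List[str]] = {}
    for line in output.splitlines():
        # Diversion notices are not ownership records.
        if line.startswith("diversion by") or ": " not in line:
            continue
        packages, path = line.split(": ", 1)
        # "libc6:amd64" -> "libc6"; several owners are comma-separated.
        owners[path] = [p.strip().split(":", 1)[0] for p in packages.split(",")]
    return owners


def package_version(package: str) -> str:
    """Installed version of a package, as reported by dpkg-query."""
    return subprocess.check_output(
        ["dpkg-query", "-W", "--showformat=${Version}", package], text=True
    ).strip()


def record_file_origin(path: str) -> List[str]:
    """Return 'package=version' entries for the packages that own path."""
    output = subprocess.check_output(["dpkg", "-S", path], text=True)
    owners = parse_dpkg_search(output).get(path, [])
    return [f"{pkg}={package_version(pkg)}" for pkg in owners]
```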


Thanks @jamesh. The idea is to keep adding stuff until we have reproducible builds.

We can easily extend each plugin to let it record whatever it wants. We will just have to figure out what to do in case of conflict between the versions in requirements.txt and what we recorded in python-packages in the yaml. Plugins are not part of this sprint, but it’s good to start talking about it.
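A sketch of detecting that conflict, assuming (hypothetically) that recorded python-packages use the same `name==version` form as requirements.txt. Only exact pins are compared; ranges and environment markers are ignored:

```python
import re
from typing import Dict, Iterable, Tuple


def _normalize(name: str) -> str:
    # PEP 503 name normalization, so "Flask" and "flask" compare equal.
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_pins(lines: Iterable[str]) -> Dict[str, str]:
    """Collect exact 'name==version' pins, ignoring comments and other specifiers."""
    pins: Dict[str, str] = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if "==" not in line:
            continue  # ranges such as ">=" are not compared in this sketch
        name, version = line.split("==", 1)
        pins[_normalize(name.strip())] = version.strip()
    return pins


def find_conflicts(
    requirements: Iterable[str], recorded: Iterable[str]
) -> Dict[str, Tuple[str, str]]:
    """Return {name: (required, recorded)} for packages pinned differently."""
    wanted = parse_pins(requirements)
    got = parse_pins(recorded)
    return {
        name: (wanted[name], got[name])
        for name in sorted(wanted)
        if name in got and wanted[name] != got[name]
    }
```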

Here’s another piece, which records stage packages with their versions, along with their dependencies:

https://github.com/snapcore/snapcraft/pull/1293
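Recorded entries use apt's `name=version` form, as in the `build-packages` example above. Splitting on the first `=` is safe because Debian package names and versions never contain `=`, while versions can contain `:` (the epoch). A sketch with hypothetical helper names:

```python
from typing import Dict, List, Tuple


def split_pin(entry: str) -> Tuple[str, str]:
    """Split 'gcc=4:6.3.0-2ubuntu1' into ('gcc', '4:6.3.0-2ubuntu1')."""
    name, _, version = entry.partition("=")
    return name, version  # version is "" for an unpinned entry


def pin_all(versions: Dict[str, str]) -> List[str]:
    """Turn {name: version} into sorted 'name=version' entries."""
    return sorted(f"{name}={version}" for name, version in versions.items())
```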


I think that we’ll want to expand this list of build-packages to include the dependencies that were also installed. I can see a scenario where the snapcraft.yaml doesn’t explicitly list all the build-packages that are truly necessary, because they happen to be installed as dependencies of other packages.

Yes, sorry for not being clear in my previous reply. We already do that. Take a look at this test: https://github.com/snapcore/snapcraft/pull/1293/files#diff-ba153ec4c53e9d78944f9e10f05b8caeR174

The snapcraft.yaml declares the hello package, but what we record also includes the undeclared dependency gcc-6-base.

This one adds the recording of build packages:

https://github.com/snapcore/snapcraft/pull/1295

But I actually found a bug in the pull state tracking: it doesn’t save the dependencies of build packages. So what I said before about saving all the dependencies currently only works for stage packages: https://bugs.launchpad.net/snapcraft/+bug/1688151

I’m trying to fix that bug.


Consider recording the dependencies of build-essential (or its equivalent for snap builds) as well as the packages explicitly listed in stage-packages and build-packages.

If you haven’t already, you should probably look at dpkg-genbuildinfo as prior art here; as well as build-dependencies, it also records things like a subset of interesting environment variables. Quite a lot of work has gone into that from reproducible-builds folks already.
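A sketch of recording only an allowlisted subset of the environment, in the spirit of what dpkg-genbuildinfo does. The variable list below is illustrative, not dpkg's actual allowlist:

```python
import os
from typing import Dict, Iterable, Mapping, Optional

# Illustrative subset of build-relevant variables (not dpkg's real allowlist).
INTERESTING_ENV = (
    "CC", "CFLAGS", "CXXFLAGS", "LDFLAGS",
    "LANG", "LC_ALL", "SOURCE_DATE_EPOCH", "DEB_BUILD_OPTIONS",
)


def record_environment(
    env: Optional[Mapping[str, str]] = None,
    allowed: Iterable[str] = INTERESTING_ENV,
) -> Dict[str, str]:
    """Return the allowlisted variables that are set, sorted by name."""
    source = os.environ if env is None else env
    return {name: source[name] for name in sorted(allowed) if name in source}
```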


And this should fix that bug:

https://github.com/snapcore/snapcraft/pull/1299

It comes with an integration test to check that undeclared build dependencies are recorded.

Thanks @cjwatson!
I found it here: https://alioth.debian.org/scm/browser.php?group_id=30261
We’ll be checking it.

Here is the addition of global build packages to the recorded yaml:

https://github.com/snapcore/snapcraft/pull/1306

Another small fix, because the keys we were recording during pull had the wrong names.

https://github.com/snapcore/snapcraft/pull/1312

And this is the last piece, to finish copying the pull state to prime/snap/snapcraft.yaml:

https://github.com/snapcore/snapcraft/pull/1317

Next, cleanups, refactors, record fancier information during pull…

During the 2.30 release and afterwards, I will be testing the build from this recorded snapcraft.yaml to identify non-reproducible builds and missing pieces of information. I would appreciate it if everybody could look at their prime/snap/snapcraft.yaml too, and let us know if they see something weird.


By pure luck, or extreme foresight, one of our test snaps caught a problem in the state tracking of build packages. It is not a new problem, but it got worse now that we collect all the dependencies.

Here are the fixes:

https://github.com/snapcore/snapcraft/pull/1322

https://github.com/snapcore/snapcraft/pull/1323


I introduced a regression, because in some cases this will try to install packages that are not in the archive anymore:

Here is a quick fix:

However, this got us thinking and discussing a lot about build-packages, because the way we handle their state is not nice. The biggest problem is that we first install using apt-get, and only later save the packages, their versions and their dependencies. It would be a lot better if we could save the packages right after they are installed. This, however, requires a bigger refactor, which I started today.
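Saving the packages right after installation could be sketched as snapshotting the installed set before and after apt-get and keeping the difference. As a side effect, packages that were already installed are not recorded, which matches the idea of only saving the build-packages installed by snapcraft. The `snapshot` and `install` callables are hypothetical stand-ins (a real snapshot might parse `dpkg-query -W` output):

```python
from typing import Callable, Dict, Iterable

# Hypothetical shape: installed package name -> version.
Snapshot = Dict[str, str]


def record_new_build_packages(
    snapshot: Callable[[], Snapshot],
    install: Callable[[Iterable[str]], None],
    packages: Iterable[str],
) -> Dict[str, str]:
    """Install packages and return only what the installation added or changed."""
    before = snapshot()
    install(list(packages))
    after = snapshot()
    # Packages that were already present at the same version are left out.
    return {
        name: version
        for name, version in sorted(after.items())
        if before.get(name) != version
    }
```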

We also agreed on a couple of simplifications:

  • We will only save the build-packages installed by snapcraft. If a dependency was already installed on the system, it will not be saved in the state. This means that the recording will only be fully accurate when using cleanbuild.
  • We will save all the build-packages before the parts are pulled. This means that build-packages are no longer saved per-part, they will all be saved as global packages. Here we are discussing the location to save those global build packages:

Here is a prerequisite for the refactor, to make the tests that depend on the cache use a better fake:

https://github.com/snapcore/snapcraft/pull/1334
