Proposal: Vendoring hosts in snapcraft.yaml

kalikiana · March 29, 2018, 9:26am

Background

Snapcraft gives you a lot of freedom in specifying the dependencies needed to build a snap in various ways, including Debian packages installed on the host, sources for parts that don’t reside in the snap and package managers specific to plugins such as PyPI or npm. When maintaining a stable snap, or any software, over the long term, it is often desirable to lock down your external dependencies so that there won’t be any surprises when you need to release a bug fix or patch a security issue. To achieve this, sources can be pulled into the same repository or into dedicated branches. This is what’s typically called vendoring.

This proposal will focus on the aspect of not pulling in external dependencies from different places that may not be under your control.

Proposal

Snapcraft will recognize a list of allowed host names that dependencies can be pulled in from, specified in snapcraft.yaml via the new vendoring keyword. When building in a LXD container a proxy will be automatically set up, effectively creating a whitelist based on the given hosts. During the build only external resources from that list can be accessed. If anything else is used the build will fail.

This includes packages from the archive, servers such as Launchpad or GitHub, pulling in archives from arbitrary hosts as well as snaps. The list is explicit. The implementation won’t apply any defaults that might change in the future.
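To make the whitelist idea concrete, here is a minimal sketch of the kind of check a filtering proxy could apply to each outgoing request. It is not the implementation from the proposal: the function name is_allowed is hypothetical, and matching host names exactly (so that subdomains are not implicitly allowed) is an assumption based on the list being described as explicit.

```python
from urllib.parse import urlsplit


def is_allowed(url, allowed_hosts):
    """Return True only if the URL's host is exactly one of the allowed hosts.

    Assumption: exact, case-insensitive matching with no implicit defaults
    and no implicit subdomains.
    """
    host = urlsplit(url).hostname  # lower-cased by urlsplit, None if absent
    return host is not None and host in {h.lower() for h in allowed_hosts}


# Hypothetical list, shaped like the entries under the vendoring keyword.
allowed = ["github.com", "archive.ubuntu.com"]

print(is_allowed("https://github.com/foo/bar", allowed))
print(is_allowed("https://git.launchpad.net/foo", allowed))
```

With exact matching, a host such as github.com.example.org is rejected even though it starts with an allowed name.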

Note that when building natively on the host Snapcraft can’t enforce vendoring. Instead, a warning will be logged informing you of this, and the build will continue as normal.
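The difference between container and native builds could be sketched as follows. The function name vendoring_mode, its return values and the logger name are all hypothetical and only illustrate the behaviour described above; Snapcraft’s actual code may be structured quite differently.

```python
import logging

logger = logging.getLogger("vendoring-sketch")  # hypothetical logger name


def vendoring_mode(allowed_hosts, in_container):
    """Decide how a vendoring list is applied to a build.

    Returns "off" when no list is given, "enforce" for container builds,
    and "warn" for native builds, where the list cannot be enforced.
    """
    if not allowed_hosts:
        return "off"
    if in_container:
        return "enforce"
    logger.warning(
        "vendoring is set but cannot be enforced when building on the host; "
        "continuing as normal"
    )
    return "warn"


print(vendoring_mode(["github.com"], in_container=True))
print(vendoring_mode(["github.com"], in_container=False))
```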

Examples

The new keyword is added to the root of snapcraft.yaml:

name: my-snap
version: 1.0
vendoring:
- github.com
- archive.ubuntu.com
- security.ubuntu.com
- api.snapcraft.io
- 068ed04f23.site.internapcdn.net

parts:
    example:
        source: https://github.com/foo/bar
        source-type: git
        build-packages:
            - hello

In this example snapcraft cleanbuild is able to pull sources from GitHub (github.com) as required for the example part, install packages from the archive (archive.ubuntu.com, security.ubuntu.com) and download snaps (api.snapcraft.io, 068ed04f23.site.internapcdn.net).

If this snap should be able to use a package from a git repository on Launchpad, it could be amended by adding git.launchpad.net to the list of allowed hosts.
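As an illustration of how the list interacts with part sources, the sketch below reports which parts point at hosts outside the list. It is a static check over a plain dict standing in for the parsed snapcraft.yaml, not the proxy-based enforcement described in the proposal; the helper unvendored_sources, the part named extra and its URL are hypothetical.

```python
from urllib.parse import urlsplit

# Plain-dict stand-in for the parsed snapcraft.yaml shown above.
config = {
    "vendoring": [
        "github.com",
        "archive.ubuntu.com",
        "security.ubuntu.com",
        "api.snapcraft.io",
        "068ed04f23.site.internapcdn.net",
    ],
    "parts": {
        "example": {"source": "https://github.com/foo/bar", "source-type": "git"},
        # Hypothetical second part hosted on Launchpad.
        "extra": {"source": "https://git.launchpad.net/baz", "source-type": "git"},
    },
}


def unvendored_sources(config):
    """Map part name -> host for every remote source outside the list."""
    allowed = set(config.get("vendoring", []))
    offending = {}
    for name, part in config.get("parts", {}).items():
        host = urlsplit(part.get("source", "")).hostname  # None for local paths
        if host is not None and host not in allowed:
            offending[name] = host
    return offending


print(unvendored_sources(config))

# Adding git.launchpad.net to the list clears the report.
amended = {**config, "vendoring": config["vendoring"] + ["git.launchpad.net"]}
print(unvendored_sources(amended))
```

The first call reports the extra part together with git.launchpad.net; the second returns an empty mapping once that host is on the list.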

cratliff · April 3, 2018, 3:56pm

This looks great. There have been a couple times an update of an upstream dependency has broken the build and required immediate triage. Being able to have control over those would be great.

As you mentioned, supporting older software is really important. In the case of having a user on an LTS branch, if a bug is found and fixed it would be very useful to be able to reproduce that with only that bugfix, without being required to update everything to be compatible with the most recent dependencies.

This may be what you’re planning with the warnings, but if you do a build that would pull in something from a non-vendored source it would be useful if the warning included what was being pulled in from where, so it can be fixed more easily.

kalikiana · April 9, 2018, 8:33am

Implementation here: snapcraft#2042

kalikiana · April 9, 2018, 8:38am

The build will actually fail in the case where vendoring is enforced while building in a LXD container. So if something is pulling in a source not on the list you will see an error message.

niemeyer · April 9, 2018, 12:13pm

@kalikiana It’s not clear what the actual purpose of this feature is. From the provided background it sounds a bit like it’s about securing it to trusted hosts? But trusting github.com or git.launchpad.net is not very effective in that sense. What am I missing?

Conan_Kudo · April 9, 2018, 12:47pm

This vendoring mechanism wouldn’t work very well for when Snapcraft is extended to RPM based distributions. All RPM-based distributions use some form of mirror selection mechanism (either through forced redirection, or through metalink/mirrorlist selection), and this would completely break that.

I’m not entirely sure what the value is in vendoring by hostnames, unless you’re considering the (frankly garbage) mechanism in which Go dependencies are fetched. This also requires too much effort to properly constrain, and would necessarily mandate that all repos/sources be fully defined in the yaml as well.

So, I’d probably say this proposal is a bit misguided, actually.

kalikiana · April 9, 2018, 1:58pm

It’s worth keeping in mind that this is just one aspect of vendoring. A complete approach of course includes branching off your dependencies appropriately where, for example, your Python snap uses dependencies on GitHub and PyPI and vendored branches live on Launchpad. So having git.launchpad.net on the list but not the other ones allows you to easily spot that your dependencies don’t “leak”. And you can easily decide whether, for instance, PPAs can be pulled in, because that depends on whether ppa.launchpad.net is in the list of allowed hosts. You may still have unvendored sources from Launchpad in this scenario, so this approach can’t protect you on that level. But it has the advantage of being robust. Regardless of how these resources are being pulled in, be it source:, requirements.txt, curl in a scriptlet or something else, you can rely on where your software is coming from.

cratliff · April 9, 2018, 2:19pm

I would use it with the intent of preventing issues like this one: “Update of ROS packages prevents snaps with ROS from running in core image”. Having an upstream change that leaves multiple people unable to build an operable snap is something I want to avoid.

Even when versions are specified, they can disappear or be modified, so upstream changes can still break your deliverable. Being able to lock down where all of our code and packages are fetched from would allow us to prevent this. The catkin plugin adds the ROS repository at the code level, so modifying our sources.list and /etc/hosts together would work, but this would be a pain for developers compared to having a supported solution.

lucyllewy · April 9, 2018, 2:32pm

I fail to see how specifying a whitelist of hostnames is going to prevent upstream from shifting under your feet. If you’re concerned about such an occurrence then you don’t use the upstream files directly from upstream. This whole concept of whitelisting hostnames makes no sense from the standpoint that we’re “vendoring” dependencies by adding this.

If you don’t want to download files from a particular host, how about you specify a different host? There is zero requirement for a whitelist, which you control, to prevent you from downloading a file from a configured hostname, when that configuration is also under your control.

cratliff · April 9, 2018, 2:44pm

No, it seems like you got it. I’d set up my own host that I can control. Then use this feature to ensure I’m only using my host.

Some hosts are specified in snapcraft. You’re right that this isn’t impossible to do otherwise by making sure those hosts are unreachable, but it would be nice to have this be part of the build as opposed to modifying my network configuration manually to build a snap.

niemeyer · April 9, 2018, 4:31pm

Okay, perhaps I understand what this is trying to solve now.

-1 on the current proposal. Parts already have an explicit way to define where code is taken from, and local sources are supported. If a part wants to put content in a local directory, just use a local source location.

We can also teach snapcraft to “lock down” content to a particular revision, but that’s not the same as copying every source code locally, and it’s also not the same as having a list of “hosts” (which projects are under the github.com host?).