Launchpad post-build pre-upload testing

rbasak · May 22, 2018, 9:44am

I managed to upload a broken git-ubuntu snap to the edge channel yesterday, automatically built from Launchpad. We have a test available (git-ubuntu.self-test) that, had it been run on the snap after build but before upload, would have failed, but we currently have no mechanism in place to tell Launchpad to stop before upload.

It had passed our CI, which runs snapcraft and runs this test. The problem must have been due to a non-determinism in snapcraft or some difference in how snapcraft is run between our CI and Launchpad build environments. But my question here is really about the general CI side of this rather than the specific bug.

Could we add something to snapcraft that could allow us to automatically test the actual built snap before Launchpad uploads it? @wgrant suggested to me on IRC that if snapcraft could provide a test facility, then Launchpad could use it.

I understand there’s a separate effort to automatically revert a broken snap after a refresh on the user side. We should definitely do this too, but it seems to me that in this case we could have avoided upload of a broken snap to the store in the first place, which would be better.

kyrofa · May 22, 2018, 3:48pm

What would this functionality ideally look like, do you think? In the case of git-ubuntu, its tests need to be run on the final snap, which means the snap needs to be installed in order to run the tests. Not all systems on which snapcraft runs can actually install snaps (e.g. docker). Not all tests would probably look like that, either. If Nextcloud were to run its suite of tests, for example, it would involve installing a bunch of gems, installing the snap, visiting it on ports 80 and 443, and putting it through some smoke tests. I figured the LP builders were locked down a bit too much for something like this. Personally, I’d rather see a good webhook system that I can tie into with my own CI and run tests when a snap is released to edge. Yes, edge may be broken, but it’s edge: it’s broken sometimes.

rbasak · May 22, 2018, 4:10pm

I understand that some tests have external dependencies and these tests won’t be able to run in all environments. We have this kind of thing with dep8 tests in the archive - even though they can access some infrastructure, they don’t have free rein to do anything. While not all tests would be able to block publication, then, tests that require no external dependencies might.

I imagine something like snapcraft test that picks up on the previous snap build (or perhaps does the build if it isn’t done already), installs the snap, runs some command provided by the snap, undoes the snap installation and exits with the command’s exit value. Special exit codes would reflect the command being unable to run due to the environment not being available (no virtualisation, or inside Docker, etc) and being unable to run due to no test command being defined.
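To make the idea concrete, here is a minimal Python sketch of that flow. It is not an existing snapcraft feature: the function name `snap_test`, the `can_install` flag and the two special exit codes (77 and 78) are all hypothetical, chosen only for illustration. The only real commands assumed are `snap install --dangerous` for a locally built snap and `snap remove`.

```python
import subprocess
from typing import Callable, Optional, Sequence

# Hypothetical values: the proposal only says "special exit codes" would exist.
EXIT_ENV_UNAVAILABLE = 77   # environment cannot install snaps (e.g. inside Docker)
EXIT_NO_TEST_DEFINED = 78   # the snap declares no test command

Runner = Callable[[Sequence[str]], int]


def run_command(argv: Sequence[str]) -> int:
    """Run a command on the host and return its exit status."""
    return subprocess.run(list(argv)).returncode


def snap_test(
    snap_file: str,
    snap_name: str,
    test_command: Optional[Sequence[str]],
    can_install: bool,
    run: Runner = run_command,
) -> int:
    """Install the built snap, run its test command, remove it again.

    Returns the test command's exit status, or one of the special codes
    when the test could not be attempted at all.
    """
    if test_command is None:
        return EXIT_NO_TEST_DEFINED
    if not can_install:
        return EXIT_ENV_UNAVAILABLE

    # A locally built snap is unsigned, hence --dangerous.
    install_status = run(["snap", "install", "--dangerous", snap_file])
    if install_status != 0:
        # A snap that does not install is treated as a failed test.
        return install_status

    try:
        return run(list(test_command))
    finally:
        # "Undo the snap installation" whether or not the test passed.
        run(["snap", "remove", snap_name])
```

A caller such as Launchpad could then do something like `sys.exit(snap_test("git-ubuntu.snap", "git-ubuntu", ["git-ubuntu.self-test"], can_install=True))` (the file name here is made up) and fail the build on any non-zero status. One limitation of this sketch is that a test command which itself exits with 77 or 78 would be indistinguishable from the special cases.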

“Install the snap” may have to be in a container or VM or something, much like snapcraft cleanbuild. “Undoes the snap installation” would then destroy the container or VM or really uninstall the snap if it was done on the host system. Exactly what environment it uses could be adjustable with options.

To determine what command to run, it could be specified in snapcraft.yaml, or use the same healthcheck I think is in planning for deployed snaps. Alternatively, if the healthcheck thing will prevent the snap from being installed, snapcraft could just return the success of that.

Then Launchpad could hook in to it just by running snapcraft test (with options to describe any environment restrictions if required) and “failing the build” if that command fails.

Yes, edge may be broken, but it’s edge: it’s broken sometimes.

True, but we don’t publish an empty snap if the build fails. I feel this is rather similar: we know at build time that the result is broken; it’s a shame to force users following the edge snap to have to revert given that we knew before upload. It’ll discourage users from following an edge snap, when it’s better for developers the more users that do.

rbasak · October 15, 2018, 10:40am

As an update, for certbot, I’m using Travis for now to build daily, run tests and publish to edge if tests passed. You can see the configuration here: https://github.com/basak/certbot-snap-build. I started with instructions on how to publish directly from Travis and modified them to add the test step.
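The gating itself is simple to express. The following Python sketch is not taken from the linked repository; it only illustrates the "publish to edge only if every earlier step passed" ordering. The helper name `run_gated` and the step names are hypothetical.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

Step = Tuple[str, Sequence[str]]          # (label, command line)
Runner = Callable[[Sequence[str]], int]   # returns an exit status


def run_command(argv: Sequence[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.run(list(argv)).returncode


def run_gated(steps: Sequence[Step], run: Runner = run_command) -> Tuple[int, List[str]]:
    """Run steps in order and stop at the first failure.

    Returns (exit status, labels of the steps that completed). Because the
    publish step is listed last, it is never reached after a failed test.
    """
    completed: List[str] = []
    for label, argv in steps:
        status = run(argv)
        if status != 0:
            return status, completed
        completed.append(label)
    return 0, completed
```

A daily job might pass steps such as `("build", ["snapcraft"])`, an install step, `("test", ["certbot", "--version"])` and finally a publish step. The test command shown is only a placeholder, and the exact publish command depends on the snapcraft version in use, so it is left to the caller here.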

Unfortunately this means amd64 only for now.

popey · February 13, 2020, 3:04pm

@rbasak

While I can appreciate that it would be awesome to have tests which gate releases to edge, the whole point of the edge channel is that it’s inherently more risky than stable/candidate/beta by design. If something breaks in edge, so be it. Sure, we’d rather it didn’t but that’s kinda the point of the channel, a fast-follow of github master for adventurous enthusiasts and QA alike.

Could I please ask you to reconsider blocking pushing to edge based on functionality that doesn’t exist yet in launchpad or snapcraft?

If not, there’s really no point having the certbot snap built and published at all.

rbasak · February 13, 2020, 5:13pm

Perhaps where this disagreement is coming from is that you are used to snaps being an “extra” further to a separate not-snap and independent upstream development and release process, whereas I am seeking to fully integrate them into upstream processes?

Here are my requirements for full integration, with what I hope are adequate justifications for why every upstream project that wants to make snap releases would want these things:

1. Regular CI of the snap build and integration testing to make sure that an installed snap built from master actually works. Ideally this would also be done against every upstream PR (eg. “this PR breaks the snap build; please also fix snapcraft.yaml in your branch before landing it”). This is important for the same reasons that any project would want to have CI and always be green. It is essential to keep up with snap packaging quality; this allows snap support to be a first class citizen rather than having specific snap-focused developers always playing catch-up. Changes upstream that affect the snap will immediately be noticed and can be addressed, preferably before even landing them. Snap builds from the master branch can then be reliably expected to always work.
2. As much as is possible, the edge snap needs to be maintained and up-to-date against the upstream master branch. This would allow users to consume and test changes landing in upstream master immediately. This is helpful for QA, and also may be helpful for users who want yet-to-be-released features and are willing to risk stability for that. Developers might themselves also consume from edge, but probably only to test any issues with the snap itself; they tend to run directly from the source tree in most cases, so the edge channel is of limited use to developers except where it helps users to help developers.

“Inherently more risky” is hardly the point. You’ve got that on the wrong side of the equation. The risk is the accepted cost, but the acceptance of the cost doesn’t automatically invalidate any effort to reduce the cost. That would be a rather circular argument. I explained my view on the point of the edge channel in the details of my second requirement above. Maybe you could expand on the actual point of the edge channel if you disagree?

So what benefit would more breakages in the edge channel provide? I don’t understand why you think this is OK. Increasing the number of regressions in the edge channel reduces its utility.

On the other hand, there is benefit to manual testing: this can find issues overlooked by automation. Providing an automatic-test-gated edge channel increases the utility of that channel for manual testing, because it avoids wasting manual testing effort on known broken builds.

My two requirements above are fundamentally independent. The first one does not require snaps to be published. But given that it is best practice to fulfil the first requirement, doing so results in a known good snap build, and given my argument as to why publishing known-broken snap builds to edge is a bad idea, why shouldn’t we combine the two? Travis can be configured to do this today; doing the two things separately would be obtuse.

If by this you mean that you’d prefer me to drop the CI gate in order to win multiple architecture support, I remain reluctant because I think the CI gate is more valuable for where we are at the moment. This doesn’t exclude multiple architecture support in the future, though, and I believe upstream have plans on implementing that in their own CI before releasing stable snaps. The new Travis support for ARM that you pointed me to earlier might make this much easier.

Note that I’m working with upstream on handing maintenance of the snap over to them. Their opinion may differ from mine, so things might change after that happens.

That’s not true at all. Right now the certbot snap gets published to edge any time it actually works. That’s useful to users who now only take the risk that CI has missed something, rather than risking every other breakage that CI can already catch. And it’s useful to upstream developers as a first step to fully integrating snap-based distribution as a first class citizen in their project.