2.28 release cycle retrospective


#1

Intro

This is a retrospective for the 2.28 release cycle. The 2.28 release cycle was similar in length to the 2.27 cycle. However, the goal was to have a smoother cycle and this goal was not achieved. None of the issues in 2.27 re-occurred, i.e. building/running/testing on Ubuntu releases went smoothly this time. The root cause of most of the issues in 2.28 can be traced to a wide-ranging change in the security profile support in 2.28: we now apply the udev tagging much more broadly. However, this caused issues, mostly with the proprietary nvidia driver.

The timeline below shows some details:

Timeline

  • 2017-09-25 Release 2.28 into the “beta” channel

  • 2017-09-27 Release 2.28.1. The reason for this release was to rename the new “refresh” hook to “post-refresh”, both for consistent naming and so that we can add a “pre-refresh” hook in the future.

  • 2017-09-28 Release 2.28.1 moves into the “candidate” channel

  • 2017-10-10 Release 2.28.1 moves into the “stable” channel after our internal QA and CE QA validated it and after all testers using candidate used it for some time.

  • 2017-10-10 Bug report from a user about a typo in udev auto-generated content for the network-control plug (introduced here). Response: test for invalid udev lines everywhere (PR#4052).

  • 2017-10-10 Pushed new 2.28.2 to beta with the fix for network-control

  • 2017-10-11 Bug report from the LXD team that the lxd-demo-server stopped working. PR#4004 fixes the issue. Response: extend the lxd spread test to also cover the lxd-demo-server (PR#4020). Also encourage the LXD team to run core from the candidate channel to detect issues early.

  • 2017-10-11 Push 2.28.3 with the lxd interface fix to the beta channel.

  • 2017-10-11 Bug report from users in the forum about udev_enumerate_scan failing and opengl no longer working. This affects users with the nvidia proprietary driver. The symptoms are different on xenial, zesty and artful, but it's all due to the new udev tagging, which does not work for the nvidia driver because it cannot interface with sysfs. The first fix (PR#4022) was pulled in. Response: hard to test automatically. Encourage the desktop team to use the “candidate” channel for their core snap. Ensure that at least one person in the snapd team uses the proprietary nvidia driver on a daily basis.

  • 2017-10-11 Bug report that the new built-in xdg-open does not work in the re-exec case. Not strictly a regression, but we fixed it right away as this is an important feature (PR#4034). Response: improve the existing spread test to take the re-exec case into account.

  • 2017-10-11 Release 2.28.4 with nvidia fix and a small packaging fix.

  • 2017-10-12 Bug report from users about failures of opengl snaps on recent nvidia versions even with 2.28.4. After debugging, it turned out that modern drivers have extra device nodes that the udev tagging code needs to take into account. Fixed with PR#4033.

  • 2017-10-12 Bug report from 16.04 users about the udev_enumerate_scan failure not being fixed until the machine is rebooted. This is unacceptable, so we added cleanup code to snap-confine to clean up the state that caused libudev to fail (related to the udev device tagging that was incorrectly applied to nvidia devices) (PR#4042).

  • 2017-10-12 Bug report about incorrect tun rules in network-control. Fixed in PR#4031. This requires a refresh of the udev rules on startup to ensure the incorrect tun rules are fixed (PR#4037).

  • 2017-10-13 We pulled in a trivial (non-regression) bugfix for valid snap names like 0ad that were incorrectly rejected (PR#4043).

  • 2017-10-13 2.28.5 with the above fixes released to the beta channel.
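
For illustration, a name check that accepts names like 0ad could look roughly like the sketch below. The exact rule (lowercase letters, digits and single hyphens, with at least one letter somewhere in the name) and the function name is_valid_snap_name are assumptions made for this example, not a copy of the snapd code.

```python
import re

# Assumed rule for this sketch: lowercase letters, digits and single hyphens,
# no leading or trailing hyphen, and at least one letter anywhere in the name.
SNAP_NAME = re.compile(r"(?:[a-z0-9]+-?)*[a-z](?:-?[a-z0-9])*")


def is_valid_snap_name(name: str) -> bool:
    """Return True if the whole name matches the assumed rule."""
    return SNAP_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("0ad", "core", "123", "a--b", "-foo"):
        print(candidate, is_valid_snap_name(candidate))
```

The point of the sketch is that a leading digit is fine as long as the name contains a letter, so 0ad passes while a purely numeric name like 123 does not.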


#2

It’s also worth noting that the policy documented in “Policy for releases of the core snap” wasn’t followed (the store team was not notified of the release in advance).


#3

What we learned

  1. To verify changes that are hard to test completely automatically, we will add a lightweight process to get more real-world testing. The basic idea is to add a new github label “unverified” that can be added to a branch that needs testing on special hardware. We will merge such branches but require testing once the new core hits the edge channel. Once we get feedback from the testers, we remove the “unverified” label. We will not release a new core if there are PRs that have the label “unverified”. If we don’t get testing feedback for the specific hardware/interface, we will revert the PR before the release.

  2. Promote the usage of “candidate” more. We need more people running candidate to get exposure to as wide a range of hardware and distros as possible, so that we catch unexpected issues before they hit stable. Please run: snap refresh --candidate core
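
The “unverified” release gate from point 1 can be sketched as a small check. This is only an illustration: the PullRequest type, the function names and the PR numbers are hypothetical, and it assumes that only merged PRs carrying the label block a release.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PullRequest:
    # Hypothetical minimal model of a PR; not a real GitHub API object.
    number: int
    merged: bool
    labels: frozenset = frozenset()


def release_blockers(prs, label="unverified"):
    """Return the numbers of merged PRs that still carry the label."""
    return sorted(pr.number for pr in prs if pr.merged and label in pr.labels)


def can_release(prs, label="unverified"):
    """A release is allowed only when no merged PR is still labelled."""
    return not release_blockers(prs, label)


# Made-up PRs: 1 is merged but still awaiting hardware feedback.
prs = [
    PullRequest(1, merged=True, labels=frozenset({"unverified"})),
    PullRequest(2, merged=True),
    PullRequest(3, merged=False, labels=frozenset({"unverified"})),
]
print(release_blockers(prs))  # [1]
print(can_release(prs))       # False
```

In this sketch, PR 1 would either have its label removed after tester feedback or be reverted before the release, matching the process described above.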


#4

I am sorry, this was an oversight :frowning:


#5

My core got refreshed from 2.28.5 (3212) to 2.28.5 (3247). What was the reason for that revision bump (and why was it not a point release)?


#6

Probably a security fix for the recent WiFi issues. Snapd is the same as before in that case.


#7

@Ads20000 What @zyga said is correct: the new revision just contains the updated wpasupplicant to fix the KRACK WiFi CVEs.