Updating bootloader assets in the gadget snap

mvo · March 30, 2017, 6:58am

Background

The gadget snaps typically contain bootloader files, e.g. bcm2709-rpi-2-b.dtb, that are critical for the system to boot. ubuntu-image puts those files in place at image creation time, e.g. at /boot/uboot/bcm2709-rpi-2-b.dtb.

Subsequent updates of the gadget snap via snapd currently do not modify these files. Sometimes it is important to provide updates to those files, and snapd should support this.

Possible options

Updating the bootloader files is very dangerous: if the power goes down in the middle of a write, the device is potentially bricked. We also have no fallback (like we do for os/kernel) if it's a bad firmware. So the update mechanism must at least follow the best practices for atomic writes, directory sync, etc., and should be used with a lot of care.

Given the risks involved, the bootloader assets update should always be a manual process triggered by a human, either from the command line or via the REST API (e.g. from snapweb), with clear warnings.

Below are some of the options we have:
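For readers unfamiliar with the "atomic writes and directory sync" pattern mentioned above, here is a minimal Python sketch of it. It is not snapd code; the function name atomic_write is made up for illustration. It assumes a POSIX filesystem where rename within one directory is atomic. That guarantee may be weaker on FAT filesystems, so treat this as an illustration of the pattern rather than a proof of safety for /boot/uboot.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so a reader sees either the old or the new content."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the rename
    # never crosses a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the file contents to stable storage
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Sync the directory so the rename itself survives a power cut.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Note that this only protects a single file; it does nothing for the case where several files depend on each other and must change together.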

Delivering updates via hooks

We could provide a way to update these files via a hook (e.g. update-bootloader) that is called when snap update-bootloader (or the corresponding REST API) is called. The hook would need an interface to read/write the /boot/{grub,uboot} directories and potentially even direct access to the hard disk (if e.g. an MBR needs updating). We would also rely on the gadget snap author to follow best practices for writing those files, with the risk that the implementation is sub-par.

Delivering updates via snapd

We could make snapd support a subset of the gadget.yaml spec, especially the “content” part. snapd would compare the files on disk with those in the snap and, when snap update-bootloader (or the corresponding REST API) is called, update the files that changed, following the atomic/sync-dir patterns. We still need to be very careful and ideally replace all the files at once (as there may be dependencies between them). There is also the issue that some files ship as defaults (like config.txt on the pi2) but are also user-modifiable, and we cannot replace those once the user has touched them.
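The comparison step could look roughly like the Python sketch below. The post does not say how a user modification would be detected. This sketch assumes the content shipped by the previously installed gadget revision is still available, and treats a file as user-modified when the on-disk copy differs from it. The names plan_update and digest are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple


def digest(data: bytes) -> str:
    """SHA-256 hex digest used to compare file contents."""
    return hashlib.sha256(data).hexdigest()


def plan_update(
    old_snap: Dict[str, bytes],
    new_snap: Dict[str, bytes],
    on_disk: Dict[str, bytes],
) -> Tuple[List[str], List[str]]:
    """Return (to_replace, skipped_user_modified) as sorted lists of file names."""
    to_replace: List[str] = []
    skipped: List[str] = []
    for name, new_data in new_snap.items():
        if name in on_disk and digest(on_disk[name]) == digest(new_data):
            continue  # already up to date
        # Assumption: a file is pristine if it still matches what the
        # previous gadget revision shipped.
        pristine = (
            name in old_snap
            and name in on_disk
            and digest(on_disk[name]) == digest(old_snap[name])
        )
        if name not in on_disk or pristine:
            to_replace.append(name)
        else:
            skipped.append(name)  # user touched it: leave it alone
    return sorted(to_replace), sorted(skipped)
```

With a pristine dtb and an edited config.txt on disk, the plan would replace the former and skip the latter. Applying the plan "all at once" is a separate problem that this sketch does not address.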

Open questions

Should we support a mechanism from the gadget to signal that bootloader files need an update and for the user to query that? If so, we can integrate it in the same way as above, either via a hook (bootloader-needs-update) or as a native snapd implementation.

zyga-snapd · March 30, 2017, 7:12am

I think the option to update via snapd needs to be considered for scalability. If there is only one language that we feel is sufficient to describe an update and snapd implements a way to apply that update then that is fine. I fear however that given our limited view of the device landscape (and various quirks associated) we should use the hook mechanism so that each device maker can code a custom solution that accurately matches the requirements of the system.

In addition, there may be other concerns (e.g. system design limitations) that require the user to be aware of firmware / BIOS / bootloader updates: the user may need to agree to something, a warning may need to be displayed (otherwise the user may just unplug the device at any time without knowing something important was going on), or the device may need to be sufficiently charged.

I think that the following should happen:

a gadget update should be able to signal to snapd that a low-level update is required
snapd should collect and surface this information (e.g. in snap CLI or via snapweb or otherwise)
once initiated (via snapd APIs / snap CLI) it would use a hook to perform the update process

This way the design scales from web-managed headless devices to personal devices (laptops, smartphones) all the way to centrally managed servers. The two key points are notification and an API-driven process that gives the vendor the ability to freely code the update process.

I would also put a potential tie-in between the gadget and the kernel snaps into the design document. We know for a fact that sometimes those have to be updated in certain kinds of lockstep. Since kernel and gadget cannot be changed (as in their snap-ids) without a complete device re-flash, I would suggest either a simple assumes-like flag or giving the hook a way to say "I cannot update because the kernel is too old". If a hook approach is used, there should be symmetry between kernel and gadget in this regard (the hook should be possible on both kinds of snaps with the sole purpose of bailing out).

ogra · March 30, 2017, 9:29am

i wouldn’t auto-update at all here but provide a snapd command that can either be issued manually or via a REST command. it should make clear to the user that there is the risk of completely bricking the device and that he should not power off the device. this command should replace all unmodified files (so it will exclude a user modified config.txt for example) under /boot/uboot|grub with the newer version. i do not think that any way of auto updating is an option, but leaving the user with old bootloaders isn't either. manual local or remote updates are a conscious process that i see as a good compromise here.

ogra · March 30, 2017, 10:02am

device tree files should be treated separately from bootloader binaries/config.

technically dtb files are bound to the kernel, not to the bootloader at all.

a special case here is the raspberry pi (the other devices all ship their dtb in the kernel snap and use it from there) that requires the device tree to always be available in the /boot/uboot vfat for use by the binary blob bootloader.

there is a config.txt option “device_tree=” that works fine with our setup …
i.e. http://paste.ubuntu.com/24280117/ still boots fine.

on upgrades we could easily use the pi config hook, which allows changing config.txt parameters, to point to the new dtb.
but sadly we would have to roll back from uboot in case the update goes wrong, which brings us back to the problem that writing from kernel and uboot to the same vfat eventually corrupts it.
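To make the "device_tree=" idea concrete, here is a small Python sketch of the text rewrite such a config hook might perform. The function name set_device_tree and the file names in the usage note are illustrative, not taken from any existing hook. The sketch also ignores conditional sections in config.txt, which a real implementation would have to respect.

```python
def set_device_tree(config_text: str, dtb_name: str) -> str:
    """Return config.txt content with exactly one device_tree= line for dtb_name."""
    new_line = f"device_tree={dtb_name}"
    out = []
    replaced = False
    for line in config_text.splitlines():
        if line.strip().startswith("device_tree="):
            if not replaced:
                out.append(new_line)  # rewrite the first occurrence in place
                replaced = True
            continue  # drop any duplicate device_tree= lines
        out.append(line)
    if not replaced:
        out.append(new_line)  # option was absent: append it
    return "\n".join(out) + "\n"
```

For example, calling set_device_tree(text, "new.dtb") on a config that already names an old dtb keeps every other line and swaps only that one. The rewritten content would still have to be written out safely, and the rollback problem described above remains.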

zyga-snapd · March 30, 2017, 10:10am

I think that while we should strive to fix any issues affecting a specific board (e.g. Raspberry PI 2/3) the goal is to build a reliable system where people can effectively:

create a new device without patching snapd
give that device reliable rollback in case new core / kernel fails
give that device an optional way to update very early boot firmware knowing that it may brick the device in case of failure.

ogra · March 30, 2017, 10:21am

not sure why you say this, is anything above indicating we would expect anyone to patch snapd ?

exactly … this works fine for all gadget snaps that have “device-tree-origin: kernel” in their gadget.yaml but we have no mechanism at all for the ones that ship a dtb inside the gadget and use a non uboot config on top (i.e. only the pi today)

zyga-snapd · March 30, 2017, 10:23am

Because one of the proposals from @mvo was to have snapd do this. This really feels like something that puts us on a path of having to teach snapd about every potential quirk out there where the update cannot simply re-apply the files as the initial image would.

ogra · March 30, 2017, 10:34am

well, i would understand it as a command snapd ships that execs a certain script/binary/tool/hook the gadget can ship, it would be identical on all systems on the surface but the back end would be system specific.

i.e. “snap update-bootloader” which then calls hooks/bootloader-update from the gadget snap. what the bootloader-update tool is is up to the porter then.
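A rough Python sketch of that split, with a generic front end and a gadget-specific back end, might look like this. The function run_bootloader_hook is hypothetical, and the hooks/bootloader-update path simply follows the wording above (snap hooks normally live under meta/hooks). A real snapd implementation would also run the hook confined, which this sketch does not attempt.

```python
import os
import subprocess
from typing import Optional


def run_bootloader_hook(
    gadget_dir: str, hook_name: str = "bootloader-update"
) -> Optional[int]:
    """Run <gadget_dir>/hooks/<hook_name> if present.

    Returns the hook's exit status, or None if the gadget ships no such hook.
    """
    hook = os.path.join(gadget_dir, "hooks", hook_name)
    if not (os.path.isfile(hook) and os.access(hook, os.X_OK)):
        return None  # nothing to do for this gadget
    # The hook itself is whatever the porter ships: script, binary, tool.
    return subprocess.run([hook], check=False).returncode
```

The caller would treat a non-zero status as a failed update and report it to the user.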

mvo · March 30, 2017, 10:45am

It's a trade-off, which is why I put two options in there for discussion. We could do it via a hook (update-bootloader is a nice name). The result is that we are out of the loop, but there is a risk that vendors do a poor job of implementing this hook and people might blame us. The upside is that vendors can do whatever they want. The other option puts us into the critical path, but we can control the quality of the implementation (at the cost of us having to do the work or review patches).

zyga-snapd · March 30, 2017, 10:49am

I’d use a hook but put us on the critical path for now. Those hooks will still run confined and we can make a privileged interface that actually lets you do the update and only grant that to snaps that get some form of review / certification.

ogra · March 30, 2017, 10:50am

well, that won't scale at all … perhaps we could hire a specific “porter patch maintainer” for it, once we have enough boards that might become a fulltime job … i think leaving it to the porter and taking the risk of blame is the only scalable option.

niemeyer · March 30, 2017, 8:44pm

I’m not sure we can afford to require manual actions. People will have devices on the roof that they won’t want to manually do anything about, yet the device manufacturer will want to update these details. Yes, we should be extremely careful, but there’s probably no way around it.

We’ll probably need this at some point for other reasons, but for the assets that we manage and that are a standard part of the gadget format, we need to handle it. It’s also much safer to do it well once than to expect everybody to be careful every time on their own custom scripts.

That’s what we need I think.

Can’t we hook this process into the snap refresh procedure itself?

I’d prefer to engineer it in a way that updating the gadget and updating these bootloader files is perceived as the same thing on the user end. People are already surprised today that this isn’t the case, which is a great indication that this is a path of least surprise.

ogra · March 31, 2017, 2:58pm

well, would you prefer to brick the device automatically or rather have someone do it manually through a remote connection and be aware of the risk … manufacturers not supporting all builtin HW on new systems providing enablement for this additional HW via an update of the bootloader binaries is not a rare case in the embedded world.

having a human involved here who sees a message about the possibility that the device is bricked when powering it off during the flash process will at least prevent you from silent bricking.

… or the biggest surprise if their device on the roof that uses NAND storage and has no way to recover from a broken bootloader hard-bricks itself over night.

there is no proper solution to this problem in any case, only compromises …

niemeyer · March 31, 2017, 3:03pm

@ogra There’s no option here. We will enable manufacturers to automatically update their devices, as otherwise we’re not delivering on the promise of helping people be up-to-date and out of security risks. We need to do a good job on the update procedure, and make the manufacturer aware of what risks are involved in the process.

zyga-snapd · March 31, 2017, 4:45pm

I think updating the firmware is a really special case. It’s rarely security sensitive and should not happen outside of an assisted and planned scenario. Think about this in another context. Would you be comfortable as a device manufacturer if the software you need to use would write to UEFI persistent storage each time you ship an updated gadget snap? The risk taken here is mostly similar to flashing BIOS on a motherboard or doing a bootloader update on a phone. Those are not casual experiences and the cost of failure is immense. I bet there will be commercial feedback on this front as soon as more people doing ARM devices (where this is more common) realize this takes place. We’re not re-installing GRUB here, we’re updating BIOS transparently.

I bet this cannot be done, for legal, warranty and manufacturer cost reasons. Let’s please reconsider what is updated, how and how much the manufacturer is in control.

lool · April 1, 2017, 12:43pm

Could we imagine delivering firmware updates via a separate (privileged) snap? For instance, we don’t ship a BIOS image in the gadget snap/images, but you might want to update your BIOS.

This thread is about updating bootloader assets already in the gadget snap / in images which is more specific; I guess we could limit the change to that, but then it wouldn’t apply to other firmware update situations.

On the actual updates, I join Zyga’s concerns that we don’t want to flash the bootloader assets every time the gadget snap is updated as that introduces more risk.

I also wonder if these updates should be done in a relatively controlled environment, e.g. while no snap provided-services are started. For instance: stop all snaps, write bootloader bits carefully, start all snaps. That’s intrusive, but that would bring the system to a quiet state with predictable performance and no interfering load while doing the critical update.
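The stop / write / start sequence could be sketched in Python as a context manager. Everything here is illustrative: services_stopped is a made-up name, and the stop and start callables are injected so that the sketch makes no claim about how snapd would actually stop services (on a real system they might wrap the snap stop and snap start commands).

```python
from contextlib import contextmanager
from typing import Callable, Iterable, Iterator, List


@contextmanager
def services_stopped(
    snaps: Iterable[str],
    stop: Callable[[str], None],
    start: Callable[[str], None],
) -> Iterator[None]:
    """Stop the given snaps, yield for the critical section, then restart them."""
    stopped: List[str] = []
    try:
        for name in snaps:
            stop(name)
            stopped.append(name)
        yield  # the bootloader write happens here, on a quiet system
    finally:
        # Restart in reverse order, even if the write raised, so a failed
        # update does not also leave services down.
        for name in reversed(stopped):
            start(name)
```

The bootloader write would go inside the with block. Only the snaps that were actually stopped are restarted.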

ogra · April 1, 2017, 4:02pm

probably even a special bootmode that can trigger the update hook from the initrd (similar to the bootreason variable on android that triggers recovery mode to do certain risky updates). that way you would not even have init running and could mount the root partition readonly (in case you need to copy from there). and typically you want to reboot anyway after updating such a part of the system.

niemeyer · April 3, 2017, 6:27pm

We shouldn’t be discussing how far we need to go. We’ll go as far as we need to, and no further. I’m all for not taking risks we don’t have to. As such, we can start from the real use cases we have today, which I assume is what @mvo is really interested in getting opinions on. The point I’ve raised above is that for those cases we need to design solutions which can be automated from the get go, since snapd is not just phones and desktop.

@zyga-snapd http://google.com/?q=firmware+security+issues

zyga-snapd · April 3, 2017, 6:32pm

I updated my BIOS on all my devices and each time it was to support a new CPU or to fix ACPI on Linux. The only exception I can think of is Apple releasing an iOS update to stop jailbreaking, but even that required me to manually agree as it was still a risky operation.

As to the issue at hand: the suggestion to update boot assets as they are described in gadget.yaml does bring in the bricking risk, so I think we need to have a way to say “we really want to write those files / raw partitions” (perhaps other than trying to read the original).

niemeyer · April 3, 2017, 6:37pm

That’s exactly the point. You and the vast majority of people in the world only update anything at all when there are new visible benefits to be had. Meanwhile, security issues thrive.

So yes, let’s please take a simple case to handle, and then go into specifics about it.