Android support in snapd

jhodapp · April 19, 2017, 6:33pm

Hey everyone,

As part of some important commercial work, I wanted to start a discussion on getting required fastboot support into snapd. @morphis, @ondra and @niemeyer as well as some other people started discussing this at the last sprint @ The Hague last autumn, and I’d like to continue this conversation here and get a finalized solution in place since this project has a very tight schedule.

So first off, let’s define exactly what this means. We need to be able to flash a core snap and a kernel snap for a 3.4-based UC16 system to a system flash device. 3.4 kernel support isn’t possible today, but we’re working on a solution in parallel for this. I can’t get more specific on the device details, but would be happy to provide any technical details in private.

@morphis and @ondra: my understanding is that you came up with a proposed design for this at the sprint. Would you mind detailing how that would work here, so that we can start discussing it and get approval for a final solution? Then we can agree on who’s going to do the implementation and when it can be completed by.

Thanks!
Jim

morphis · April 20, 2017, 8:19am

There are two things we decided on at the sprint:

We will add fastboot support next to uboot and grub into snapd. A preliminary implementation for the basic boilerplate code exists here. The implementation needs to generate proper fastboot images on the fly by reimplementing the necessary parts of the abootimg utility. The resulting boot.img will then be written into the boot partition, which is defined either in gadget.yaml or via a partition label.
We discussed how to handle the fallback when a core/kernel snap combination fails to boot, and agreed to implement without support for this at first, so that a first version gets implemented quickly. The second step, which still needs to happen, is coming up with a proper design for the fallback boot. The scenario we discussed assumes that we don’t have the bootloader sources and that the partition layout is static and can’t be changed, with only a boot and a recovery partition the bootloader can boot from, so this needs a good design of how we can handle proper fallbacks.

My proposal to go forward is the following:

Take my existing code and add the necessary steps to generate a boot.img on the fly and flash it to the boot partition. The boot.img generation needs as input the default kernel cmdline, the names of the core and kernel snaps to boot, and additional metadata needed for the generation process. Eventually the best way may be to have a pregenerated boot.img with a dummy zImage inside the kernel snap, which can then just be extended with the correct additional cmdline arguments and the right zImage and ramdisk.
Find a proper place to define the boot partition either via gadget.yaml or via a partition label

In parallel we can look into possible ways of how to handle the fallback boot but that needs more insight of how the actual device works with handling recovery/boot partitions as there is no common way of how Android bootloaders handle this.

ogra · April 20, 2017, 9:50am

well, effectively we do not need to create it but only have code that modifies the cmdline. i think for a first implementation we can simply say “the user has to provide a pre-made boot.img the abootimg tool can handle”. after all we do not need to build it, but only to update it with new content.
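To illustrate what "only modify the cmdline" could look like, here is a minimal sketch in Python. It assumes the classic Android boot image header layout that abootimg and mkbootimg work with (8-byte "ANDROID!" magic, ten 32-bit fields, a 16-byte name, then a 512-byte NUL-padded cmdline). Vendor variants such as the mediatek ones mentioned in this thread may not match this layout. The function names are hypothetical and not part of snapd or abootimg.

```python
BOOT_MAGIC = b"ANDROID!"
# Assumed classic header layout: magic (8) + 10 uint32 fields (40) + name (16).
CMDLINE_OFFSET = 64
CMDLINE_SIZE = 512


def _check_magic(image: bytes) -> None:
    if image[:len(BOOT_MAGIC)] != BOOT_MAGIC:
        raise ValueError("not an Android boot image (bad magic)")


def read_cmdline(image: bytes) -> str:
    """Return the kernel cmdline stored in the boot image header."""
    _check_magic(image)
    raw = image[CMDLINE_OFFSET:CMDLINE_OFFSET + CMDLINE_SIZE]
    return raw.split(b"\0", 1)[0].decode("ascii")


def set_cmdline(image: bytes, cmdline: str) -> bytes:
    """Return a copy of the image with the cmdline field replaced.

    Kernel and ramdisk payloads are left untouched and the image size
    does not change, so the result can be written back over the old one.
    """
    _check_magic(image)
    encoded = cmdline.encode("ascii")
    if len(encoded) >= CMDLINE_SIZE:
        # Keep at least one trailing NUL so the field stays terminated.
        raise ValueError("cmdline does not fit into the header field")
    field = encoded.ljust(CMDLINE_SIZE, b"\0")
    return image[:CMDLINE_OFFSET] + field + image[CMDLINE_OFFSET + CMDLINE_SIZE:]
```

Because only a fixed-size header field changes, this kind of update would not need to repack the image, which is the appeal of starting from a pre-made boot.img.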

morphis · April 20, 2017, 9:58am

We can go that way. However I would prefer if we implement the small amount of code we need for this in snapd rather than relying on a custom third party tool.

ogra · April 20, 2017, 10:05am

so you want to have a complete re-write of abootimg inside “prepare-image” ? i guess that could work but is a non-negligible amount of extra work (note that abootimg will only support a fraction of boot.img files though, i.e. mediatek adds some custom header to all its boot.img files)

morphis · April 20, 2017, 10:38am

snap prepare-image is another thing. I was thinking more about the runtime case, where snapd fetches a new kernel/os snap and needs to update the boot.img. For the image construction we could use abootimg. I would treat that as a more long term problem and, for the specific project in question, continue with the scripts I wrote some months ago to construct the necessary .img files for the writable and boot partitions, which can be flashed individually. Long term we need to add support to ubuntu-image for this too.

ondra · April 20, 2017, 11:29am

sorry for jumping late into conversation.
reading through, I’d say:
no need for the kernel snap to contain a ready boot.img, it should contain the kernel and ramdisk images. The gadget snap should contain the recipe for how to build a boot image out of them. My reason for this is that on some platforms (mediatek) boot.img and recovery.img are not interchangeable, where the difference is only in one flag when assembling the image. Also, if we need to attach extra command line parameters, this would be done once we know them at assembly time. It allows us to share the same kernel on different devices if they only differ in kernel command line…
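As a rough illustration of "assembling a boot image from kernel and ramdisk at install time", the sketch below packs a classic-layout Android boot image in Python. It is not the gadget recipe format, which the thread does not define. The load address offsets and default base mirror common mkbootimg defaults and are device specific in practice, the id field is left zeroed here, and the mediatek-specific flag mentioned above is not modelled. All names are hypothetical.

```python
import struct

# Assumed classic header: magic, 10 uint32 fields, name, cmdline, id.
HEADER_FORMAT = "<8s10I16s512s32s"


def pad_to_page(data: bytes, page_size: int) -> bytes:
    """Pad data with NUL bytes up to the next page boundary."""
    remainder = len(data) % page_size
    if remainder == 0:
        return data
    return data + b"\0" * (page_size - remainder)


def build_boot_img(kernel: bytes, ramdisk: bytes, cmdline: str,
                   base: int = 0x10000000, page_size: int = 2048) -> bytes:
    """Assemble header + kernel + ramdisk, each padded to a page boundary."""
    encoded = cmdline.encode("ascii")
    if len(encoded) >= 512:
        # struct.pack would silently truncate, so check explicitly.
        raise ValueError("cmdline does not fit into the header field")
    header = struct.pack(
        HEADER_FORMAT,
        b"ANDROID!",
        len(kernel), base + 0x00008000,    # kernel size, load address
        len(ramdisk), base + 0x01000000,   # ramdisk size, load address
        0, base + 0x00F00000,              # no second stage image
        base + 0x00000100,                 # kernel tags address
        page_size,
        0, 0,                              # unused fields
        b"",                               # board name, left empty
        encoded,
        b"",                               # id, left zeroed in this sketch
    )
    return (pad_to_page(header, page_size)
            + pad_to_page(kernel, page_size)
            + pad_to_page(ramdisk, page_size))
```

With something like this, a per-device recipe would mostly boil down to the base address, page size and extra cmdline arguments, which is where two devices sharing one kernel would differ.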

I’d agree that snap-prepare should be included as soon as we can, as we will need some way to build images as well, and ubuntu-image already relies on snap-prepare there so it actually can save some additional work on build target image side.

Additional idea, would be to re-evaluate need for core snap revision control to be included in kernel command line and controlled from uboot/grub/(fastboot). It’s adding complexity and I think we can equally handle whole thing from initrd.

For fallback between boot/recovery images:
My idea was that when we install a new kernel snap, we first build a recovery image and flash it to the recovery partition (as we know the boot partition path is working fine), set a flag that we are “testing”, and test-reboot into “recovery” mode. If the recovery boot succeeds, snapd erases the “testing” flag, flashes the boot.img (repackaged if needed) to the recovery partition to maintain a working fallback option, and burns the recovery image (repackaged if needed) to the boot partition. This way we get auto fallback for free. The recovery boot is done with a non-persistent recovery flag through the Android bootloader, so if it fails, the next boot automatically falls back to the boot partition, where we check for the “testing” flag. If it’s still there, we know the recovery boot went sideways and the kernel snap should be marked as rotten.
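The flow described in this post can be sketched as a small state model. This is only an illustration in Python with the two partitions and the “testing” flag held in memory. The class and function names are hypothetical, and a real implementation would persist the flag somewhere that survives a failed boot.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    boot: str                    # image currently in the boot partition
    recovery: str                # image currently in the recovery partition
    testing: bool = False        # the "testing" flag
    rotten: list = field(default_factory=list)


def stage_new_kernel(dev: Device, new_image: str) -> None:
    """Flash the candidate to recovery and mark that a trial is in progress.

    The caller would then request a one-shot reboot into recovery mode.
    """
    dev.recovery = new_image
    dev.testing = True


def on_boot(dev: Device, booted_partition: str) -> str:
    """Run early after boot to conclude a pending trial, if there is one."""
    if not dev.testing:
        return "normal"
    dev.testing = False
    if booted_partition == "recovery":
        # Trial succeeded: the candidate moves to boot, the old known-good
        # image moves to recovery and becomes the fallback.
        dev.boot, dev.recovery = dev.recovery, dev.boot
        return "promoted"
    # We came up from the boot partition with the flag still set, so the
    # one-shot recovery boot of the candidate must have failed.
    dev.rotten.append(dev.recovery)
    return "rolled-back"
```

Note that in this model the recovery partition always holds either the candidate or the previous known-good image, never a dedicated emergency system.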

And one more thing, which I still need to think more in details.
This all applies to platforms running pre-Android 7. From Android 7 on we will have a lot better support from bootloaders, as it added support for two boot partitions for transactional updates.

ogra · April 20, 2017, 12:01pm

i’m not sure we actually want to ever touch recovery. after all this will be your last resort when everything broke to still get to some emergency system. instead recovery should contain the automated fallback logic to revert boot.img in case we panic…

i.e. you upgrade which generates and dd’s a new boot.img in place, reboot, on panic you auto-reboot into recovery which has the automatic rollback scripts to flash the former boot.img into place from a backup we keep in /boot/fastboot. in case either of the flashing goes wrong you will always have your emergency access via recovery available.

ogra · April 20, 2017, 1:22pm

btw, could we rename the topic to “android booting in snapd” i dont think anything in here is necessarily tied to the fastboot protocol.

ondra · April 20, 2017, 3:30pm

considering we can’t modify the bootloader under us, how would you handle the case when we update to a bad kernel and flash it to the boot partition? The device is then in a state where it can’t get past the kernel when booting from the boot partition. Which component should tell the bootloader to eventually try recovery mode? I’m sure there is some mechanism doing this, but considering we have no access to the bootloader code, can we rely on it working the same way on all Android platforms?
Altering boot and recovery partition will put this into our hands. Besides, we probably want to keep updating recovery to the previous working kernel, rather than keeping recovery as some special case, I’d imagine….
True, in my case there is a stage when we are updating both partitions at the same time

niemeyer · April 20, 2017, 8:04pm

Sounds good!

This already exists in the gadget.yaml definition of the volumes.

Note that strictly speaking grub and uboot don’t support that either. It’s the logic we use in their scripting language that enables them to do that.

The gadget can’t update the kernel’s boot image like that. At least not right now. I suggest doing something similar to what we have first, and then looking at improving these details later.

It’s exactly the opposite: using the cmdline is a trivial change on both of the bootloaders we handle today, while generating a new initrd image every time we need to update a snap is much more complex and risky.

jhodapp · April 20, 2017, 8:13pm

Hey guys, thanks for your thoughts so far. I had to do a bit of research before starting to have a better idea of what your proposals are referring to. So we have two main things to take into account after reading through this thread:

Flashing of the fastboot images in a manner that is compatible with what abootimg does from snapd
Handling how to recover from a device that isn’t booting correctly and how to generate, flash and utilize that recovery image

Both of these need doing for the project, but the first one is the most important and timely one that needs to be implemented in snapd. That being said, just to make sure I have a high level understanding of how this is supposed to work:

1. A new version of the set of images that make up the fastboot image is released
2. snapd downloads the core, kernel and gadget snaps for the system
3. snapd extracts these snaps and prepares them into a flashable set of .img binary files (or is it a monolithic image?) that are the bootloader, the kernel and the rootfs (anything else I’m missing for a fastboot situation?)
4. snapd will do the actual writing of these images or monolithic image to predefined addresses on the flash

Is this generally correct?

Simon’s proposal seems reasonable and pretty straightforward. We need to do the simplest implementation needed to support this particular use case first but still general enough to build on for future devices without needing to scrap the entire approach. Is Simon’s proposal generic enough given what I just mentioned? What are the next steps?

ogra · April 21, 2017, 9:01am

we do not have any way to assemble an android compatible boot.img in this definition today, this requires special tools and “snap prepare-image”/ubuntu-image need to know about them (which is why i’m suggesting to start with a prepared boot.img inside the kernel snap for now)

in case of android bootloaders we do not have this script language opportunity so the logic needs to live elsewhere outside. one way here is to use the recovery.img boot like outlined above, make it a requirement to make the kernel hard-reboot into recovery on panic (assuming that when we have android bootloaders there will also be android kernel source with the necessary feature enabled) and have it flash the boot.img back and forth as needed. while there is no actual standardization in android bootloaders, the kernel reboot logic should at least be adjustable to support the specific way a certain device bootloader handles this and we have access to the source.

sadly it is as non-trivial to change the cmdline as it is to replace kernel or initrd in the android boot.img, in either case we need to write to something inside a raw img file (preferably even directly to the partition) and on a case by case basis even need to re-pack the boot.img to make the change (no standards here either). but this does indeed not justify dropping the current implementation, especially since the commandline variables are used in way more places than the bootloader.

ogra · April 21, 2017, 9:30am

i find it still too complex for the start, we should make the generation of the boot.img file part of the kernel snap build and have this file readily made available, so all we need in snapd is:

knowledge where to put the boot.img file on kernel upgrades (from the gadget data) and dd-like support to write it.
snapd support for abootimg to update the cmdline in place (trivial to do when just shelling out to abootimg for the start)
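The "dd-like support" in the first item could look roughly like the following Python sketch. The size check and the function name are assumptions added for illustration. On a real device the target would be a block device node, for example a udev symlink such as /dev/disk/by-partlabel/boot if the partition is located by label, which is an assumption here and not something the thread specifies.

```python
import os


def flash_image(image_path: str, partition_path: str,
                chunk_size: int = 1 << 20) -> int:
    """Copy an image onto a partition, dd style, and return the bytes written.

    Refuses to write if the image is larger than the target.
    """
    image_size = os.path.getsize(image_path)
    # "r+b" opens for in-place writing without truncating the target.
    with open(partition_path, "r+b") as part:
        # Seeking to the end gives the size for block devices and files alike.
        part_size = part.seek(0, os.SEEK_END)
        if image_size > part_size:
            raise ValueError(
                f"image ({image_size} bytes) does not fit into "
                f"partition ({part_size} bytes)")
        part.seek(0)
        with open(image_path, "rb") as img:
            while chunk := img.read(chunk_size):
                part.write(chunk)
        # Make sure the data reached the device before any reboot.
        part.flush()
        os.fsync(part.fileno())
    return image_size
```

The explicit fsync matters in this scenario because the write is typically followed by a reboot.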

your point 3 above (multi img output) should already be supported in ubuntu-image and the gadget spec.

regarding point 4. i dont think we want snapd to flash anything but the boot.img. initial flashing of an img to the device should happen traditionally via “fastboot flash”, the writable img will simply have a normal ext4 filesystem that carries snaps we update and be written once to a partition (presumably the userdata one). from then on upgrades will simply replace snaps inside it, only the boot.img will ever be re-flashed (probably not even that, abootimg is perfectly capable of only replacing the vmlinuz file directly on the boot partition as long as we make the initial size parameter of the boot.img file big enough).

ogra · April 21, 2017, 10:00am

Let me sum up what i have in mind:

kernel snaps always ship a ready made boot.img
the kernel panic function has to hardcode “reboot recovery” so on panic we always land in recovery mode.
on kernel upgrades snapd writes the cmdline (including the snap_try variable) into the boot.img file and calls “reboot recovery”
the recovery mode checks if there is a newer boot.img file containing “snap_try” in the cmdline, additionally sets the “snap_trying” variable, flashes it and reboots.
on successful boot snapd unsets the two vars
if the boot fails we’ll return to recovery but will find both vars set in the boot partition. if that is the case, flash back the known good former boot.img
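The decision logic in this list can be sketched as follows. This is an illustrative Python model, not recovery-mode code. It assumes the two variables appear as name=value (or bare) tokens on the cmdline stored in the boot image, and the action names are made up for the example.

```python
from typing import Optional


def cmdline_vars(cmdline: str) -> set:
    """Return the variable names present on a kernel cmdline."""
    return {token.split("=", 1)[0] for token in cmdline.split()}


def recovery_action(boot_cmdline: str, pending_cmdline: Optional[str]) -> str:
    """Decide what recovery mode should do.

    boot_cmdline is read from the image on the boot partition,
    pending_cmdline from a newer boot.img file, if one exists.
    """
    flashed = cmdline_vars(boot_cmdline)
    if "snap_try" in flashed and "snap_trying" in flashed:
        # We already tried this image and ended up in recovery again.
        return "flash-backup"
    if pending_cmdline is not None and "snap_try" in cmdline_vars(pending_cmdline):
        # New image waiting: add snap_trying, flash it and reboot.
        return "flash-pending"
    return "emergency-shell"


def clear_try_vars(cmdline: str) -> str:
    """What snapd would do to the cmdline after a successful boot."""
    kept = [t for t in cmdline.split()
            if t.split("=", 1)[0] not in ("snap_try", "snap_trying")]
    return " ".join(kept)
```

The rollback check comes first on purpose, so that a failed trial is reverted even if the pending boot.img file is still lying around.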

this resembles the behaviour of the current uboot/grub scripts but moves all the logic into recovery and cmdline. 90% of abootimg handling will live in the recovery.img and the changes to snapd can be minimal.

i’d also suggest that we do not really change the recovery.img anymore after initial setup. it will give us the guarantee to always have a bootable emergency mode (like we can always go to a uboot or grub prompt in the other scenarios)

morphis · April 21, 2017, 11:10am

@ogra Thanks for writing that up! Had a similar setup in mind. The only thing I don’t want to see is that we add the abootimg utility directly into the core snap. I think we should be able to manage changing the cmdline with a little amount of Go code.

ogra · April 21, 2017, 11:57am

i think adding abootimg to all arm core snaps isnt really a biggie for the start, we can always replace it later (we did the same with grub (via grub-common) and uboot (via fw_setenv) before we had grub.go and uboot.go in place)

(indeed i dont want to block anyone from providing patches to add abootimg.go, but i dont think it is a super urgent thing to have to get started)

jhodapp · April 21, 2017, 3:40pm

@ogra Your proposal seems like a really good one. As long as we’re set to do the initial flashing (which is done via traditional fastboot flash), upgrading of the boot.img as you mentioned, and a reliable recovery image always in place (I agree that we should put one in place and then not touch it), then this should meet the requirements for this project. Thanks for your prompt and thorough replies on this.

So now the question is, who can do the implementation who has knowledge of snapd and this process and how soon can it be started and about how long will it take to complete?

ondra · April 24, 2017, 11:33am

@ogra I still don’t think having boot.img in the kernel snap is really needed, as it can’t be used as it is anyway. Instead, having an initrd image + kernel image should be enough there if we know how to assemble boot.img from them.
Adding abootimg to core snap +1. How are we going to handle special cases like MediaTek, where we need to add a custom flag when building the boot image?
Do you aim to eventually support all those cases in abootimg.go, with the gadget snap only describing which “special case” to use for a given device? I can imagine this being workable. It may be a bit cumbersome if bringing up some special case device also requires an MP to snapd to add the case….

So from your proposal, you actually want to flash the test boot image from the recovery boot? Isn’t this a bit overcomplicated? Why not flash the boot image from snapd during normal boot, to be aligned with how we do it on other systems? Just wondering what the benefit of this extra boot to recovery step is.
Also, what would happen in the unlikely event that the kernel image is completely broken, so we can’t rely on the kernel panic function to bring us back to recovery?

ogra · April 24, 2017, 12:25pm

well, you are giving the reasoning yourself above why we should ship it in the kernel snap

there might be 100 different tools needed to cover all possible ways of mangling a boot.img so it boots on a certain platform like your mediatek example … we do not want to include 100 different tools in the core snap nor do we want to have snapd know about 100 different ways to manage them. that is why i want us to start with the easier step that you ship a pre-made boot.img in a snap that is HW specific (kernel). we can surely look later into on-the-fly generation but that will get complex very fast so lets start off with the easiest implementation possible.

no, it actually makes everything easier since we:

have no way to have the rollback scripting anywhere in the bootloader, so it should live in the recovery mode (note that we need the snap_try mode for core snap updates as well)
only need to handle the “reboot recovery” in a kernel patch
have a proper “bootloader shell” like we do on grub/uboot through the recovery mode (simply via an initrd in recovery)