Android support in snapd

Indeed, boot time is not a big issue with chain loading.
The bigger question is what we chain load. Finding a working uboot might be a challenge for some Android platforms.
Android devices do use a dtb, and that’s where it gets complicated: some devices have the dtb packed in boot.img, others have it in a separate partition. So there is more trouble in handling a dtb that resides in a different partition. If I remember correctly, we saw Android devices that had a dtb partition used by both boot and recovery, or even devices where the kernel had its own partition reused by boot and recovery. System LSI usually does those weird setups. I’d be in favour of not supporting those special cases for now.

Hopefully we will soon not need any of this, as we will get better support from the Android bootloader once Android 7 is adopted. Then we will have two boot images to alternate between, plus recovery.


We can’t predict which bootloader is used on those devices. aboot, lk or u-boot, all are possible variants which have implementations of the fastboot protocol. The only thing all have in common is the protocol. I am fine giving this a different name but we should keep the actual bootloader names out of the picture.

We are not using the fastboot protocol at all here, in any case. That would only come into play if we were flashing the device externally.

I like aboot now more than lk, as a short name for “android style boot”, as what we really require for this design is not a concrete bootloader, but having boot/recovery/system partitions and some android kernel patches.

True, and that is why I am fine with giving this a different name than “fastboot”.

That is problematic. aboot (the very first Android bootloader, used on the G1) refers to a specific bootloader implementation, and so does lk (which most vendors fork and base their bootloader on). Neither name is therefore applicable to a common implementation of an Android-style boot, although http://newandroidbook.com/Articles/aboot.html refers to an lk-based implementation of aboot. There are also other implementations, like the one from Samsung. This makes it really hard for us to find a good name. The most common terms are “boot.img” and “android”. The boot.img is what we process and then dd to a specific partition. What about simply calling it “android” or “android-boot”?

Let’s expand aboot to android-boot. Probably confusion with something else would not be such a big deal, but it will be good to be more descriptive.


I have started to implement the snapd changes for this (starting from initial code from Simon). We need two things there:

  1. A new bootloader. I have chosen to have a very simple configuration file that contains the variables (snap_mode, snap_try_{kernel,core}, snap_{kernel,core}). This configuration file will be written in a folder where the recovery partition is mounted, so it can be accessed after booting to recovery (similar to the u-boot way). There is no need to modify boot.img from Core, and we can include the abootimg binary only in recovery.
  2. A way to reboot to recovery when kernel or core are refreshed. This is done in Android by setting “recovery” as the argument to the reboot syscall. That can be done from the command line. In Touch, after an OTA, the system was rebooted with the command “/sbin/reboot -f recovery”, which immediately rebooted the system. What snapd does is run “shutdown +10 -r”. I think an argument can be added there too. But @pedronis said this was going to be changed. In what way?
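
A minimal sketch of what such a configuration file could look like in code. The thread does not specify the on-disk format, so the `key=value` layout, the `bootVars` type and the `parseBootVars`/`serialize` helpers are all assumptions made for illustration; only the variable names come from the post above.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strings"
)

// bootVars holds variables such as snap_mode or snap_try_kernel.
// The key=value file format used here is an assumption, not snapd's format.
type bootVars map[string]string

// parseBootVars reads "key=value" lines, skipping blank lines and '#' comments.
func parseBootVars(data string) (bootVars, error) {
	vars := bootVars{}
	sc := bufio.NewScanner(strings.NewReader(data))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		parts := strings.SplitN(line, "=", 2)
		if len(parts) != 2 {
			return nil, fmt.Errorf("malformed line %q", line)
		}
		vars[strings.TrimSpace(parts[0])] = strings.TrimSpace(parts[1])
	}
	return vars, sc.Err()
}

// serialize writes the variables sorted by key so the output is stable.
func (v bootVars) serialize() string {
	keys := make([]string, 0, len(v))
	for k := range v {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&b, "%s=%s\n", k, v[k])
	}
	return b.String()
}

func main() {
	// Hypothetical file contents after a kernel refresh has been staged.
	vars, err := parseBootVars("snap_mode=try\nsnap_try_kernel=kernel_2.snap\n")
	if err != nil {
		panic(err)
	}
	vars["snap_kernel"] = "kernel_1.snap"
	fmt.Print(vars.serialize())
}
```

A flat text file like this would be readable both from snapd and from a shell script in the recovery initramfs, which is the property the post asks for.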

mostly moved to a different place in daemon.go. anyway, given that this involves a new bootloader, what we could do from that new place is invoke a Reboot method that we would add to the bootloader interface
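
A rough sketch of that suggestion, under stated assumptions: the `Bootloader` interface below is a trimmed-down, hypothetical one (not snapd's actual interface), and `runner`, `androidBoot` and `delayedReboot` are names invented for the example. The two command lines are the ones quoted earlier in the thread.

```go
package main

import "fmt"

// runner executes a command line; it is injected so the sketch can be dry-run.
type runner func(argv ...string) error

// Bootloader is a hypothetical, trimmed-down interface for illustration.
// It is not snapd's actual bootloader interface.
type Bootloader interface {
	Name() string
	Reboot(run runner) error
}

// androidBoot reboots straight into the recovery partition.
type androidBoot struct{}

func (androidBoot) Name() string { return "android-boot" }
func (androidBoot) Reboot(run runner) error {
	return run("/sbin/reboot", "-f", "recovery")
}

// delayedReboot uses the generic delayed reboot command quoted in the thread.
type delayedReboot struct{}

func (delayedReboot) Name() string { return "generic" }
func (delayedReboot) Reboot(run runner) error {
	return run("shutdown", "+10", "-r")
}

func main() {
	// Print the command instead of executing it.
	dryRun := func(argv ...string) error {
		fmt.Println("would run:", argv)
		return nil
	}
	for _, bl := range []Bootloader{androidBoot{}, delayedReboot{}} {
		fmt.Printf("%s: ", bl.Name())
		if err := bl.Reboot(dryRun); err != nil {
			fmt.Println("error:", err)
		}
	}
}
```

Putting the reboot behind the interface would let the daemon stay unaware of which bootloader needs the extra “recovery” argument.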


we will also need:

  • a patch for the kernel to go to recovery in all panic situations
  • porting of the update/rollback script logic to recovery (preferably by just having additional scripts in initramfs-tools-ubuntu-core and generating an additional recovery initrd during initrd creation that a recovery.img can consume)

This idea came to me this weekend (thanks @ogra for the discussion on Friday):

Instead of treating the boot partition as where the “normal” boot kernel resides and the recovery partition as where we reboot when we have pending upgrades, why don’t we always reboot to recovery from the boot partition’s initramfs scripts? We would then always follow this sequence, on either power-on or reboot:

boot → recovery → userdata

The boot partition would contain the scripts for upgrading kernel/core snaps, and would run them if needed. Regardless of whether it is upgrading or not, it would then do a “reboot recovery”.

The recovery partition would do the normal boot process, starting systemd init in userdata partition.

In effect, the boot partition would be a second-stage bootloader, and would have exactly the same functionality we have for u-boot and grub. Cases like refreshing the core snap, then powering off instead of rebooting the device, and then powering on, would work as smoothly as in uboot/grub.

When upgrading the kernel snap, the partition to refresh would be recovery, and we would not touch the boot partition.
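
The decision the first stage would make on each boot can be sketched as a small state machine. This is an illustration only: it assumes the try/trying handling of snap_mode mirrors the existing u-boot flow, and `bootState` and `firstStage` are hypothetical names. Writing the selected kernel to the recovery partition and issuing “reboot recovery” are left out.

```go
package main

import "fmt"

// bootState mirrors the boot variables named earlier in the thread.
type bootState struct {
	Mode      string // "", "try" or "trying"
	Kernel    string // known-good kernel snap
	Core      string // known-good core snap
	TryKernel string // kernel snap under trial, if any
	TryCore   string // core snap under trial, if any
}

// firstStage returns the state to persist and the kernel/core to boot.
// "try" starts a trial boot; "trying" means the previous trial was never
// confirmed, so the known-good snaps are used again (rollback).
func firstStage(s bootState) (next bootState, kernel, core string) {
	next = s
	kernel, core = s.Kernel, s.Core
	switch s.Mode {
	case "try":
		next.Mode = "trying"
		if s.TryKernel != "" {
			kernel = s.TryKernel
		}
		if s.TryCore != "" {
			core = s.TryCore
		}
	case "trying":
		next.Mode = ""
	}
	return next, kernel, core
}

func main() {
	s := bootState{Mode: "try", Kernel: "kernel_1.snap", Core: "core_1.snap", TryKernel: "kernel_2.snap"}
	next, kernel, core := firstStage(s)
	fmt.Printf("mode=%q kernel=%s core=%s\n", next.Mode, kernel, core)
}
```

Because this logic runs on every boot, a power-off between refresh and reboot is handled the same way as a normal reboot.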

The obvious drawback is a longer boot time, as we go through bootloader + kernel load + start of initramfs twice. This takes around 30 additional seconds. It is noticeable, but we can accept this for this sort of IoT device, which does not reboot that often.

The advantage is a sane upgrade process, fulfilling snap promises :wink:

Wdyt?


this is a really beautiful idea and goes hand in hand with how i imagined to always boot to recovery … switching the partitions instead is indeed brilliant and saves a lot of code changes.

while the additional reboot indeed costs time i think we can get far below the 30sec here, the kernel we use as second stage bootloader only needs to have enough enabled to find the other partition. it can be very tiny, the same goes for the initrd script that holds the rollback logic … my gut feeling is that we could come out below 10sec with this.

Adding support to klibc’s reboot command is done: https://bugs.launchpad.net/ubuntu/+source/klibc/+bug/1692494

Alright, I’d like to understand this a bit better from Alfonso. I’ll hold a separate live conversation so I can learn more details. I’m good with the overall concept but I do think it’s important to keep the boot/reboot time as small as possible. There are still many use cases where certain IoT boards won’t be battery backed-up and so on a brown out or a short power loss, it’d be very nice for the time to be as short as possible so that service interruption isn’t very noticeable.

Ok, I better understand this now and am on board with it.

This sounds pretty interesting. It’s slightly surprising to get 30 additional seconds just because we’re going through another kernel, though. Why is it taking so long to simply chainload into another kernel?

this really depends on the type of bootloader in use and how much time it takes before loading the boot.img partition. we can definitely optimize the whole kernel/initrd side down to a few seconds, but a reboot includes bits we don’t control (the bootloader itself), and that adds to the process.

with this setup you simply add one extra reboot call every time you boot.

the robustness of the functionality gained by using such a simplified setup pays off though and is worth every extra second it adds IMHO.

It depends on how many extra seconds it adds, actually. The more seconds, the less interesting it gets, up to the point where we can’t do it anymore. That’s why we need to get more precise figures before moving on with it.

One of the things that was happening is that initramfs debug output was enabled and was being sent through the serial port, which slowed things a bit. Just removing that gets us to 20 seconds.

Re: the time the bootloader takes to load the kernel image/ramdisk, it is around 10 seconds, so that is our lower bound.

We are then in the 10-20 secs interval.


well, the alternative is a rather risky partition switching back and forth between boot and recovery, enforcing of a kernel patch onto customers (to automatically force reboot to recovery on a panic to trigger the rollback there) and quite a set of code changes to snapd (there are a bunch of initial PRs for that already). the result will be a lot more porting work and code to maintain for potential customers, adding a 10sec reboot to an existing 30sec boot still levels that out IMHO.

how do you want to get more precise figures if you don’t know how long the pre-boot bits take ? we can surely test on a dragonboard but that won’t tell you anything about the future foobar-board that might have an SPL that takes 20sec on its own to initialize the board before switching to the actual second stage bootloader.

the figures we can measure are kernel and initrd and these won’t be 30 additional seconds.

we also won’t be able to do any measurements at all if we don’t move on btw :slight_smile:

I’m pointing out something very straightforward: there’s a limit to how much additional time any reasonable person will tolerate on a boot. We need to evaluate it before we consider this okay. If you disagree, then please let the rest of us figure this out.

i assume you are testing with a generic ubuntu initrd ? that would be total overkill for the actual implementation, we should weed out everything and turn /init into a simple 20-30 line script (simply the snappy_boot script that we use in uboot with some extra bits added), i can promise you to get you an initrd that runs the whole thing in less than 5sec :wink: