Status Tracking for Build VM

The intention of this forum post is to track the conversations, work items, opens, and brainstormed ideas throughout the implementation of build VMs.

Document index

Opens

  • cross compilation (build environments)
  • shared parts across multiple bases (remote parts are being replaced in part by templates).
  • using snapcraft from specific branches (resolved: snapcraft is injected from the host).
  • figure out the amount of work it will take to move to dispatching commands instead of having snapcraft installed in the VM (resolved: not practical).

18 May 2018 Minutes

Summary of discussion:

  • Progress on a snapcraft implementation using qemu with a Fedora image.
    • We should use existing mechanisms already in place to set up snapcraft and snapd from inside the VM.
  • Dealing with available RAM:
    • kernel parameters to modify to allow for over-committing (see the sketch after this summary).
    • load a started VM versus cold booting.
    • build environment should define the amount of RAM it requires.
  • Brain storming on how the build environment should be designed to cope with plugins:
    • Idea one: the plugin asks for data and operates on it.
    • Idea two: the plugin asks the build environment to set itself up appropriately for running a plugin’s lifecycle step.
  • Brainstorm of how to deal with stage-packages:
    • a hook the environment can execute on.
    • using a factory in snapcraft that can be overridden by the build environment.
  • Overview of plugins and the specific build environment each of them assumes.
  • Brainstorm for shared parts across multiple bases.
  • Quick brainstorm of cross compilation with build environments.
  • Tune the invite list depending on the agenda.
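
As a side note on the RAM item above, a minimal sketch of the kind of over-commit tuning that was mentioned, assuming the standard Linux vm.overcommit_memory knob (which exact parameters and values we want is still open):

# Sketch: relax the kernel's over-commit policy before launching build VMs.
# Assumes the standard Linux sysctl knob; the values we actually want are
# still undecided. Needs root.
import pathlib

def allow_overcommit() -> None:
    # 1 = always over-commit; 0 is the kernel's heuristic default.
    pathlib.Path("/proc/sys/vm/overcommit_memory").write_text("1\n")

if __name__ == "__main__":
    allow_overcommit()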

29 May 2018 Agenda

29 May 2018 Minutes

  • Overviewed Fedora Build VM progress.
  • Need to solve what stage-packages and build-packages mean, some ways:
    • Package sets as a level of indirection over bases. ROS has something similar (base.yaml), but it is too granular. These would be maintained in potentially three different places:
      • the build VM.
      • in snapcraft.yaml explicitly, defining the content of the set.
      • inside a template, as part of snapcraft code.
    • Make parts specific to bases by mapping the base in snapcraft.yaml.
    • Make parts use an on <base> qualifier for stage-packages and build-packages.

Opens

  • How will package sets stand the test of time?
  • Do we want snap developers to take on the burden of supporting other bases they do not intend to support when sharing parts from their snapcraft.yaml? (This could translate to only supporting, or having tested on, a single architecture.)

Meeting notes for May 29th

  • Sergio built Fedora 27 using qemu with a spike implementation
  • Used 9p, worked fine, but it was a hello world
  • Hacked together build packages installation (gcc, etc)
  • If SSH keys are available, use them; otherwise, create keys
  • No progress on comparing boot time with and without VM snapshots and finding bottlenecks
  • Important open point: how to take stage-packages and build-packages for remote parts
  • We may need something like package sets: a general name that refers to a package list which may be different per base
  • Snapcraft has a map [some key] => [list of packages]; today that key is the remote part itself; package sets might change it to be [base+set name] => [list of packages]
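
To make the package-set idea concrete, a purely hypothetical sketch of what that map change could look like; the set names and package lists below are made up:

# Hypothetical package sets: the key becomes (base, set name) instead of the
# remote part name; the values stay lists of packages. All names invented.
PACKAGE_SETS = {
    ("core16", "build-essentials"): ["gcc", "make"],
    ("core18", "build-essentials"): ["gcc-8", "make"],
}

def packages_for(base: str, set_name: str) -> list:
    # Resolve a named package set for the given base.
    return PACKAGE_SETS.get((base, set_name), [])

print(packages_for("core16", "build-essentials"))  # ['gcc', 'make']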

Vague build procedure in spike

  • Go into directory
  • Image boots
  • Key injected via cloud-init ISO
  • Wait for SSH to be available
  • SSH into machine
  • Set up 9p directory
  • Prepared the VM with snapcraft inside (cheating)
  • Create a snap
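
Roughly what driving that spike flow from a script could look like; this is only an illustration of the steps above, not the actual implementation, and the image name, user, ports, and paths are assumptions:

# Sketch of the spike flow: boot a qemu image with a cloud-init seed ISO,
# share the project over 9p, wait for SSH, then build over SSH. Image name,
# user, ports and paths are placeholders.
import socket
import subprocess
import time

PROJECT_DIR = "/home/user/my-snap"   # hypothetical project location
IMAGE = "fedora-27.qcow2"            # hypothetical image

# Build a cloud-init seed ISO carrying the public SSH key (cloud-localds is
# part of cloud-image-utils; user-data holds ssh_authorized_keys).
subprocess.check_call(["cloud-localds", "seed.iso", "user-data"])

# Boot the VM: disk, seed ISO, 9p share for the project, SSH forwarded to 2222.
vm = subprocess.Popen([
    "qemu-system-x86_64", "-enable-kvm", "-m", "2048", "-nographic",
    "-drive", "file={},format=qcow2".format(IMAGE),
    "-cdrom", "seed.iso",
    "-virtfs", "local,path={},mount_tag=project,security_model=none".format(PROJECT_DIR),
    "-netdev", "user,id=net0,hostfwd=tcp::2222-:22",
    "-device", "virtio-net-pci,netdev=net0",
])

# Wait for SSH to become reachable.
while True:
    try:
        socket.create_connection(("localhost", 2222), timeout=1).close()
        break
    except OSError:
        time.sleep(1)

# Mount the 9p share and run snapcraft inside the VM (snapcraft was
# pre-installed in the spike image, i.e. the "cheating" step above).
ssh = ["ssh", "-p", "2222", "builder@localhost"]
subprocess.check_call(ssh + ["sudo mkdir -p /build && sudo mount -t 9p -o trans=virtio project /build"])
subprocess.check_call(ssh + ["cd /build && snapcraft"])
# (shutting the VM down and cleaning up is omitted here)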

Meeting notes for Jun 1st

  • [ACTION] Sergio to figure out what the cost is of not having snapcraft at all inside the VM, but driving from the outside, Spread-style
  • [OPEN] Consider what happens when a plugin is introduced - what is the default behavior for a build VM that wasn’t aware of it?
  • What if someone wants to build on a cheap VM provider?
    • Ship the image to the build provider.
  • rosdep is in the archive on Ubuntu, but comes from pip elsewhere; how do we satisfy both?
  • Instead of package sets, we probably want something like a two-layered approach: a sort of per-base “plugin skeleton” that specifies custom packages, and also scripts for particular execution points
  • These possibly need access to some of the plugin configuration
  • Probably needs customization per architecture as well, but shouldn’t have to define a completely new stanza for every architecture

I wouldn’t worry too much about this. I’ve already got prototype packaging for snapcraft, I just never published it because I don’t have a way to use it with a Fedora base yet. If there’s a patch set, I can build a package and push it to Fedora COPR so it can be used. As we get closer to a working model, @kyrofa and I can move towards pushing it into the Fedora repositories.

Meeting notes for June 6th

  • Remote parts will be going away once bases are introduced, which means we don’t need to design a solution that works for them.
    • Moving to templates instead
  • No customization inside VM, plugins are responsible
  • Still need to be able to template individual build VMs
    • Address individual environments inside the code itself, so the template could be customized per base
  • Driving snapcraft via SSH instead of snapcraft within VM:
    • We started with rewriting the subprocess calls to be remote, but we’re realizing this is a larger and larger task. It gets over-complicated when you consider file operations and downloading stuff
    • A version of snapcraft needs to be in the VM
  • If host is running a snap, inject it
  • If not running a snap, two choices:
    • No idea, don’t care, end result is we install the snap in the VM
    • You explicitly provide a snap

Meeting notes for June 8th

  • Migration from remote parts to templates:
    • People not using bases can continue using remote parts, but we’ll start
      showing deprecation notices with migration guides.
    • People using bases must start using templates
  • Provisioning snapcraft within the VM
    • Inject the snap from the host
    • Inject via snap try
  • Will be moving to multipass for VM instead of qemu
  • We’re not huge fans of the workflow for cleanbuild: install LXD, configure
    it yourself, and only then can we use it. We should look into bundling
    multipass in the snapcraft snap (both client and backend). Still need to
    figure out what to do for the deb, source, and others.
  • [ACTION] Sergio needs to figure out if snapshotting is worth it, and feed back
    to Multipass team.
  • Regarding remote filesystem (9p etc.), we’ll take whatever Multipass is using

Yeah, this wasn’t about running on other operating systems; there were just numerous improvements to be had if we could turn snapcraft into an orchestrator instead of a re-exec dispatcher. Alas, after some investigation we’ve found that it’s too large a chunk of work, with its own complications.

Meeting notes for June 12th

25-second cold boot (measured over 150 runs)
snapshotting benchmarks still need to run
ACTION will have numbers by EOD (for 1 GB of RAM, then will run with more)

ACTION have something building end to end by Friday for 16.04 from a branch so others can try it.

  • Should we move to multipass now, or continue spiking with qemu?

    • Continue with qemu until we know exactly what we want.
    • Test the intention, not the implementation
  • what should the build user be? should we build as a user or root?

    • isn’t it a world of pain to not build as root?
    • doesn’t matter as long as we have an option to become root
  • mounting via 9p, can you have files not owned by root?

    • ACTION look into the 9p security model option
  • where will we store the VMs?

    • agreement from the sprint is to have a hardcoded URL for the image
    • with a hash or without a hash?
      • the problem with a hash is that you have to update snapcraft every time you update an image, so leave it off for now
    • fine to have the JSON in GitHub and then download the image based on what you find there
    • the only attack vector we gain that we wouldn’t have otherwise is that there is no delay if someone hacks GitHub; otherwise you’d have to wait for a release

ACTION give us an example of the JSON file and how you plan to support it, for the next meeting.
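
Not to pre-empt the ACTION above, but purely as a hypothetical illustration of the kind of index file being discussed: a sketch that downloads a JSON file from a hardcoded GitHub URL and picks the image for the requested base. The URL, layout, and field names below are all made up:

# Hypothetical image index: a JSON file hosted in a GitHub repo mapping bases
# to image URLs, fetched from a hardcoded location. Everything here is invented.
import json
import urllib.request

INDEX_URL = "https://raw.githubusercontent.com/example/build-images/master/images.json"
# The file might look like:
# {"core16": {"url": "https://example.com/core16-build.qcow2"},
#  "core18": {"url": "https://example.com/core18-build.qcow2"}}

def image_url_for(base: str) -> str:
    with urllib.request.urlopen(INDEX_URL) as response:
        index = json.load(response)
    return index[base]["url"]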

  • the case of having a part with no plugin definition would not be allowed when you use a base
    • As well as having a part listed in after that is not defined
  • and we need to sort out the case of desktop files
  • we probably need to organise the conversation around templates so we’re not forcing people writing desktop apps to copy everything into their snapcraft.yaml

ACTION next Tuesday we should start discussing templates

Kyle thinks they should be more like plugs.
Gustavo thinks a layered approach is the right way to go.
We also need some system to reject parts of the template:
use the template, but override the specific bit that is wrong for my case.

Meeting notes for June 15th

  • Sergio showed a demo of his qemu build VM work
    • Injects core and snapcraft snaps from the outside
    • Currently injects all the time (will later check if the version is already there, as an optimisation)
    • Have to tell it where the private and public key are (key management not yet implemented)
    • Per-project VMs coming when the aforementioned project loading refactoring lands
    • Logs what it’s doing to a file outside the VM
    • Using the mountpoint as a passthrough of files between VM and host, but only for experimentation now
    • Having a snapcraft.yaml with base: core triggers this new logic
    • Only called snapcraft build to start; calling snapcraft snap will pick up where it left off
  • Gustavo: Where do you create the build VMs?
    • XDG cache directory
    • Eventually: per-project cache, and snapcraft clean should remove the VM
  • Gustavo: We should defer refreshes
  • If you exit with one code, snapcraft will continue; if you exit with another, snapcraft will stop
    • Michał: SSH and exit codes do not play well together
      • Sergio disagrees; paramiko seems to handle this fine (see the sketch after these notes)
    • Gustavo: If it fails, definitely the machine should stay alive
    • Gustavo: And we should also be able to force it to stay alive, e.g. “hey, I’m only playing right now”
    • Michał: snapcraftctl continue would feel better to me than exit codes. An explicit command to continue or abort
      • Gustavo agrees
    • Gustavo: If it fails, there’s a good likelihood that you want to debug. If you’re running snapcraft from a terminal, reuse the environment. We could say this is the default behaviour: if you build something, we’ll keep the machine around for a while. On failures we always reuse. If the machine has been around for so many hours, or you haven’t run snapcraft inside that environment for so long, kill it. It can be a day, so you have a chance to go to sleep and not have to manually fire it up again.
    • Gustavo: We should use the whole environment inside that qcow image, so that every time you build it until you decide to clean it, you’re always building in that same VM.
      • Well, but how is that different? How is that different from building in the VM?
      • Use a qcow image that chains the underlying qcow images. It’s copy on write. It’s nice because then we don’t need to keep the machine around for a long time. We only keep it on failures. Run the same image with exactly the changes from last time (with a flag to reset and try from the ground up). Inside that VM you get the benefits of what snapcraft gives you today in terms of caching.
      • Sergio: Before stopping the VM, currently I’m taking a snapshot
        • So another idea would be the next time you run, load that state from that snapshot and pick up where you left off.
  • Sergio ran the numbers: 25-30 seconds cold, 10 seconds from snapshot (consistent regardless of RAM used)
    • Saving state takes time. Sergio forgot to measure though.
  • Only plan to use cloud-init for ssh keys
  • The first time we ever boot a VM, do all the updates, load the keys, and take a snapshot from there, so any other qcow image using that base will reap the benefits.
  • Not going to worry about image index until we have something more solid working
  • Should be close to becoming a feature flag on edge (mostly in master)
  • Just kill the VM for now; it forces us to go through the slow path, which is nice as we develop. It’ll help us fine-tune it. And then at some point we can optimise: keep it for five minutes, or two minutes. Or we make it so fast that waiting the 2-3 seconds for the VM to come up is irrelevant.
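
On the exit-code point above, a minimal sketch of how paramiko exposes the remote command’s exit status, which is the behaviour Sergio is relying on; the host, port, user, and command are placeholders:

# Sketch: run a command in the build VM over SSH with paramiko and act on its
# exit status. Host, port, user and command are placeholders.
import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect("localhost", port=2222, username="builder")

_, stdout, stderr = client.exec_command("snapcraft build")
exit_status = stdout.channel.recv_exit_status()  # blocks until the command finishes

if exit_status != 0:
    # e.g. keep the VM around and drop the user into a shell for debugging
    print(stderr.read().decode())
client.close()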

Meeting notes for June 19th

  • A template manipulates the yaml, but cannot introduce logic that is not expressible as a plain snapcraft.yaml. That invariant is a nice property for templates, so they remain clean, clear, and composable
  • There should be a command in snapcraft to print the exploded view for any given snapcraft.yaml, for understanding
  • Templates should inject the bare minimum plugs needed for the implied application (gtk3, etc.) to work at all; in other words, not add a plug just because “many apps need it”
  • Plugs defined per app or globally are appended to those implied by the template
  • Templates should not inject obscure scripts, which are likely to explode other templates and user logic
  • We need some sort of “command-launchers” list (but not with that name) that chain independent runners that can be composed for getting the actual “command” to run
  • We should be careful not to see templates as a huge hammer which are an easy way out of having to design things in snapd, or the snap format itself

Meeting notes for June 26th

  • Sergio wants to show progress on build VM stuff
  • Just finished building snapcraft using snapcraft with qemu.
    • It’s a big project and took several paths to get there
    • Not as fast as Sergio wants it to be yet
    • Using a backing store. Check out ~/.cache/snapcraft/projects/<project name>, and you’ll see qcow2 images.
    • Using builder.qcow2, but with a backing file (see the sketch after these notes)
    • Backing file is about 300MB, then qcow on top is 1.6GB (to build snapcraft anyway, this will differ depending on project)
    • All the SSH key management is in place and automated
    • Parts and stage are all inside the VM; prime is kept on the outside. This helps speed.
      • With just 9p and without putting parts and stage in the VM, snapcraft build took 1.5 hours, when it should take closer to 10 minutes.
    • Eventually we’ll get prime into the VM as well, which means the only 9p mount will be the sources, which could potentially be read-only
      • Debugging in-VM would be painful without the ability to tweak sources. Should leave read/write.
    • Con of using backing file: can’t update image without destroying everything and starting from scratch. Can’t just switch out backing files.
  • If we run snapcraft clean, what do we expect to happen? Clean build-packages and everything? Toast whatever is on top of the backing file?
    • Yes. snapcraft clean's meaning will change as snapcraft gets better at detecting changes, so it makes sense to clean everything instead of just the parts
  • snapcraft shell or snapcraft inspect command to take you into the VM?
    • Not snapcraft shell, too generic. Try:
      • snapcraft <step> --shell puts you inside a shell in the build context INSTEAD of running the step
      • snapcraft <step> --debug puts you inside shell when a given task fails
      • snapcraft <step> --shell-after puts you inside a shell after step has run, regardless of whether it succeeds or fails
  • Anyone can experiment with edge/bases to get this today. Use with SNAPCRAFT_BUILD_ENVIRONMENT=qemu <snapcraft command>
  • Focus on getting core16 working well before looking at core18.
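
A minimal sketch of the backing-store arrangement described above, using qemu-img directly: a shared read-only base image with a per-project copy-on-write overlay on top. File names are placeholders, not the actual snapcraft layout:

# Sketch: create a copy-on-write overlay (builder.qcow2) on top of a shared
# read-only base image. File names are placeholders.
import subprocess

BASE = "base.qcow2"        # the small downloaded base image
OVERLAY = "builder.qcow2"  # per-project overlay; grows as the project builds

# Only the differences from BASE are stored in OVERLAY (copy on write).
subprocess.check_call(
    ["qemu-img", "create", "-f", "qcow2", "-F", "qcow2", "-b", BASE, OVERLAY])

# "snapcraft clean" would then simply delete the overlay, e.g. os.remove(OVERLAY).

# Inspect the chain and the on-disk sizes.
subprocess.check_call(["qemu-img", "info", "--backing-chain", OVERLAY])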

Meeting notes for June 29th

  • Sergio working on the snap injection story
  • Made changes to mount instead of transferring snaps over; seems to work fine (see the sketch after these notes)
  • Current status:
    • We currently have an environment variable that can be set to qemu, and we can use a base, which will download an image, use it as a backing file, boot the machine, inject user keys, mount the project there using 9p, inject the host’s snapcraft and core by using a mount, and then run snapcraft.
    • Instead of creating parts/stage etc. in the project, we create it in the
      VM itself.
    • Inside that project (9p), that’s where we place the final snap.
    • Hoping to get done by Tuesday: --shell, --shell-after, and --debug
    • Also decided that if we run snapcraft clean we’ll destroy the entire qcow
      image
  • All the snap injection work is on the common provider class, which means it’ll
    also work for e.g. multipass
    • Instead of exposing all the host’s snaps, it would be nice to give the user
      the ability to choose.
    • snapd could use these files to inform its cache
      • When pruning the cache, only remove items whose refcount is 1.
    • So the VM only needs to use snap download --revision to get the same snap
      as on the host
    • Snapd should support snap download --revision for a snap revision already
      available locally (it doesn’t today). This may be tricky for private (e.g.
      purchased) snaps
    • Okay, never mind: use 9p for now. We should fix the cache, but the download
      change is a little tricky to do right now. snapd needs to schedule that work.
  • We run sudo snap watch --last=auto-refresh before running snapcraft, but there’s a race condition there.
    • Michael suggests postponing refreshes and THEN watching, thereby eliminating
      the race.
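
To illustrate the injection-by-mount idea above, a rough sketch of the commands involved: the host’s snap files are exposed to the VM over a 9p mount and installed with snap install --dangerous. The mount tag, paths, and snap file names are assumptions:

# Sketch: inject the host's core and snapcraft snaps into the VM via a 9p
# mount rather than copying them over. Commands run inside the VM over SSH;
# mount tag, paths and file names are assumptions.
import subprocess

ssh = ["ssh", "-p", "2222", "builder@localhost"]   # placeholder connection

commands = [
    # The host exported its snap files (e.g. /var/lib/snapd/snaps) as mount_tag=hostsnaps.
    "sudo mount -t 9p -o trans=virtio hostsnaps /mnt",
    # --dangerous skips assertions, since the files come straight from the host;
    # snapcraft additionally needs --classic.
    "sudo snap install --dangerous /mnt/core_1234.snap",
    "sudo snap install --dangerous --classic /mnt/snapcraft_5678.snap",
]
for command in commands:
    subprocess.check_call(ssh + [command])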

Meeting notes for July 3rd

  • Status quo

    • Build providers:
      • --shell and --debug are working
      • clean wipes the qcow image
      • prime dir inside qcow image
      • --debug we use to debug snapcraft and not to debug a project. We should come up with a more internal debug flag for debugging snapcraft itself
  • Open issues:

    • Having issues with paramiko (SSH lib) and actual TTYs. For some reason it’s eating stdin
    • Don’t have clean for specific parts or steps
  • Demo

    • Awkward not to be dropped into the place where the failure happened. You get dumped into the project, even though it’s the pull step of a specific part that failed
    • Should snapcraft know /snapcraft?
      • If pull fails and you’re dropped into parts/<part name>/src, should you be able to run snapcraft again from there, or do you need to move? i.e. should snapcraft magically discover /snapcraft?
        • No, just search up instead, so you can run snapcraft in a subdirectory of a given project (see the sketch after these notes)
  • Need to respect local system. Bail on anything that would modify the local machine. No packages installed, no chdir

    • This makes it somewhat difficult to develop/test snapcraft itself; we need to figure out a good solution for that.
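
A minimal sketch of the “search up” behaviour agreed above: walk up from the current directory until a snapcraft.yaml (or snap/snapcraft.yaml) is found. This is a hypothetical helper, not the actual snapcraft code:

# Sketch: find the project root by walking up from the current directory, so
# snapcraft can be run from any subdirectory of a project. Hypothetical helper.
import pathlib

def find_project_root(start=None):
    # Walk up from start (default: the current working directory).
    directory = pathlib.Path(start) if start else pathlib.Path.cwd()
    for candidate in [directory, *directory.parents]:
        if (candidate / "snapcraft.yaml").exists() or \
           (candidate / "snap" / "snapcraft.yaml").exists():
            return candidate
    raise FileNotFoundError("no snapcraft.yaml found in any parent directory")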

Meeting notes for July 6th

  • Sergio has already fixed most of the output handling in snapcraft
    • Nothing is being printed, but now a spinner would be useful
    • --debug is now user-level, and doesn’t result in printing snapcraft debugging lines
    • Should we move to using pkexec to launch the VM, which would print out the command being run? Gustavo suggests simply changing the prompt line to “Launching VM with sudo”
    • Bringing the prime directory back to the user so they can run snap try?
      • Gustavo doesn’t want to do this by default, but leave it as an option. Can also snap try in the VM
  • snapd’s cache used to be a small history of what was used, and things were removed after a while. Now, before we clean the cache, we look at the reference count of the file (cache entries are hard links). If there is more than one reference to the cached file, we don’t clean it, since there would be no space saving (see the sketch after these notes).
  • The project itself is mounted at /home/project/snapcraft. Let’s rename the user to “snapcraft”, and use “project” as the mount point. PS1 could include the name of the snap followed by an indication of the cwd.
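
A minimal sketch of the hard-link refcount check described above; the cache path is a placeholder, and the real pruning logic of course lives in snapd itself:

# Sketch: only prune a cached file if nothing else hard-links to it (link
# count of 1), since removing it otherwise frees no space. Path is a placeholder.
import pathlib

CACHE_DIR = pathlib.Path("/var/cache/snapd")   # placeholder path

def prune_cache() -> None:
    for entry in CACHE_DIR.iterdir():
        if entry.is_file() and entry.stat().st_nlink == 1:
            entry.unlink()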

Meeting notes for July 10th

qemu

  • @sergiusens is looking into detecting why qemu would fail to launch (e.g. RAM, corrupted disk, etc.). The exit code is always the same, so some grepping of the output might be needed (see the sketch below).
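
As a rough illustration of the output-grepping approach, a sketch that captures qemu’s stderr and matches it against a few patterns; the patterns below are invented examples, not a vetted list:

# Sketch: since qemu's exit code doesn't distinguish failure causes, capture
# stderr and look for known patterns. The patterns are invented examples.
import subprocess

def classify_qemu_failure(qemu_args: list) -> str:
    result = subprocess.run(qemu_args, capture_output=True, text=True)
    if result.returncode == 0:
        return "ok"
    stderr = result.stderr.lower()
    if "cannot allocate memory" in stderr:
        return "not enough RAM"
    if "image" in stderr and ("corrupt" in stderr or "invalid" in stderr):
        return "corrupted disk image"
    return "unknown failure: " + result.stderr.strip()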

Templates

  • @kyrofa has prepared a demo of templates using darktable as an example.
    • snapcraft explode with a gtk3 template expands the snapcraft.yaml and adds a gtk3 part.
    • It also adds after entries, appending the gtk3 part to all the parts that are described.
    • It adds a command-chain entry with desktop-launch. This of course will not work yet as the machinery behind snapd to support this has not started.
    • It adds the correct plug entries.
  • We need to refine the UX of the thing end to end.
    • We need a way to discover templates, snapcraft templates.
    • A snapcraft expand-template command (this would be what explode does today).
    • We need a way to show details for one template, snapcraft template <template-name>.
  • Looked at the format of a template, three top level entries:
    • app-template
    • part-template
    • parts
  • For scalars, snapcraft.yaml takes precedence over the template. Lists and dicts get merged, with snapcraft.yaml keys taking precedence (see the sketch at the end of these notes).
  • Review thoughts: instead of something declarative we should have something procedural; the template would just be a Python function called with parameters describing where it applies. We could in the future look into adding an extension language (e.g. Lua). Maybe it’s best to avoid a domain-specific language.
  • The declarative form is not a public API.
  • You can use template under apps or at the project root.
  • app-template and part-template feel like they are at the same level, but they are not.
  • Instead of a part name or app name, have an asterisk (wildcard):
apps:
    *:
         <template>

parts:
    *:
         <template>
  • error if someone adds an app.
  • we don’t want wrappers at the same time (this requires the command-chain).
  • Defined syntax to support multiple bases (use @ with a list of bases); a bare @ functions as an else:
source@core16: ...
source@: ...
  • @kyrofa will take this back home and think about it and propose in the next meeting.
  • If something would be overwritten, an error should be raised.
  • Manually namespace parts added by a template, e.g.:
parts:
    gtk3-template-lib
  • Same should happen with command-chain.
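
To make the merge rules above concrete, a hypothetical sketch of the described precedence: snapcraft.yaml wins for scalars, dicts merge recursively with snapcraft.yaml keys taking precedence, and lists are combined. This is an illustration only, not the real template machinery:

# Sketch of the merge rules: snapcraft.yaml wins for scalars, dicts merge
# recursively with snapcraft.yaml keys taking precedence, lists are combined.
def merge(yaml_value, template_value):
    if isinstance(yaml_value, dict) and isinstance(template_value, dict):
        merged = dict(template_value)
        for key, value in yaml_value.items():
            if key in template_value:
                merged[key] = merge(value, template_value[key])
            else:
                merged[key] = value
        return merged
    if isinstance(yaml_value, list) and isinstance(template_value, list):
        return template_value + yaml_value
    # Scalars (or mismatched types): the snapcraft.yaml value takes precedence.
    return yaml_value

print(merge({"plugs": ["home"]}, {"plugs": ["x11"], "environment": {"A": "1"}}))
# -> {'plugs': ['x11', 'home'], 'environment': {'A': '1'}}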