We need to work out what stage-packages and build-packages mean across bases. Some options:
Package sets as a level of indirection over bases. ROS has something similar (base.yaml), but it is too granular. These could be maintained in up to three different places:
in the build VM
in snapcraft.yaml explicitly, defining the content of the set
inside a template, as part of the snapcraft code
Make parts specific to bases by mapping them to the base in snapcraft.yaml.
Have parts use an on <base> qualifier for stage-packages and build-packages (sketched below).
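A minimal sketch of the on <base> idea, reusing the style of snapcraft's existing per-architecture grammar; the keys and packages below are illustrative assumptions, not an agreed syntax:

    parts:
      hello:
        plugin: autotools
        source: .
        build-packages:
          - on core16: [gcc-5]
          - on core18: [gcc-7]
        stage-packages:
          - on core16: [libssl1.0.0]
          - on core18: [libssl1.1]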
Opens
How will package sets stand the test of time?
Do we want snap developers to take on the burden of supporting other bases they never intended to support when sharing parts from their snapcraft.yaml? (This could translate into only supporting what they have tested, e.g. a single architecture.)
Sergio built on Fedora 27 using qemu with a spike implementation
Used 9p, which worked fine, but it was only a hello world
Hacked together build-packages installation (gcc, etc.)
If SSH keys are available, use them; otherwise, create keys
No progress on comparing boot time with and without VM snapshots and finding bottlenecks
Important open point: how to handle stage-packages and build-packages for remote parts
We may need something like package sets: a general name that refers to a package list which may differ per base
Snapcraft has a map of [some key] => [list of packages]; today that key is the remote part itself. Package sets might change the key to [base + set name] => [list of packages] (see the sketch below)
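A rough sketch of what that keying could look like if written out as YAML; the package-sets key, set names, and packages are made up for illustration:

    package-sets:
      build-essentials:
        core16: [gcc-5, make]
        core18: [gcc-7, make]
      gtk3-runtime:
        core16: [libgtk-3-0]
        core18: [libgtk-3-0]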
[ACTION] Sergio to figure out what the cost is of not having snapcraft at all inside the VM, but driving from the outside, Spread-style
[OPEN] Consider what happens when a plugin is introduced - what is the default behavior for a build VM that wasn’t aware of it?
What if someone wants to build on a cheap VM provider?
Ship the image to the build provider.
rosdep is in the Ubuntu archive, but comes from pip on other distributions; how do we satisfy both?
Instead of package sets, we probably want something like a two-layered approach: a sort of per-base “plugin skeleton” that specifies custom packages, and also scripts for particular execution points
These possibly need access to some of the plugin configuration
It probably needs customization per architecture as well, but one shouldn't have to define a completely new stanza for every architecture (rough sketch below)
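A very rough sketch of how such a per-base plugin skeleton could be expressed; the layout, key names, and hook points are all assumptions rather than an agreed design:

    # hypothetical skeleton for a plugin on a Fedora-based build VM
    base: fedora-27
    build-packages:
      - python3-devel
      - gcc
    scripts:
      # scripts attached to particular execution points
      pre-pull: |
        dnf makecache
      post-build: |
        echo "build finished"
    architectures:
      arm64:
        build-packages:
          - gcc-aarch64-linux-gnu   # per-arch additions, not a whole new stanza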
I wouldn’t worry too much about this. I’ve already got prototype packaging for snapcraft, I just never published it because I don’t have a way to use it with a Fedora base yet. If there’s a patch set, I can build a package and push it to Fedora COPR so it can be used. As we get closer to a working model, @kyrofa and I can move towards pushing it into the Fedora repositories.
Remote parts will be going away once bases are introduced, which means we don’t need to design a solution that works for them.
Moving to templates instead
No customization inside VM, plugins are responsible
Still need to be able to template individual build VMs
Address individual environments inside the code itself, so the template could be customized per base
Driving snapcraft via SSH instead of running snapcraft within the VM:
We started by rewriting the subprocess calls to be remote, but we're realizing this is a larger and larger task. It gets over-complicated once you consider file operations and downloading things
A version of snapcraft needs to be in the VM
If the host is running snapcraft from the snap, inject it
If not running from the snap, there are two choices:
No idea, don't care; the end result is that we install the snap in the VM
People not using bases can continue using remote parts, but we’ll start
showing deprecation notices with migration guides.
People using bases must start using templates
Provisioning snapcraft within the VM
Inject the snap from the host
Inject it using snap try
Will be moving to Multipass for the VMs instead of qemu
We're not huge fans of the cleanbuild workflow: install LXD, configure it
yourself, and only then can we use it. We should look into bundling Multipass
in the snapcraft snap (both client and backend). Still need to figure out what
to do for the deb, source, and other installation methods.
[ACTION] Sergio needs to figure out if snapshotting is worth it, and feed back
to Multipass team.
Regarding remote filesystem (9p etc.), we’ll take whatever Multipass is using
Yeah, this wasn't about running on other operating systems; there were just numerous improvements to be had if we could turn snapcraft into an orchestrator instead of a re-exec dispatcher. Alas, after some investigation we've found that it's too large a chunk of work, with its own complications.
25-second cold boot (150 runs)
The snapshotting runs still need to happen
[ACTION] Will have numbers by EOD (for 1GB of RAM, then will run with more)
[ACTION] Have something building end to end for 16.04 by Friday, from a branch so others can try it.
Should we move to multipass now, or continue spiking with qemu?
Continue with qemu until we know exactly what we want.
Test the intention, not the implementation
What should the build user be? Should we build as a regular user or as root?
Isn't it a world of pain not to build as root?
It doesn't matter, as long as we have an option to become root
When mounting via 9p, can you have files not owned by root?
[ACTION] Security model option
Where will we store the VMs?
Agreement from the sprint is to have a hardcoded URL for the image
With a hash or without a hash?
The problem with a hash is that you have to update snapcraft every time you update an image, so leave it off for now
It is fine to have the JSON on GitHub and then download the image based on what is found there
The only extra attack vector is that someone hacking GitHub takes effect with no delay; otherwise they would have to wait for a release
[ACTION] Provide an example of the JSON file and how you plan to support it, for the next meeting
Having a part with no plugin definition will not be allowed when you use a base
Nor will listing a part in after that is not defined
We also need to sort out the case of desktop files
We probably need to organise the conversation around templates so we're not forcing people writing desktop apps to copy everything into their snapcraft.yaml
[ACTION] Start discussing templates next Tuesday
Kyle thinks they should be more like plugs
Gustavo thinks a layered approach is the right way to go
We also need some system to reject parts of the template:
"use the template, but this specific bit is wrong for my case"
Currently injects all the time (will later check if version already there as an optimisation)
Have to tell it where the private and public key are (key management not yet implemented)
Per-project VMs coming when the aforementioned project loading refactoring lands
Logs what it’s doing to a file outside the VM
Using the mountpoint as a passthrough of files between VM and host, but only for experimentation now
Having a snapcraft.yaml with base: core triggers this new logic (minimal example below)
Only snapcraft build has been called to start with; calling snapcraft snap will pick up where it left off
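For reference, a minimal snapcraft.yaml that would trigger the new logic; everything other than base: core is placeholder content:

    name: hello
    version: "1.0"
    summary: placeholder
    description: placeholder
    base: core

    parts:
      hello:
        plugin: nil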
Gustavo: Where do you create the build VMs?
XDG cache directory
Eventually: per-project cache, and snapcraft clean should remove the VM
Gustavo: We should defer refreshes
If you exit with one code, snapcraft will continue; if you exit with another, snapcraft will stop
Michał: SSH and exit codes do not play well together
Sergio disagrees; paramiko seems to handle this fine
Gustavo: If it fails, definitely the machine should stay alive
Gustavo: And we should also be able to force it to stay alive, e.g. “hey, I’m only playing right now”
Michał: snapcraftctl continue would feel better to me than exit codes; an explicit command to continue or abort
Gustavo agrees
Gustavo: If it fails, there's a good likelihood that you want to debug. If you're running snapcraft from a terminal, reuse the machine. We could say this is the default behaviour: if you build something, we'll keep the machine around for a while. On failures we always reuse. If the machine has been around for so many hours, or you haven't run snapcraft inside that environment for so long, kill it. It can be a day, so you have a chance to go to sleep and not have to manually fire it up again.
Gustavo: We should use the whole environment inside that qcow image, so that every time you build it until you decide to clean it, you’re always building in that same VM.
Well, how is that different from building in the VM as it works today?
Use a qcow image that chains to the underlying qcow image. It's copy-on-write. It's nice because then we don't need to keep the machine around for a long time; we only keep it on failures. Run the same image with exactly the changes from the last run (with a flag to reset and start from the ground up). Inside that VM you get the benefits of what snapcraft gives you today in terms of caching.
Sergio: Before stopping the VM, currently I’m taking a snapshot
So another idea would be the next time you run, load that state from that snapshot and pick up where you left off.
Sergio ran the numbers: 25-30 seconds cold, 10 seconds from snapshot (consistent regardless of RAM used)
Saving state takes time; Sergio forgot to measure how much, though.
We only plan to use cloud-init for SSH keys
The very first time we boot a VM, do all the updates, load the keys, and take a snapshot from there, so any other qcow image using that base will reap the benefits (see the cloud-init sketch below).
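A minimal sketch of the cloud-init user-data this implies, assuming a dedicated build user and a host-generated key; the user name and key are placeholders:

    #cloud-config
    users:
      - name: snapcraft                      # hypothetical build user
        sudo: ALL=(ALL) NOPASSWD:ALL
        shell: /bin/bash
        ssh_authorized_keys:
          - ssh-rsa AAAA... snapcraft-build  # public key generated on the host
    package_update: true
    package_upgrade: true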
Not going to worry about image index until we have something more solid working
Should be close to becoming a feature flag on edge (mostly in master)
Just kill the VM for now; that forces us to go through the slow path, which is nice as we develop. It'll help us fine-tune it. Then at some point we can optimise: keep it for five minutes, or two minutes, or make it so fast that waiting the 2-3 seconds for the VM to come up is irrelevant.
A template manipulates the YAML, but cannot introduce logic that is not expressible as a plain snapcraft.yaml. That invariant is a nice property for templates, so they remain clean, clear, and composable
There should be a command in snapcraft to print the exploded view of any given snapcraft.yaml, so it can be understood
Templates should inject the bare minimum plugs for the implied application (gtk3, etc.) to work at all; in other words, not add a plug just because "many applications need it"
Plugs defined per app or globally are appended to those implied by the template
Templates should not inject obscure scripts, which are likely to break other templates and user logic
We need some sort of "command-launchers" list (though not with that name) that chains independent runners which can be composed to produce the actual "command" to run (see the sketch after this list)
We should be careful not to see templates as a huge hammer that becomes an easy way out of having to design things in snapd, or in the snap format itself
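A minimal sketch of what such a chained-runner list could look like in an exploded snapcraft.yaml; the command-chain name matches what was later demoed, but the app name and entries are illustrative only:

    apps:
      my-app:                                  # hypothetical app
        command: usr/bin/my-app
        command-chain:                         # runners executed in order, then the command
          - snap/command-chain/desktop-launch
          - snap/command-chain/gtk3-env        # illustrative second runner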
Just finished building snapcraft using snapcraft with qemu.
It’s a big project and took several paths to get there
Not as fast as Sergio wants it to be yet
Using a backing store. Check out ~/.cache/snapcraft/projects/<project name>, and you’ll see qcow2 images.
Using builder.qcow2, but with a backing file
Backing file is about 300MB, then qcow on top is 1.6GB (to build snapcraft anyway, this will differ depending on project)
All the SSH key management is in place and automated
Parts and stage are all inside the VM, prime is kept on the outside. This helps speed.
With just 9p, and without putting parts and stage inside the VM, snapcraft build took 1.5 hours when it should take closer to 10 minutes.
Eventually we'll get prime into the VM as well, which means the only 9p mount will be the sources, which could potentially be read-only
Debugging in the VM would be painful without the ability to tweak sources, so we should leave it read/write.
Con of using backing file: can’t update image without destroying everything and starting from scratch. Can’t just switch out backing files.
If we run snapcraft clean, what do we expect to happen? Clean build-packages and everything? Toast whatever is on top of the backing file?
Yes. snapcraft clean's meaning will change as snapcraft gets better at detecting changes, so it makes sense to clean everything instead of just the parts
snapcraft shell or snapcraft inspect command to take you into the VM?
Not snapcraft shell, too generic. Try:
snapcraft <step> --shell puts you inside a shell in the build context INSTEAD of running the step
snapcraft <step> --debug puts you inside a shell when a given step fails
snapcraft <step> --shell-after puts you inside a shell after the step has run, regardless of whether it succeeded or failed
Anyone can experiment with edge/bases to get this today. Use with SNAPCRAFT_BUILD_ENVIRONMENT=qemu <snapcraft command>
Focus on getting core16 working well before looking at core18.
Made changes to mount snaps instead of transferring them over; seems to work fine
Current status:
We currently have an environment variable that can be set to qemu. When a base is used, snapcraft will download an image, use it as a backing file, boot the machine, inject user keys, mount the project there using 9p, inject the host's snapcraft and core snaps using a mount, and then run snapcraft.
Instead of creating parts/stage etc. in the project, we create them in the VM itself.
The final snap is placed inside the project (via the 9p mount).
Hoping to get done by Tuesday: --shell, --shell-after, and --debug
Also decided that if we run snapcraft clean we’ll destroy the entire qcow
image
All the snap injection work is on the common provider class, which means it’ll
also work for e.g. multipass
Instead of exposing all the host’s snaps, it would be nice to give the user
the ability to choose.
snapd could use these files to inform its cache
When pruning the cache, only remove items whose refcount is 1.
So the VM only needs to use snap download --revision to get the same snap
as on the host
Snapd should support snap download --revision for a snap revision already
available locally (it doesn’t today). This may be tricky for private (e.g.
purchased) snaps
Okay, never mind: use 9p for now. We should fix the cache, but the download change is a little tricky to do right now; snapd needs to schedule that work.
We run sudo snap watch --last=auto-refresh before running snapcraft, but there’s a race condition there.
Michael suggests postponing refreshes and THEN watching, thereby eliminating
the race.
We currently use --debug to debug snapcraft, not the project. We should come up with a more internal debug flag for debugging snapcraft itself
Open issues:
Having issues with paramiko (SSH lib) and actual TTYs. For some reason it’s eating stdin
We don't have clean for specific parts or steps yet
Demo
It's awkward not to be dropped into the place where the failure happened; you are dumped into the project root, even though it was the pull step of a specific part that failed
Should snapcraft know about /snapcraft?
If pull fails and you're dropped into parts/<part name>/src, should you be able to run snapcraft again from there, or do you need to move? I.e., should snapcraft magically discover /snapcraft?
No; just search upwards instead, so you can run snapcraft from a subdirectory of a given project
Need to respect the local system: bail on anything that would modify the local machine. No packages installed, no chdir
This makes it somewhat difficult to develop/test snapcraft itself; we need to figure out a good solution for that.
Sergio has already fixed most of the snapcraft output
Nothing is being printed now, so a spinner would be useful
--debug is now user-level, and doesn't result in printing snapcraft debugging lines
Should we move to using pkexec to launch the VM, which would print out the command being run? Gustavo suggests simply changing the prompt line to "Launching VM with sudo"
Should we bring the prime directory back to the user so they can run snap try?
Gustavo doesn't want to do this by default, but would leave it as an option. You can also snap try inside the VM
snapd's cache used to be a small history of what was used, and things were removed after a while. Now, before we clean the cache, we look at the reference count of the file (cache entries are hard links). If there is more than one reference to the cached file, we don't clean it, since there would be no space saving.
The project itself is mounted at /home/project/snapcraft. Let's rename the user to "snapcraft" and use "project" as the mount point. PS1 could include the name of the snap followed by an indication of the cwd.
@sergiusens is looking into detecting why qemu would fail to launch (e.g. RAM, a corrupted disk, etc.). The exit code is always the same, so some grepping of the output might be needed.
Templates
@kyrofa has prepared a demo of templates using darktable as an example.
snapcraft explode with a gtk3 template expands the snapcraft.yaml and adds a gtk3 part (a sketch of the result is below).
It also adds (appends) an after entry to all the parts that are described.
It adds a command-chain entry with desktop-launch. This of course will not work yet, as the snapd machinery to support it has not been started.
It adds the correct plug entries.
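An illustrative sketch of what the exploded output might contain; the part sources, plug names, and launcher path are assumptions for illustration, not what the demo literally produced:

    apps:
      darktable:
        command: usr/bin/darktable
        command-chain:
          - snap/command-chain/desktop-launch           # added by the gtk3 template
        plugs: [desktop, desktop-legacy, wayland, x11]  # assumed plug set

    parts:
      darktable:
        after: [gtk3]                                   # appended by the template
        # ...original part definition...
      gtk3:                                             # part injected by the template
        plugin: nil
        source: https://github.com/ubuntu/snapcraft-desktop-helpers.git  # assumed source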
We need to refine the UX of the whole thing end to end.
We need a way to discover templates: snapcraft templates.
A snapcraft expand-template command (this is what explode would become).
We need a way to show details for one template: snapcraft template <template-name>.
Looked at the format of a template; it has three top-level entries:
app-template
part-template
parts
For scalars, snapcraft.yaml takes precedence over the template. Lists and dicts get merged, with snapcraft.yaml keys taking precedence (example below).
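A small illustrative example of those merge rules; the app name, plugs, and environment values are made up:

    # template fragment
    app-template:
      plugs: [desktop]
      environment:
        GTK_THEME: Adwaita

    # snapcraft.yaml fragment for an app using the template
    apps:
      my-app:
        plugs: [home]
        environment:
          GTK_THEME: Yaru

    # merged result: lists and dicts merge, snapcraft.yaml wins on conflicts
    apps:
      my-app:
        plugs: [home, desktop]
        environment:
          GTK_THEME: Yaru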
Review thoughts: instead of something declarative, we should have something procedural; the template should just be a Python function that is called with parameters describing where it applies. In the future we could look into adding an extension language (e.g. Lua). Maybe it's best to avoid a domain-specific language.
The declarative form is not a public API.
You can use template under apps or at the project root.
app-template and part-template feel like they are at the same level, but they are not.
Instead of a part name or app name, have an asterisk (wildcard):
apps:
  *:
    <template>
parts:
  *:
    <template>
Error if someone adds an app.
We don't want wrappers at the same time (this requires command-chain).
Defined a syntax to support multiple bases: use @ with a list of bases; a bare @ functions as an else:
source@core16: ...
source@: ...
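A fuller sketch of how this could look inside a part; the comma-separated base list and the values shown are assumptions about the syntax rather than something that was pinned down:

    parts:
      hello:
        plugin: autotools
        source@core16,core18: https://example.com/hello-legacy.tar.gz  # applies to these bases
        source@: https://example.com/hello.tar.gz                      # else branch
        stage-packages@core16: [libssl1.0.0]
        stage-packages@: [libssl1.1]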
@kyrofa will take this back home, think about it, and propose something at the next meeting.
If something would be overwritten, an error should be raised.