Snapcraft build-on hint for builders

Good point. So any as a string or a list? Either way there’d have to be checks to avoid [any, amd64] constructs.

This is what I get for being up so late as I was yesterday.

I think a list would be simplest and most consistent, so [any]. Ideally we’d just import the .deb rules directly so that we could reuse the same logic, and those rules already forbid things like [any, amd64]. With the possible exception of source, basically all of it still makes sense, and there doesn’t seem to be a compelling need to reinvent the semantics, just to rephrase the concrete syntax.
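A minimal sketch of the kind of check being discussed, assuming the list form and treating any as a wildcard that cannot be combined with concrete names. The function name `validate_architectures` and the error messages are hypothetical; this is not snapcraft's actual code, nor the .deb rules themselves.

```python
def validate_architectures(archs):
    """Return the list unchanged if it is acceptable, else raise ValueError.

    Hypothetical helper: only models the "[any] is fine, [any, amd64]
    is not" rule mentioned in the thread.
    """
    if not isinstance(archs, list) or not archs:
        raise ValueError("architectures must be a non-empty list")
    if not all(isinstance(a, str) for a in archs):
        raise ValueError("architecture names must be strings")
    # 'any' already covers everything, so mixing it with names is rejected.
    if "any" in archs and len(archs) > 1:
        raise ValueError("'any' cannot be combined with other architectures")
    return archs
```

With this, `validate_architectures(["any"])` passes, while `validate_architectures(["any", "amd64"])` raises.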

@niemeyer this was also one of the topics we ran out of time to discuss at the sprint. Can you please take a look?

There are several different aspects related to this topic which we need to take into account:

  1. What architectures the snap that was built can run on
  2. What architectures the snap may be built for
  3. What architectures the snap is being built for
  4. What base the snap that was built will run on
  5. What bases the snap may be built for
  6. What Linux distribution the snap may be built on
  7. What base should be used when the snap is built on a particular Linux distribution

The conversation so far has handled these topics somewhat implicitly, and at times it feels like some unintentional bridging between these ideas is taking place. This is worrying since people not participating in this debate will be unable to tell what particular aspect is covered and how to do what they want.

So, I propose we start off with a minimalist approach that focuses on the common cases and on a reasonable user experience for people doing the usual. For example, most people will not be building their snaps in multiple distributions, since at the end of the day a single snap will be pushed to the store, and that single snap will be on a single base.

Also, it feels strange that we’re trying to separate multiple architectures upfront depending on the Linux distribution. In practice, the parts that compose the snap will in most cases define what architectures are supported by the snap, and these are not dependent on a particular distribution.

Along similar lines, the base will often be dependent on the system that is being used to create the snap instead of on the particular project being built. For example, if one builds a snap with packages from Fedora, it doesn’t make much sense to use a base that was created out of ubuntu 16.04.

So, circling back into the original point, I’m concerned about just putting syntax in without understanding in more detail the problems we’re solving and how people are supposed to use it (or how to understand it).

Perhaps we can take some apparently “safe” first steps:

  • Define a system-wide default base internal to snapcraft. When not specified, snapcraft will attempt to build using a base that is appropriate for the local system if one exists, or complain if it can’t map the local system into an appropriate base.
  • Accept a new optional “base” field in snapcraft.yaml, which allows overriding the system default and request that a particular base be used for the built snap (one base).
  • Further explore why the “architectures” field is not enough. Having multiple architecture fields will certainly be confusing, so if this is not enough we need to write down the exact problems we have today with this field and fix those.
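The first two steps above could be sketched roughly as follows. Everything here is an assumption for illustration: the `DEFAULT_BASES` table, the placeholder base names, and the `resolve_base` function do not exist in snapcraft.

```python
# Hypothetical mapping from (distro id, version) to a default base.
# The base names are placeholders, not real snaps.
DEFAULT_BASES = {
    ("ubuntu", "16.04"): "base-ubuntu-16.04",
    ("fedora", "26"): "base-fedora-26",
}


def resolve_base(distro_id, version_id, yaml_base=None):
    """Pick the base for the snap being built.

    An explicit "base" from snapcraft.yaml overrides the system default;
    otherwise the local system is mapped to a base, and we complain if
    no mapping is known.
    """
    if yaml_base is not None:
        return yaml_base
    try:
        return DEFAULT_BASES[(distro_id, version_id)]
    except KeyError:
        raise ValueError(
            f"no default base known for {distro_id} {version_id}; "
            "set 'base' in snapcraft.yaml"
        ) from None
```

The point of the sketch is that only one base is ever chosen per build, and the common case needs no new syntax at all.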

Does that sound reasonable?

The architectures field in snapcraft.yaml behaves the same way it did in snap.yaml: it basically hints to the store which architectures to offer this snap on.

The original introduction of this was fat packages. We can change the semantics, but this is how architectures works today:

name: my-package
architectures: [amd64]
...
parts:
  my-part:  # placeholder part name
    source: .
    plugin: dump

Regardless of the architecture (armhf, amd64, i386) this is run on, this will happen:

$ snapcraft
Snapped my-package_1.0_amd64.snap

With snap.yaml containing an architectures: [amd64] entry.

And for

name: my-package
architectures: [amd64, armhf]
...
parts:
  my-part:  # placeholder part name
    source: .
    plugin: dump

Will result in

$ snapcraft
Snapped my-package_1.0_multi.snap

With snap.yaml containing an architectures: [amd64, armhf] entry.

If left out, snapcraft will set it to the architecture of the build host.
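To summarize the behavior described above, here is a small sketch of how the file name follows from the field. The function `snap_file_name` is hypothetical and only mirrors the three cases in this post; it is not taken from the snapcraft source.

```python
def snap_file_name(name, version, host_arch, architectures=None):
    """Mirror the naming described above (hypothetical helper).

    - one declared architecture  -> that architecture in the file name
    - several declared           -> "multi"
    - none declared              -> the build host's architecture
    """
    archs = architectures or [host_arch]
    label = archs[0] if len(archs) == 1 else "multi"
    return f"{name}_{version}_{label}.snap"


# The host architecture does not matter when the field is set:
print(snap_file_name("my-package", "1.0", "armhf", ["amd64"]))
print(snap_file_name("my-package", "1.0", "armhf", ["amd64", "armhf"]))
```

Note that `host_arch` only influences the result when architectures is left out, which is exactly why the field says nothing about where the build should run.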

So in essence it states the architectures the snap will work on at runtime, not where the build should be triggered.

We don’t do breaking changes in the format, but the use case for this field is so confusing that we have not promoted it. I don’t mind breaking its semantic meaning in snapcraft and telling people who want to dump a Python file into a snap on amd64 and push it to a Pi to just use --target-arch (for the few who don’t know about it already).

This feature is really old as well, I wasn’t in the snapcraft conversations at that time:

commit a86d80aa22f8b728ec116ad87a652f34d1571992
Author: <redacted>
Date:   Tue Jul 28 16:00:10 2015 -0400

    Look in library-triplet locations and set architectures in package.yaml

So that is the explanation of why it is not enough, but it is so backwards that if there is consensus I don’t mind changing this behavior with the caveat of knowing this might break some existing builds.

If reconsidering how the architectures field works, please also keep in mind the following:

  • 64bit snaps that ship 32bit binaries that run in compat mode (eg, amd64 wine snap that can run both 64bit and 32 bit windows applications on a 64bit machine)
  • 32bit snaps that ship (some) 64bit binaries that run on the kernel’s native architecture (this is highly specialized but needed for systems with a 64bit kernel snap and 32bit core snap. I’ve been told by CE that devices using a 64 bit kernel with 32 bit userland are important for certain classes of embedded (IoT) devices, and this is something at least the snapd security policy currently handles)

IME, we don’t have to handle these specially-- in both cases publishers will produce snaps that match the architecture of the core snap, but ship (a few) binaries that might be for the other arch. Today this all works ok, and I mention these use cases so that if we change things we don’t break them going forward.

Luckily, the behavioral change would turn the field into something builders can respect: information on where to build within the builder pool of architectures, not what the resulting architecture would be, which would be determined by the host used to build.

So, if the host is amd64 and I dump in 32bit binaries in there, or regardless of what parts I use, the architecture would be amd64 unless I use --target-arch which would trigger cross compiling mode.

@niemeyer I gave this more thought and I don’t think it would be wise to change the semantic meaning of architectures in snapcraft.yaml as it will mean that it has a different meaning than the architectures field in snap.yaml which would be more confusing when we speak of each interchangeably.

@sergiusens This is not necessarily true, and I don’t see a clear proposal above to make a judgement on it.

Trying to drive this topic to a more concrete change, here is a proposal: what if we allowed the architecture field to be defined like this:

architectures:
  - [armhf]
  - [amd64, i386] 

The behavior of snapcraft when it sees a field like that is to build all the architecture sets it can. Each set will result in a single snap, and the value in snap.yaml will be the set itself (either armhf or amd64, i386 in the case above).

We may also introduce an --arch flag that tells snapcraft to only build the snaps that work on that particular architecture. In the example above, both --arch=i386 and --arch=amd64 would cause snapcraft to only build the second snap, even if it knows how to build an arm snap.
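A rough sketch of the proposed --arch filtering, assuming the nested-list form shown above. The function `select_build_sets` is a hypothetical name, and the sketch deliberately ignores whether the local host is actually able to build a given set.

```python
def select_build_sets(architectures, arch=None):
    """Return the architecture sets that would be built.

    `architectures` is the nested list from snapcraft.yaml, one inner
    list per resulting snap. Without `arch`, every set is a candidate;
    with it, only the sets that include that architecture remain.
    """
    if arch is None:
        return [list(s) for s in architectures]
    return [list(s) for s in architectures if arch in s]


proposal = [["armhf"], ["amd64", "i386"]]
print(select_build_sets(proposal, arch="i386"))
```

Each returned set corresponds to one snap, and the set itself is what would be written into that snap's snap.yaml.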

How does that sound?

Debian basearch names are too ambiguous at times. For example: both armv6hl and armv7hl would be Debian’s armhf or RPM’s armhfp base arch name but are fundamentally incompatible. Heck, armv7hl and armv7hnl themselves can be problematic. As long as we’re willing to ignore the fact that our base arch definitions include incompatible architectures, base arch names are fine. However, this will bite us. Even in Debian, this is a problem, because Raspbian armhf is not the same as Debian armhf, and that makes binaries incompatible with the simplistic model presented by dpkg.

Several months back, I proposed we move from base arch names to platform triples to eliminate ambiguity (as far as I know, both RPM and dpkg can use components of a platform triple to resolve to the understood platform), which @mvo thought wasn’t a bad idea. And at least with the DNF package manager, we need to know the “real-ish” architecture to be able to trigger foreign architecture mode (doing armv7hl stuff on x86_64, for example). However, that’s a drastic revamp of how things work.

The idea of any architecture should be implied when not specified. It’s something that I’ve considered a weird quirk of Debian that you specify this (in RPM, this is implied when architectures are not specified).

The all architecture (equivalent to the RPM noarch architecture) is perfectly fine for arch-less snaps, but I think those don’t exist in practice. The nature of how snaps work means that it’s rather difficult to have such a snap exist, and if it does, it’s really by accident rather than intentionally. I think it’d be a bad idea to support such a thing, as snaps are not granular enough.

I also agree with @niemeyer that architecture is a trait separate from the system base, and usually is an overarching one (excuse the pun!). Often, architectures do not change across platform bases, so it doesn’t make sense to force redefinition across different distribution selections.

That said, I think using Debian logic for architectures will bite us hard, and we should avoid it. It’s too simple, and there are enough conceivable cases where it will produce unexpected results.

@Conan_Kudo We don’t actually rely on Debian logic for anything architecture-related. We’re simply reusing the architecture names as we needed to pick a set of names and didn’t feel like inventing new names unnecessarily, and assumed that some good thinking went into defining the existing names (that’s true for rpm too, but we couldn’t pick both).

If we need to introduce new names, we just need some good rationale for it, but we’re not strictly tied to the logic of anything else but snapd itself.

If we could be relatively assured that architectures would always match the assumptions the base arch names give, then I think it’d be fine. However, with snapd needing to be able to run across a wide array of distributions and CPUs, this falls apart. I think we probably need to at least consider moving to using machine architecture names used in platform triples.

For example:

  • i586 / i686 instead of i386 (this relieves ambiguity with 32-bit x86, and CPUs/distros do choose different levels)
  • armv6hl, armv7hl, etc. instead of armhf (removes invalid assumption of compatibility)
  • armv4tl, armv5tl, etc. instead of armel (removes invalid assumption of compatibility)

For these cases, enforcing the correct architecture can matter, as it can mean the computer can or cannot run the snap.

In some cases, there’s not much of an issue:

  • x86_64 / amd64 are consistent on what they mean, and I’d be comfortable with both being supported as labels for the same architecture
  • aarch64 / arm64 are in the same boat as x86_64
  • ppc64le / ppc64el / powerpc64le are in the same boat as x86_64 and aarch64
  • s390x is the only label for that particular architecture

In the cases above, I’d prefer if all those markers actually worked and internally resolved to their correct designation (x86_64, aarch64, powerpc64le, etc.).
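To illustrate what "internally resolved" could look like, here is a minimal sketch using only the labels listed above. The `ALIASES` table and the `canonical_arch` function are hypothetical; the ambiguous names (i386, armhf, armel) are intentionally left out so that they fail loudly instead of silently mapping to one variant.

```python
# Labels from the list above, each mapped to its canonical designation.
ALIASES = {
    "amd64": "x86_64",
    "x86_64": "x86_64",
    "arm64": "aarch64",
    "aarch64": "aarch64",
    "ppc64el": "powerpc64le",
    "ppc64le": "powerpc64le",
    "powerpc64le": "powerpc64le",
    "s390x": "s390x",
}


def canonical_arch(label):
    """Resolve an unambiguous architecture label to its canonical name."""
    try:
        return ALIASES[label]
    except KeyError:
        raise ValueError(f"unknown or ambiguous architecture: {label}") from None


print(canonical_arch("amd64"))
print(canonical_arch("ppc64el"))
```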

It sounds good, I am even ok with one list as long as we understand that there is a syntax change in what architectures mean (how to parse it) in snap.yaml and snapcraft.yaml.

But this new syntax also allows us to think about backwards compatibility. We already have --target-arch for cross compilation.

Let me think about the workflow here and get back to this.

You mean handling “architectures: [a, b]” the same as “architectures: [[a, b]]” per the proposal, while still allowing the latter? We probably cannot do that in a compatible way. Or is that not what you mean?

We need to keep some pragmatism while considering those ideas. Distributions don’t have dozens of different alternatives for every minor change in CPU compatibility, and we should be careful not to break down excessively either. For example, what’s the real benefit of transforming i386 into i586 and i686?

It’s a fair remark here, and specifically in the case of 32-bit x86, it’s not really as much of an issue. It breaks down on ARM architectures mainly, and probably will also have the same issue with MIPS and RISC-V (though neither are supported in snapd today).

At least specifically with 32-bit x86, Debian did baseline i386 until recently. OpenSUSE and Mageia do i586 (meaning certain instructions aren’t used), and Fedora does i686. There are CPU incompatibilities for this, but in practice I don’t think we’ll encounter them much except with IoT and specialized computers like the OLPC XO laptops.

Can I have an option for CPUs without mmx? Sorry for the half joke, but part of the conversation here brought me back to the days when you had to look out for that :slight_smile:

Oh, I mean

$ grep architectures snap/snapcraft.yaml
architectures: [[armhf], [amd64, arm64]]

Would result in

$ grep architectures prime/meta/snap.yaml
architectures: [amd64, arm64]

or

$ grep architectures prime/meta/snap.yaml
architectures: [armhf]

Technically there is no issue here, my angle is from a documentation and support point of view.
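For documentation purposes, the mapping from the snapcraft.yaml value to the per-snap snap.yaml values could be sketched like this. It assumes one possible reading in which a flat list is treated as a single set; whether that reading can actually be kept compatible is still an open question in this thread. The function name `architecture_sets` is hypothetical.

```python
def architecture_sets(value):
    """Normalize the snapcraft.yaml value into one list per resulting snap.

    Assumed reading: a flat list such as [amd64, armhf] is a single set,
    while a nested list such as [[armhf], [amd64, arm64]] is one set per
    snap. Mixing both forms is rejected.
    """
    if not value:
        raise ValueError("architectures must not be empty")
    if all(isinstance(v, str) for v in value):
        return [list(value)]
    if all(isinstance(v, list) for v in value):
        return [list(v) for v in value]
    raise ValueError("cannot mix architecture names and architecture sets")


# Each inner list is what one snap's snap.yaml would carry.
for snap_archs in architecture_sets([["armhf"], ["amd64", "arm64"]]):
    print("architectures:", snap_archs)
```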

Assuming the typo in the second example is an excessive bracket and not a missing one, +1.

This seems an odd conglomeration of build-on and run-on semantics. The [armhf] case says to me “build and run on armhf”. But I’m not sure what [amd64,i386] says to me. Knowing what architectures means today I think it means “run on amd64 and i386,” but where does it build? amd64, or i386? It’s only one snap, so it must be one or the other, right? How do we know which is appropriate?

The architectures field is already so confusing, its only saving grace is that it’s just as confusing in both snapd and snapcraft, since it means the same thing :stuck_out_tongue: . Changing its meaning in snapcraft makes it worse, at least to me.

Furthermore, making that syntax change in a backward-compatible way seems to lead toward something even more difficult to understand, document and use.