Proposal: Move to more granular architecture names for snaps

Conan_Kudo · August 30, 2017, 11:21pm

So, this actually is both for snapd and snapcraft, but I can only pick one… So, snapd it is…

Last week, @niemeyer and I had a conversation about architecture handling in both snapcraft and snapd, which involved discussing examples of where relying on base arch names (i386, amd64, arm64, armhf, armel, etc.) can bite us.

This is most obvious with ARM, where the armel set expands to a very large set of somewhat incompatible software-floating-point, little-endian, 32-bit ARM architectures. The armhf set expands to a slightly smaller set of hardware-floating-point architectures that have incompatibilities as well.

It’s even more problematic when distributions redefine what those names mean. For example, Raspbian defines armhf as armv6hl, while Debian defines it as armv7hl. This can lead to dpkg-compatible but not CPU/binary compatible installations. In Fedora, we have derivative distributions that do rebuild the distribution for different so-called armhfp architectures for various reasons (IoT, SBCs, etc.). Unfortunately, the way snapd and snapcraft handle (or intend to handle in some cases) architectures is too coarse for this.

This is the reason that RPM doesn’t actually build packages to a base arch name. Instead, we build to target the exact architecture (which is why Mageia and openSUSE 32-bit x86 packages are mostly i586 packages with some i686 packages, while Fedora and CentOS ones are all i686 packages). This means we don’t have armhfp packages or arm packages, we have armv7hl and armv5tl packages (this can be observed in Mageia, which does offer these). Pignus (a Fedora derivative focusing on Raspberry Pi devices) ships armv6hl packages for broader compatibility with the Raspberry Pi ecosystem.

So, after talking with @niemeyer about it to better formulate the idea, I propose that we move to more granular names for architectures and internally alias them as necessary.

This means that for snapcraft and snapd, you’d be telling it armv7hl rather than armhf. On the snapcraft side, package manager backends can handle translating as appropriate. Going from “real arch” to “base arch” is way easier than the other way around, depending on the distribution and target device.

For example, as of Debian 9, the base arch map is as follows:

DPKG Base Architecture	Real Architecture
amd64	x86_64
i386	i686
arm64	aarch64
armhf	armv7hl
armel	armv4t
ppc64	ppc64 (powerpc64)*
ppc64el	ppc64le (powerpc64le)*
s390x	s390x

Note: For ppc64 variants, I’ve seen both names used throughout aspects of compilers and other things (at least on my computer), hence the other name in parentheses.
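To illustrate why going from “real arch” to “base arch” is the easy direction, here is a minimal Python sketch. The Debian rows are copied from the table above and the Raspbian row comes from the armhf example earlier in this post; the function names (`base_to_real`, `real_to_base`) and the dictionary layout are hypothetical and are not anything snapd or snapcraft actually ships.

```python
# Base arch -> real arch, per distribution.
# "debian" rows come from the Debian 9 table above; the "raspbian" row
# reflects the armhf = armv6hl redefinition mentioned in this post.
BASE_TO_REAL = {
    "debian": {
        "amd64": "x86_64",
        "i386": "i686",
        "arm64": "aarch64",
        "armhf": "armv7hl",
        "armel": "armv4t",
        "ppc64": "ppc64",
        "ppc64el": "ppc64le",
        "s390x": "s390x",
    },
    "raspbian": {
        "armhf": "armv6hl",
    },
}


def base_to_real(base_arch, distro):
    """Hard direction: the answer depends on which distro you ask."""
    return BASE_TO_REAL[distro][base_arch]


def real_to_base(real_arch):
    """Easy direction: a granular name points at a single base name."""
    matches = {
        base
        for table in BASE_TO_REAL.values()
        for base, real in table.items()
        if real == real_arch
    }
    if len(matches) != 1:
        raise KeyError(real_arch)
    return matches.pop()
```

With this data, `real_to_base` needs no distro argument for either armv6hl or armv7hl, while `base_to_real("armhf", ...)` cannot be answered without one.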

My proposal is that snapd use the real architecture names to disambiguate and to make it easier to determine what’s runnable on the target machine. As a transition measure, we can alias the current names and mark them as deprecated to get people onto the path of using more granular names. As neither snapd nor snapcraft does a lot with architecture names yet, we’re in a position to fix this before it can bite us.
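The transition measure could look roughly like the following. This is only a sketch in Python: `DEPRECATED_ALIASES` and `canonical_arch` are made-up names, and the alias targets assume the Debian 9 meanings from the table above (a Raspbian build would need armhf to point at armv6hl instead).

```python
import warnings

# Current coarse names -> granular names (Debian 9 meanings assumed).
DEPRECATED_ALIASES = {
    "amd64": "x86_64",
    "i386": "i686",
    "arm64": "aarch64",
    "armhf": "armv7hl",
    "armel": "armv4t",
    "ppc64el": "ppc64le",
}


def canonical_arch(name, aliases=DEPRECATED_ALIASES):
    """Return the granular name, warning when a deprecated alias is used."""
    if name in aliases:
        granular = aliases[name]
        warnings.warn(
            f"architecture name {name!r} is deprecated; use {granular!r}",
            DeprecationWarning,
            stacklevel=2,
        )
        return granular
    # Already granular (or unknown): pass it through untouched.
    return name
```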

This also makes it much easier to support new architectures, as we don’t have to bend over backwards to contort a base arch name to try to differentiate them. And flavors that offer some enhancements, like armv7hnl, armv8hnl, etc. are trivial to support, as they have well-defined schemes.

niemeyer · August 31, 2017, 3:20pm

Thanks for writing the proposal down, Neal.

This in particular clearly validates the need to distinguish architectures:

We do need a way to allow distributions to install snaps that are compatible with the hardware they are using, and this particular ambiguity is clearly unacceptable. So yes, let’s definitely do something about it and make the situation more clear.

While doing that, we need to keep at least two things in mind:

It needs to be effective. We don’t want to do a lot of work and end up in the same place.
It needs to be sane. If we ask people to be able to know the exact name that defines all the details of the instruction set they are sitting on, it won’t work.

On the first point, I’m concerned that the proposal is changing from the list of high-level names chosen for debs to the list of high-level names chosen for rpms. Indeed the items on the left-hand side of that table are more coarse grained: amd64 is an architecture family that preserves backwards compatibility across microarchitectures, and the same goes for i386. The items on the right-hand side are not a “real architecture”, though: x86-64 is just another name for amd64, which is an architecture family including several backwards-compatible microarchitectures; i686 is a microarchitecture inside the x86 family (also known as i386); and armv7hl is an ARMv7 family, with h marking the presence of a VFPv3 coprocessor extension, and l marking it as little endian.
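To make the point about armv7hl concrete, a name like that can be taken apart into the characteristics it encodes. The Python sketch below is illustrative only: `ArmArch` and `parse_arm` are hypothetical names, the `h` and `l` meanings are the ones described above, and reading `t` as Thumb, `n` as NEON, and `b` as big endian is an assumption on my part rather than something established in this thread.

```python
import re
from dataclasses import dataclass

# armv<version>[t][h][n]<l|b>, e.g. armv5tl, armv6hl, armv7hl, armv7hnl
_ARM_NAME = re.compile(
    r"^armv(?P<version>\d+)(?P<thumb>t?)(?P<hard>h?)(?P<neon>n?)(?P<endian>[lb])$"
)


@dataclass(frozen=True)
class ArmArch:
    version: int         # ARMv7 -> 7
    thumb: bool          # 't' (assumed to mean Thumb)
    hard_float: bool     # 'h' (hardware floating point)
    neon: bool           # 'n' (assumed to mean NEON)
    little_endian: bool  # 'l' little endian, 'b' assumed big endian


def parse_arm(name):
    """Split an RPM-style ARM name into the traits it is a key for."""
    m = _ARM_NAME.match(name)
    if m is None:
        raise ValueError(f"unrecognised ARM architecture name: {name!r}")
    return ArmArch(
        version=int(m.group("version")),
        thumb=bool(m.group("thumb")),
        hard_float=bool(m.group("hard")),
        neon=bool(m.group("neon")),
        little_endian=m.group("endian") == "l",
    )
```

Names without an endianness suffix, such as the armv4t entry in the table, are rejected by this sketch, which is itself a hint that the names are not a uniform scheme.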

So… processor families, microarchitectures, extensions, backwards compatibility, … we can choose to be pragmatic and solve individual problems cheaply, but then face the potential of more issues soon, or we can choose to go deeper into the problem and find a way to represent some of those ideas more correctly. Both sound feasible from where we stand now, each with their own advantages.

Conan_Kudo · August 31, 2017, 3:25pm

You are right that the names are similar to ones chosen by RPM, but I didn’t get all the names from there. Some of them were collected from sources like ARM and even pages from the Debian wiki that indicated what they mapped to.

To be clear, the “real” architecture is defined by what I saw on the filesystem tree, in compiler documentation, in kernel documentation, etc.

Personally, if you want to keep amd64 for 64-bit x86, I don’t particularly care. What I particularly want is to disambiguate them in a way that is easily extensible going forward so that we don’t hit this again and again.

ogra · August 31, 2017, 3:33pm

As I said in our IRC discussion, I think familiarity is also an aspect we should take into account: users of Debian-based distros will not necessarily know that armv7hl is the same as armhf …

We should have the visible arches mapped to what’s familiar to the users from their existing setups …

Conan_Kudo · August 31, 2017, 3:41pm

And I’m perfectly happy with this. A snapd compiled for Raspbian would emit armv6hl snaps as armhf, while one for Debian would do it for armv7hl snaps. Debian-based systems could always see amd64 for 64-bit x86 snaps, while other distribution families would see x86_64. i[3456]86 could always be i386 on Debian, and so on…
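As a rough sketch of what I mean (Python, purely illustrative): the internal name stays granular and only the name shown to the user depends on the distribution. `VISIBLE_NAMES` and `visible_arch` are hypothetical names, and the mappings are just the examples from this post.

```python
import re

# Matches i386, i486, i586 and i686.
_X86_32 = re.compile(r"^i[3456]86$")

# Internal (granular) name -> name shown to users, per distribution.
VISIBLE_NAMES = {
    "debian": {"x86_64": "amd64", "armv7hl": "armhf"},
    "raspbian": {"armv6hl": "armhf"},
}


def visible_arch(real_arch, distro):
    """Pick the user-facing name for an internal architecture name."""
    if distro == "debian" and _X86_32.match(real_arch):
        return "i386"
    # Distros without an entry simply see the granular name.
    return VISIBLE_NAMES.get(distro, {}).get(real_arch, real_arch)
```

So `visible_arch("armv6hl", "raspbian")` and `visible_arch("armv7hl", "debian")` both come out as armhf, while a Fedora-family system asking about x86_64 just gets x86_64 back.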

What I want is for snapcraft and snapd to internally have more fine-grained handling of architectures so we can be smarter about this and not do dumb things.

ogra · August 31, 2017, 3:43pm

Yup, that is why I highlighted “visible” above … I’m fine with using whatever fits as internal arches (though it smells like this will not be a small change to get right in all areas)

niemeyer · August 31, 2017, 3:54pm

Familiarity is not achievable here, I believe. Snaps are supposed to work on most Linux distributions, and those don’t share terms, so no matter what we pick it won’t be familiar for everybody. Being understandable and effective is a better target.

niemeyer · August 31, 2017, 4:05pm

I’m sure we can find all of those terms on the Internet. What I’m trying to do is to at least get some consensus on what those terms actually mean, and what the outcome of taking one route or another will be.

The fact that these are not real architectures, but rather somewhat arbitrarily chosen and non-homogeneous keys to a set of CPU characteristics, means that just transitioning from one set of such keys to another won’t solve the problem long term. We will hit this again as soon as someone decides to compile a distribution for another microarchitecture.

So, again, the routes are: we do it cheaply and save time and effort, by just adding names solving the specific problems we know about, or we engage more deeply into the problem and organize these ideas more correctly in the code so that we can map the world more realistically.

Both are doable, and should at least be taken into account before we move forward.
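For a feel of what the second route might involve, here is a small Python sketch that models CPU characteristics explicitly instead of comparing name strings. Everything in it is an assumption for illustration: `CpuProfile` and `can_run` are made-up names, and the compatibility rule (same family, endianness and float ABI, with the snap's level no higher than the host's) is a deliberate simplification, not a statement of how real hardware compatibility works in every case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuProfile:
    family: str     # e.g. "arm", "x86"
    level: int      # e.g. 6 for ARMv6, 7 for ARMv7
    float_abi: str  # "hard" or "soft"
    endian: str     # "little" or "big"


def can_run(host, snap):
    """Simplified rule: a snap runs if it targets the host's level or lower."""
    return (
        host.family == snap.family
        and host.endian == snap.endian
        and host.float_abi == snap.float_abi
        and snap.level <= host.level
    )


# The Raspbian vs Debian armhf case from earlier in the thread.
ARMV6HL = CpuProfile("arm", 6, "hard", "little")
ARMV7HL = CpuProfile("arm", 7, "hard", "little")
```

Under this rule `can_run(ARMV7HL, ARMV6HL)` holds but `can_run(ARMV6HL, ARMV7HL)` does not, which is the distinction the single name armhf cannot express.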