Squashfs performance effect on snap startup time

This is the first in a series of posts investigating snap application startup performance, focusing particularly on desktop applications (though many of these inquiries may apply to command-line applications or daemons as well).

Background

What we’re looking at today is the overhead of Squashfs on snap startup. Snaps are essentially archives of almost all the files needed for an application to run, and they are mounted when they are installed (rather than extracted). The archive format is called Squashfs and it is implemented as a filesystem inside the Linux kernel. Squashfs supports five compression options when an archive is built: xz, lzo, gzip, zstd and none (meaning no compression). These options affect both the size of the archive and file access speed at runtime when a file is first accessed (this cost is only paid once, as the kernel is able to cache the result of decompressing the archive internally).

Squashfs is used inside snapd because it does not need to be extracted to be used; it simply needs to be mounted. In addition, Squashfs archives are always mounted read-only. This is a very useful property for software installations: it ensures that the software is protected from tampering, is easily cryptographically signed, and most likely operates the same way on different machines because the files are the same. Snapd uses compression to ensure that the files take up a minimal amount of space on the filesystem and incur minimal network transfer when downloading from the Snap Store.

Mounting a Squashfs filesystem is very fast on most systems because the decompression only happens when files are initially accessed. The mount itself is handled by systemd when the snap is first installed or during the early phases of the bootup for a system that already has snaps installed.

When a snap is first run on a system, even though the Squashfs archive is already mounted, it still needs to be decompressed. Launching a snap therefore includes some overhead from the initial, cold-cache access of files within the Squashfs archive. Intuitively, this overhead is proportional to the size and number of files that need to be accessed to launch the application, as well as to the exact compression algorithm used.

Testing Procedure

We have run multiple tests to measure the various compression options for Squashfs snaps. In addition to the none option (which performs no compression), as a baseline we will also test snaps installed via “try”, a developer-oriented option that simply performs a bind-mount of a given source directory onto the user-facing /snap install directory. This contrasts with a full install, which copies the .snap file (a Squashfs archive) into a snapd-controlled directory (i.e. /var/lib/snapd/snaps) and mounts the Squashfs archive onto the user-facing /snap install directory.

The snaps chosen here are meant to be representative of desktop applications, including small and minimal graphical applications such as gnome-calculator, large sprawling games such as supertuxkart, and large, “self-contained” programs such as Chromium, which consists of one very large executable file and a set of other large dynamic libraries.

All of this data is available publicly on GitHub here. I produced the graphs using the Wolfram Cloud programming environment, where you can view the code used to generate them and interact with them a bit if you like; for example, hovering over the box and whisker charts will show a tooltip with the exact mean, etc. (note that the page might take a while to load). See that here. An equivalent Python or R program could produce similar graphs in Jupyter.

The first charts here are a series of “Box and Whisker” charts (also called box plots) showing the statistical distribution of first and second launch times for all of the snaps in the test set across 2 test runs, combined into a single data set for charting purposes. Each individual test run ran 10 iterations across the 6 different compression options, each iteration performing a sequence of installing the snap, launching the main graphical application in the snap twice consecutively (without removing the snap, and thus without unmounting the Squashfs file, before the second launch), and finally hashing every file in the snap as well as measuring the size of each compressed Squashfs archive. This provides us with 4 metrics against which we can measure Squashfs snap performance:

  1. Size of archive
  2. Speed of cold-cache launch
  3. Speed of warm-cache launch
  4. Speed of cold-cache file walk
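The launch-time part of the procedure can be sketched roughly as follows. This is a simplified, hypothetical harness, not the actual test code: the real tests measured time until the window manager reported a visible window, whereas this sketch just times until process exit, and the demo command stands in for a snap's graphical application.

```python
import subprocess
import sys
import time

def time_launch_ms(cmd):
    # Time one launch of cmd in milliseconds. In the real harness
    # "launched" means the window manager has registered a visible
    # window; here we simply wait for the process to exit.
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    return (time.monotonic() - start) * 1000.0

# One hypothetical iteration: a cold launch followed immediately by a
# warm launch of the same command (the snap stays mounted in between).
cold_ms = time_launch_ms([sys.executable, "-c", "pass"])
warm_ms = time_launch_ms([sys.executable, "-c", "pass"])
```

The same timing wrapper, pointed at the snap's desktop entry and repeated 10 times per compression option, yields the distributions charted below.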

I chose Box and Whisker charts because they are good at displaying multiple distributions simultaneously for comparison, better than histograms and better than displaying the mean, median, etc. individually in standard bar charts. You can see that some of the runs have large outliers, including some of the SuperTuxKart test runs where the launch time was somehow measured as a literally imperceptible 12 milliseconds, but these do not detract from the more probable results, which lie within the colored boxes (technically speaking, between the upper and lower quartiles). The y-axis here is in milliseconds and measures the time until the window manager has registered that the application is being displayed.

Both tests were performed on the same desktop system running Ubuntu 19.04 with an NVME PCIe SSD, 32 GB of RAM and an AMD Threadripper 1950X processor. I also have run the tests on a less powerful Dell XPS 15 laptop running Ubuntu 18.04 and the results are very similar. I am in the process of setting up a Raspberry Pi to test the results there, and will update this post if the Pi shows any different trends but I don’t expect it to.

Here are the Box and Whisker charts for the launch times of the various snaps, with “gzip 1” meaning the first, cold-cache launch of the snap when compressed with gzip, and “gzip 2” being a subsequent launch after the first one.

In these charts there are a couple of interesting things to note. First, as expected, the second launch is significantly faster than the first for almost all of the snaps and compression options, including the try-mode installed snaps. This is expected even for the try-mode snaps (where there is no decompression) because some caching happens orthogonally to the Squashfs archive decompression caching, for things like fonts, images, etc., depending on the exact graphical application. In addition, many graphical applications need to set up caches for various other graphical software before anything will run properly, and may even use shared dynamic libraries that are not present in the snap itself but in shared runtime content snaps such as gnome-3-28-1804 (this is the case for gnome-calculator, for example). Investigations into snap startup overhead from these additional types of caching will follow in another post.

Second, for a small application such as gnome-calculator the difference between the compression algorithms is virtually non-existent; however, for larger snaps such as SuperTuxKart, and in the extreme case Chromium, there are significantly different first-launch speeds. Small applications having virtually no difference in first start time is expected, because there are few and/or small files to decompress.

What’s especially interesting about the Chromium and SuperTuxKart differences is that the Chromium Squashfs archive is actually smaller than the SuperTuxKart archive, yet takes almost double the time to launch. The difference here can most likely be attributed to the different file requirements for initial launch. Chromium as an xz-compressed snap is approximately 167 MB (627 MB uncompressed), with the majority of that size going to the Chromium executable (47 MB compressed, 137 MB uncompressed) as well as its dynamic library dependencies. Indeed, looking at a log-linear plot of the file sizes relative to their contribution to the total file size, the top ten largest files in the Chromium snap make up about 45% of the whole snap and the top 100 about 85%, while for the SuperTuxKart snap the top ten files are 20% and the top 100 about 50%. While this is not direct evidence of why Chromium takes longer to launch than SuperTuxKart, it is probable that the kernel has to do more decompression for Chromium before its processes can start, and this may be why Chromium takes longer to launch.

Additionally, the compression ratio for Chromium is also higher at approximately 3.75 versus SuperTuxKart which is at approximately 1.34 (calculated by dividing the try mode snap size by the xz compression Squashfs snap size).
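These two measures are simple to compute. The sketch below uses the figures quoted above for Chromium; the `top_n_share` demo list is made-up data purely to illustrate the calculation, not real snap file sizes.

```python
def compression_ratio(uncompressed_mb, compressed_mb):
    # Ratio of the "try" mode (uncompressed) size to the
    # Squashfs archive size.
    return uncompressed_mb / compressed_mb

def top_n_share(sizes, n):
    # Fraction of the total bytes contributed by the n largest files,
    # as plotted in the log-linear chart above.
    ordered = sorted(sizes, reverse=True)
    return sum(ordered[:n]) / sum(ordered)

# Figures from the post: Chromium is ~627 MB uncompressed, ~167 MB as xz.
chromium_ratio = compression_ratio(627, 167)  # ≈ 3.75

# Illustrative made-up file sizes: the two largest of these four files
# account for 80% of the total.
example_share = top_n_share([50, 30, 10, 10], 2)
```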

This hypothesis also makes sense because SuperTuxKart is a game with thousands of game assets that don’t need to be read immediately in order to launch the game and just display a window, while the Chromium executable and its dynamic library dependencies likely all need to be at least read, if not fully loaded into memory, before Chromium can display a window. A follow-up post will explore this topic further.

It is also useful to look at the time it takes to walk the entire Squashfs filesystem and read every file in the snap. This shows how long it takes to essentially cache the entire Squashfs archive and, for the larger snaps such as Chromium and SuperTuxKart, is much longer than the launch time, somewhat confirming my theory that launch time depends on which specific files need to be loaded before a window can be shown. Here are the walk times for the snaps:

As expected, for small snaps like gnome-calculator it takes almost no time to walk the snap itself, but note that gnome-calculator specifically also uses a content snap for many of its libraries, so this does not tell the full story for gnome-calculator. To show that we would also need to measure the time it takes to walk the runtime content snap.

For large snaps such as SuperTuxKart, with over 500 MB of files, it takes significantly longer to walk the entire snap than it does to just launch and display the application. Again, the xz compression format is the slowest here, by more than double the next slowest format (in this case gzip).
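The walk measurement itself is conceptually simple; a minimal sketch (not the actual test code) reads every file under a root directory and reports the elapsed time and byte count. The demo below runs it on a small throwaway tree; in the real test the root would be a freshly mounted snap such as /snap/chromium/current, where reading every file forces the kernel to decompress every Squashfs block.

```python
import os
import tempfile
import time

def walk_and_read(root, chunk_size=1 << 20):
    # Read every file under root, returning (seconds, total_bytes).
    # On a freshly mounted Squashfs this forces decompression of the
    # whole archive, so the time approximates the full decompression cost.
    start = time.monotonic()
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                with open(os.path.join(dirpath, name), "rb") as f:
                    while True:
                        chunk = f.read(chunk_size)
                        if not chunk:
                            break
                        total += len(chunk)
            except OSError:
                pass  # skip unreadable entries (sockets, broken links)
    return time.monotonic() - start, total

# Demo on a small throwaway tree rather than a real mounted snap.
demo = tempfile.mkdtemp()
with open(os.path.join(demo, "asset.bin"), "wb") as f:
    f.write(b"\0" * 4096)
elapsed, total = walk_and_read(demo)
```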

All of this suggests that in terms of snap startup time, we should probably not be using the xz format for snaps. Instead we should be using zstd, gzip or lzo. To better understand which format would be a better alternative, I also have computed the compression ratios for the various compression formats.

For snaps with many medium sized files such as SuperTuxKart, there is little difference between the various compression formats. However for snaps such as mari0 and Chromium, the next best choice for size would be zstd followed by gzip. Looking back at the timings above, gzip was slower than zstd, so zstd could be a solid choice for balancing filesizes against snap speedups.

If changing the Squashfs compression format of snaps is an undue burden right now, this post does suggest some alternatives that could be explored.

First, if possible, applications could be written so that, to launch, they load only the bare minimum number of files needed to display a window, lazily loading all of the other files after the first window is displayed. This would reduce the time until the snap actually displays a window, but may not significantly reduce the time before the application is fully usable. However, for a large application like Chromium this is probably near impossible to coordinate.

Secondly, since it seems that the first startup time is in great part due to Squashfs decompression, we could fold that decompression into the OS boot process somehow: before the user is able to launch applications, something touches all of the files in the Squashfs image (or even just “enough”, for some definition of enough) while the OS is booting, so that by the time the user launches the application, most of the decompression has already happened.
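A boot-time pre-warmer along those lines could be as simple as reading a prefix of each file, since reading even the first blocks forces Squashfs to decompress them into the page cache. This is a hypothetical sketch, not anything snapd does today; `prefix_bytes` is a made-up knob for “enough”, and the demo runs on a throwaway tree rather than a mounted snap under /snap.

```python
import os
import tempfile

def prewarm(root, prefix_bytes=64 * 1024):
    # Read a small prefix of every file under root so Squashfs
    # decompresses those blocks into the page cache ahead of the
    # first real launch. Returns the number of files touched.
    touched = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                with open(os.path.join(dirpath, name), "rb") as f:
                    f.read(prefix_bytes)
                touched += 1
            except OSError:
                pass  # unreadable entry; skip it
    return touched

# Demo: pre-warm a small throwaway tree of three files.
demo = tempfile.mkdtemp()
for i in range(3):
    with open(os.path.join(demo, "file%d" % i), "wb") as f:
        f.write(b"data")
count = prewarm(demo)
```

Whether a prefix is actually “enough” would depend on which blocks the application reads at startup, which is exactly the per-snap question the launch-time data raises.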


Nowadays zstd is a rather only game in town for compression. It’s also highly configurable with 1-20 range of compression level. Do you know what level was used for those tests? You may also try benchmarking this to find best level in terms of compression ratio/speed.

Sorry I don’t quite understand, are you saying that zstd is the “only game in town” for compression meaning it’s the most performant? Or did you mean to say it’s “a rather old game in town” meaning it’s out of date?

Yes, we used the default level of 15, mksquashfs does support configuring that level with the -Xcompression-level option.

FWIW zstd isn’t available everywhere that uses snaps.

Do we have some kind of documentation or explanation of what defines “everywhere that uses snaps” ?

There isn’t, afaik, that list. Given how adoption has spread all over the place I doubt we could assemble that list in a reliable way.
But I do know we’ve had to add detection code for (a) no squashfs support in-kernel, and (b) no xz support in that in-kernel support as people tried to use it in places without that support, and I also know that squashfs-tools does not have zstd support in 14.04, 16.04, nor 18.04, and that the mainline kernel got zstd support in 4.14.

mounting a zstd squashfs on 16.04 (with the default 4.4 kernel) currently results in a failure with

squashfs: SQUASHFS error: Filesystem uses "unknown" compression. This is not supported

in dmesg. It does work in 18.04 (with its 4.15) though. But unsquashfs -ls still fails there.

I expect I could use a -hwe kernel on 16.04 to get zstd support, but haven’t checked.

Can we peek at e.g. CentOS’s kernel config?

Is this code just “fail in obvious way if we don’t have xz decompression support” ? Or do we actually try to do something? Seems to me like we can’t do anything about that other than in userland re-compressing the snap file when it’s initially installed and that in and of itself seems fraught with peril.

the code is to give the user an actionable (or at least informative) error, because mount does not

I meant it’s better than any other alternative if you take both compression ratio and speed into account at the same time. LZ4 can be faster but compression ratio is poor. XZ can have better compression ratio (or maybe not at 20 zstd level) but speed is poor. This is also what your tests show. Basically everyone is moving to zstd (if they can).


@ijohnson are there options to xz that make it less terrible for the chromium case?

Yes, there are options to xz to set the dictionary size. I haven’t explored the various options for the different compression algorithms, but if it does seem insurmountable to change the format to something else, then it is probably worth exploring those options for xz compression with Squashfs.

I mean, we can move to zstd. We should probably have a plan to do so. But it’ll take years… meanwhile, what do we do? :slight_smile:


sure, insurmountable was probably a poor choice of words, perhaps Herculean is a better adjective…

Herculean and Sisyphean are just the right amount apart.


Will the store accept lower rate compressed xz squashfs files? If so, to mitigate in the short term, could the publisher of the chromium snap unsquash/resquash as part of the build (assuming it can’t be coerced as part of the build directly) to set a lower compression rate? (with the obvious changes that will come with it, such as longer downloads and higher disk space)

@popey you’re assuming that a lower compression rate on xz means faster decompression, which I haven’t seen data for

I’m not assuming anything, just asking if fiddled settings will be accepted. So testing could be done to gather the data about whether it would indeed be faster/slower/larger/etc.

Not currently. The resquash tests use the current defaults. It is possible to inspect the squashfs superblock to see what compression is used, but the superblock does not say anything about the compression level, so any changes to level will cause the resquash to fail.
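That superblock inspection can be sketched in a few lines. The offsets follow the Squashfs on-disk format (32-bit magic at offset 0, 16-bit compressor id at offset 20); the helper name is made up, and the demo fabricates a minimal header claiming zstd rather than reading a real snap.

```python
import os
import struct
import tempfile

# Squashfs compressor IDs as recorded in the on-disk superblock.
COMPRESSORS = {1: "gzip", 2: "lzo", 3: "lzma", 4: "xz", 5: "lz4", 6: "zstd"}

def squashfs_compression(path):
    # The superblock stores the magic at offset 0 and a 16-bit
    # compressor id at offset 20 -- but no compression level, which is
    # why a changed level is invisible to this kind of inspection.
    with open(path, "rb") as f:
        header = f.read(22)
    (magic,) = struct.unpack_from("<I", header, 0)
    if magic != 0x73717368:  # "hsqs" in little-endian
        raise ValueError("not a Squashfs image")
    (comp_id,) = struct.unpack_from("<H", header, 20)
    return COMPRESSORS.get(comp_id, "unknown")

# Demo: fabricate a 22-byte header whose compressor field says zstd (6).
fake = os.path.join(tempfile.mkdtemp(), "fake.snap")
with open(fake, "wb") as f:
    f.write(struct.pack("<5IH", 0x73717368, 0, 0, 0, 0, 6))
result = squashfs_compression(fake)
```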

Also, and I know I am a broken record on this, if we change anything in this area, the new thing must be deterministic and always compress the same when given the same options.


I’m sure it’s a lot of work, but what if the store could serve up multiple variants of the snap, serving the most optimized one given a client’s configuration?

Is there a precedent for evolving the snap file format? It would seem to me to be inevitable given enough time anyways… :slight_smile: