Snap deltas are OK but not great

We generally use snaps in locations with poor internet connectivity, so small update/download sizes matter a lot. Snap file sizes are notoriously large, but there’s been quite a bit of publicity about how snap deltas help minimize download sizes.

And while deltas certainly help, they seem to be severely hamstrung by the underlying mechanics, because:

  • Snaps are stored as SquashFS images, where the underlying snap content is compressed
  • Due to this compression, deltas don’t really work very well

The reason for the second point is that even a small change to uncompressed data can lead to a large change in compressed output - and therefore a disproportionately large delta.

In other words, because .snap files are compressed, two revisions of a snap with very similar content can still produce very different snap files, with a large delta between them.
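
To see the effect in isolation, here is a small, self-contained Python sketch (not part of the experiment below). It inserts one character near the start of a pseudo-random text payload and compares a crude delta estimate for the raw bytes, a zlib (gzip-style) stream and an xz stream. The helper names and the 64-byte chunk size are hypothetical, and `estimate_delta` simply counts chunks of the new file that can't be found anywhere in the old one, which is far cruder than xdelta3. It also compresses the whole payload as a single stream, whereas SquashFS compresses data in independent blocks, so in a real image the damage is limited to the changed blocks (plus any metadata that changes).

```python
import lzma
import random
import zlib

CHUNK = 64  # granularity of the crude delta estimate, in bytes


def make_payload(n_lines: int = 2000, seed: int = 42) -> bytes:
    """Deterministic, compressible text made of random words from a small vocabulary."""
    rng = random.Random(seed)
    words = ["snap", "delta", "squashfs", "revision", "compress", "block",
             "content", "update", "python", "binary", "library", "config"]
    lines = (" ".join(rng.choice(words) for _ in range(8)) for _ in range(n_lines))
    return "\n".join(lines).encode()


def estimate_delta(old: bytes, new: bytes) -> int:
    """Bytes of `new`, in CHUNK-sized pieces, that occur nowhere in `old`."""
    missing = 0
    for i in range(0, len(new), CHUNK):
        piece = new[i:i + CHUNK]
        if piece not in old:
            missing += len(piece)
    return missing


old = make_payload()
new = old[:100] + b"x" + old[100:]  # insert a single character near the start

results = {
    "raw": estimate_delta(old, new),
    "zlib": estimate_delta(zlib.compress(old, 9), zlib.compress(new, 9)),
    "xz": estimate_delta(lzma.compress(old), lzma.compress(new)),
}
for name, size in results.items():
    print(f"{name:>4}: ~{size} bytes not found in the old version")
```

Here the raw estimate is a single 64-byte chunk, while the xz estimate spans nearly the whole compressed output: LZMA's range coder carries state forward, so every byte it emits after the edit differs. The zlib estimate is also larger than the raw one, though by how much depends on how the deflate blocks happen to fall.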

So I ran a little experiment to check just how much of an issue this is, testing a few common snaps as well as one of ours. The results were pretty striking:

+------------------------+---------------+------------+------------+
|    Snap (revisions)    | Content Delta | Snap Delta | Difference |
+------------------------+---------------+------------+------------+
| snapcraft (7205->7257) | 1.4MB         | 13MB       | 9x         |
| snapd (15183->15541)   | 18MB          | 37MB       | 2x         |
| firefox (1233->1262)   | 26MB          | 43MB       | 1.7x       |
| core18 (2349->2378)    | 980KB         | 6.1MB      | 6x         |
| ammp-edge (737->740)   | 160KB         | 4.8MB      | 30x        |
+------------------------+---------------+------------+------------+

The “Content Delta” in the table above is the size of the delta file generated between the uncompressed “pseudo files” corresponding to the two snap SquashFS images, i.e. it’s a measure of how much the underlying content changed.

The “Snap Delta” is the size of the actual delta between the two snap files.
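
The idea behind the content delta can be sketched in Python: serialize a directory tree into one flat, uncompressed, deterministic byte stream, so that a small edit in one file stays a small, localized change in the stream. The serialization format below is hypothetical (it is not the actual SquashFS pseudo-file format), and `changed_span`, which measures what is left after stripping the common prefix and suffix, is only a stand-in for a real delta tool that would also cope with insertions and multiple edits.

```python
import tempfile
from pathlib import Path


def serialize_tree(root: Path) -> bytes:
    """Hypothetical flat serialization: for each regular file, in sorted path
    order, a header line (relative path, permission bits, size) followed by
    the file's raw, uncompressed bytes."""
    out = bytearray()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        data = path.read_bytes()
        rel = path.relative_to(root).as_posix()
        mode = path.stat().st_mode & 0o777
        out += f"{rel} {mode:o} {len(data)}\n".encode() + data
    return bytes(out)


def changed_span(old: bytes, new: bytes) -> int:
    """Bytes of `new` left after stripping the prefix and suffix shared with `old`."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    return len(new) - prefix - suffix


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "app").mkdir()
    for i in range(50):
        (root / "app" / f"module_{i:02d}.py").write_text(f"VALUE = {i}\n" * 200)
    before = serialize_tree(root)

    # Change a single character in one file (same length, so headers are unchanged).
    target = root / "app" / "module_25.py"
    target.write_text(target.read_text().replace("VALUE = 25", "VALUE = 26", 1))
    after = serialize_tree(root)

print(f"serialized size: {len(after)} bytes, changed span: {changed_span(before, after)} byte(s)")
```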

The difference between the two approaches is generally quite big. Even in the best cases (snapd and firefox), the snap deltas are roughly 1.7-2x the size of the content delta. And the most striking discrepancies arise where the content delta is small. For example, in the ammp-edge case we only changed a few lines of code between the two revisions, as reflected by the small content delta, yet the snap delta still ended up a hefty ~5MB.

I wanted to share this since I found it quite interesting, and also as food for discussion towards a potentially better approach.

As part of my experiment, and as potential inspiration towards a better approach, I created a repo with two Bash scripts (essentially one-liners) that generate and apply deltas based on SquashFS content rather than on the compressed images. generate_delta.sh is what I used to produce the content deltas in the comparison table above.
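
The scripts themselves aren't reproduced here, but the generate/apply pair can be illustrated with a minimal greedy copy/insert delta in Python. This is a sketch under stated assumptions, not xdelta3, bsdiff or the repo's scripts: `generate_delta`, `apply_delta`, `delta_size` and the 32-byte `BLOCK` are hypothetical names and choices, and `delta_size` is only a rough stand-in for a real encoded delta format. Run on uncompressed content, a small edit yields just a handful of operations.

```python
import random
from typing import Dict, List, Tuple, Union

BLOCK = 32  # minimum match length in bytes (arbitrary choice for this sketch)

# ("copy", offset_in_old, length) or ("insert", literal_bytes)
Op = Union[Tuple[str, int, int], Tuple[str, bytes]]


def generate_delta(old: bytes, new: bytes) -> List[Op]:
    """Greedy delta: reuse aligned BLOCK-sized chunks of `old`, emit the rest as literals."""
    index: Dict[bytes, int] = {}
    for off in range(0, len(old) - BLOCK + 1, BLOCK):
        index.setdefault(old[off:off + BLOCK], off)

    ops: List[Op] = []
    literal = bytearray()
    i = 0
    while i < len(new):
        off = index.get(new[i:i + BLOCK])
        if off is None:
            literal.append(new[i])  # no match here: keep this byte as a literal
            i += 1
            continue
        if literal:
            ops.append(("insert", bytes(literal)))
            literal.clear()
        # Extend the match forwards as long as old and new keep agreeing.
        length = BLOCK
        while (off + length < len(old) and i + length < len(new)
               and old[off + length] == new[i + length]):
            length += 1
        ops.append(("copy", off, length))
        i += length
    if literal:
        ops.append(("insert", bytes(literal)))
    return ops


def apply_delta(old: bytes, ops: List[Op]) -> bytes:
    """Rebuild `new` from `old` and the list of operations."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, off, length = op
            out += old[off:off + length]
        else:
            out += op[1]
    return bytes(out)


def delta_size(ops: List[Op]) -> int:
    """Rough encoded size: 8 bytes per copy, 4-byte header plus payload per insert."""
    return sum(8 if op[0] == "copy" else 4 + len(op[1]) for op in ops)


rng = random.Random(0)
old = bytes(rng.getrandbits(8) for _ in range(50_000))
new = old[:12_345] + b"hello" + old[12_345:]  # insert five bytes mid-file

ops = generate_delta(old, new)
assert apply_delta(old, ops) == new
print(f"new file: {len(new)} bytes, delta: ~{delta_size(ops)} bytes in {len(ops)} ops")
```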

One tangential point: bsdiff seems to perform better than xdelta3 on diffs of executable binaries. In my empirical tests this also held for snaps/SquashFS images. The difference isn’t as significant as the one between compressed and unsquashfs'ed files that I talked about in my first post, and I wouldn’t necessarily advocate a switch to bsdiff, but it’s interesting to note.
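
The usual explanation for bsdiff's edge on executables is that a small code change shifts many embedded addresses and offsets by the same amount, so a copy/insert delta finds few long exact matches, whereas bsdiff stores byte-wise differences between approximately matching regions and then compresses them (with bzip2). The Python sketch below isolates only that byte-difference idea on a synthetic, hypothetical "instruction stream"; it skips the suffix-sorting step bsdiff uses to find matching regions, since here the old and new streams are already aligned.

```python
import bz2
import random
import struct

rng = random.Random(1)
N = 20_000  # number of synthetic instructions

# Hypothetical "machine code": a 1-byte opcode followed by a 4-byte absolute address.
opcodes = [rng.randrange(256) for _ in range(N)]
addresses = [rng.randrange(0x400000, 0x800000) for _ in range(N)]


def assemble(shift: int) -> bytes:
    """Encode every instruction, with all addresses moved by `shift` bytes."""
    return b"".join(struct.pack("<BI", op, addr + shift)
                    for op, addr in zip(opcodes, addresses))


old = assemble(0)
new = assemble(16)  # as if 16 bytes of code were inserted earlier, moving every target

# bsdiff-style diff stream: byte-wise difference (mod 256) of the aligned regions.
diff = bytes((b - a) % 256 for a, b in zip(old, new))

new_bz2 = len(bz2.compress(new))
diff_bz2 = len(bz2.compress(diff))
print(f"new file: {len(new)} bytes, bzip2'd: {new_bz2}, bzip2'd diff: {diff_bz2}")
```

Almost all of `diff` is zeros plus the constant 16 in each address's low byte (with an occasional carry), so it compresses to a small fraction of `new`, while an exact-match delta finds almost no long runs to copy.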

Here is also a recent benchmark of algorithms (not snap/SquashFS related) that I found interesting: https://zork.net/~st/jottings/delta-compression-tests-2019.html.

Finally, a bit further down the rabbit hole, there’s also been some work to improve on bsdiff, with ddelta looking like the front-runner - (maybe) in use for Debian delta upgrades (presentation).

A quick update on this, in the context of snapcraft’s support for LZO compression (https://snapcraft.io/blog/why-lzo-was-chosen-as-the-new-compression-method). While LZO compression does produce larger snap files - i.e. the compression ratio is not as high as it is with XZ - it seems that there may be a benefit in terms of delta sizes.

Prompted by @picchietti in another thread, I experimented with deltas on SquashFS images created with different compression methods. The image contained a minimal OS file tree with a Python application, and the change between the two versions was a single character added to one file. So it’s basically a large image with a minimal change.

I should say this was completely non-scientific. The mksquashfs options are also not exactly what snapcraft uses (I kept most defaults), and I used default compression levels for all compression types. But the results are quite interesting!
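
For anyone wanting to repeat this with real snaps, the comparison could be scripted roughly as below. This is a hypothetical Python harness, not the commands used for the numbers that follow: the post doesn't name the delta tool used here, so xdelta3 is an assumption, and apart from `-comp` and `-noappend` it leaves mksquashfs at its defaults, in the same spirit as this test.

```python
import subprocess
from pathlib import Path
from typing import Dict, List, Tuple

# mksquashfs compressors from the table; "none" means a plain, uncompressed tar archive.
METHODS = ["none", "lz4", "lzo", "gzip", "lzma", "xz"]


def image_command(src: Path, image: Path, method: str) -> List[str]:
    """Command that packs the tree `src` into `image` using `method`."""
    if method == "none":
        return ["tar", "-cf", str(image), "-C", str(src), "."]
    # Defaults for everything except the compressor; -noappend overwrites old images.
    return ["mksquashfs", str(src), str(image), "-comp", method, "-noappend"]


def delta_command(old: Path, new: Path, delta: Path) -> List[str]:
    """xdelta3 encode (-e) of `new` against source `old` (-s), overwriting (-f)."""
    return ["xdelta3", "-e", "-f", "-s", str(old), str(new), str(delta)]


def compare(old_tree: Path, new_tree: Path, workdir: Path) -> Dict[str, Tuple[int, int]]:
    """Return {method: (new image size, delta size)} in bytes."""
    results = {}
    for method in METHODS:
        old_img = workdir / f"old-{method}.img"
        new_img = workdir / f"new-{method}.img"
        delta = workdir / f"{method}.xdelta"
        for tree, img in ((old_tree, old_img), (new_tree, new_img)):
            subprocess.run(image_command(tree, img, method), check=True,
                           stdout=subprocess.DEVNULL)
        subprocess.run(delta_command(old_img, new_img, delta), check=True)
        results[method] = (new_img.stat().st_size, delta.stat().st_size)
    return results
```

With tar, mksquashfs and xdelta3 installed, something like `compare(Path("tree-v1"), Path("tree-v2"), Path("/tmp/work"))` would return the sizes per method; the paths are placeholders.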

For each of the compression types supported by mksquashfs - as well as the case where I simply put the OS tree in a tar archive without compression - the image and delta sizes are as follows:

+--------------------+----------------+----------------+
| Compression method | Image size (B) | Delta size (B) |
+--------------------+----------------+----------------+
| none (tar)         |    132,444,160 |         91,700 |
| lz4                |     67,796,992 |          8,236 |
| lzo                |     51,961,856 |          8,829 |
| gzip               |     46,866,432 |        148,645 |
| lzma               |     36,757,504 |        117,643 |
| xz                 |     36,737,024 |        117,019 |
+--------------------+----------------+----------------+

As expected, the LZO image is far from the smallest; it is 41% larger than the XZ-compressed one. But its delta is 13x smaller than the XZ delta.

Apart from LZ4, the LZO delta is smaller than that of every other method, even the uncompressed tar archive, by at least an order of magnitude. LZ4 yields a slightly smaller delta (by a whisker), at the expense of an image that’s 30% larger than the LZO one.
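
The percentages and ratios quoted above follow directly from the table; a quick Python check of the arithmetic:

```python
# Image and delta sizes in bytes, copied from the table above.
sizes = {
    "none (tar)": (132_444_160, 91_700),
    "lz4": (67_796_992, 8_236),
    "lzo": (51_961_856, 8_829),
    "gzip": (46_866_432, 148_645),
    "lzma": (36_757_504, 117_643),
    "xz": (36_737_024, 117_019),
}

lzo_image, lzo_delta = sizes["lzo"]
xz_image, xz_delta = sizes["xz"]
lz4_image, lz4_delta = sizes["lz4"]

lzo_image_growth = lzo_image / xz_image - 1   # LZO image vs XZ image: ~0.41
xz_delta_factor = xz_delta / lzo_delta        # XZ delta vs LZO delta: ~13.3
lz4_image_growth = lz4_image / lzo_image - 1  # LZ4 image vs LZO image: ~0.30

# How many times larger each other method's delta is than the LZO delta.
delta_vs_lzo = {name: delta / lzo_delta
                for name, (_, delta) in sizes.items() if name != "lzo"}

print(f"LZO image is {lzo_image_growth:.0%} larger than XZ")
print(f"XZ delta is {xz_delta_factor:.1f}x the LZO delta")
print(f"LZ4 image is {lz4_image_growth:.0%} larger than LZO")
for name, ratio in delta_vs_lzo.items():
    print(f"{name:>10}: delta is {ratio:.2f}x the LZO delta")
```

The smallest ratio other than LZ4's belongs to the uncompressed tar archive, at about 10.4x, which is where "at least an order of magnitude" comes from.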

Again, this was sort of an artificial test case that may not be representative, but it looks like switching compression might yield some significant benefits here. At some point I’ll try to do some tests with real-life snaps using LZO.
