We mostly deploy snaps in locations with poor internet connectivity, so small update and download sizes matter a lot. Snap file sizes are notoriously large, but there’s been quite a bit of publicity about how snap deltas help minimize download sizes.
While deltas certainly help, they seem to be severely hamstrung by the underlying mechanics. I say this because:
- Snaps are stored as SquashFS images, where the underlying snap content is compressed
- Because of this compression, deltas between snap files come out much larger than the actual change in content
The reason for the second point is that even a small change to uncompressed data can lead to a large change in the compressed output, and therefore a disproportionately large delta.
In other words, since the .snap files are compressed, two revisions of a snap with very similar content can still have very different snap files, with a large delta between them.
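To make the effect concrete, here is a minimal sketch in Python. It is not how snapd or mksquashfs work internally: it compresses one synthetic payload as a single xz stream with the standard `lzma` module, whereas SquashFS compresses data in blocks, so treat it purely as an illustration of the mechanism. The payload and the helper names `make_payload` and `count_changed_bytes` are made up for this example.

```python
import lzma


def count_changed_bytes(a: bytes, b: bytes) -> int:
    """Count positions where a and b differ, plus any difference in length."""
    positional = sum(x != y for x, y in zip(a, b))
    return positional + abs(len(a) - len(b))


def make_payload(n_lines: int = 20000) -> bytes:
    """Build a compressible, deterministic text payload (hypothetical data)."""
    lines = (f"record {i}: value={(i * i) % 9973}\n" for i in range(n_lines))
    return "".join(lines).encode("ascii")


old = make_payload()

# Simulate a tiny content change: flip one bit in a single byte.
changed = bytearray(old)
changed[1000] ^= 0x01
new = bytes(changed)

# Compress both revisions independently, as two separate builds would.
old_xz = lzma.compress(old)
new_xz = lzma.compress(new)

raw_changed = count_changed_bytes(old, new)
compressed_changed = count_changed_bytes(old_xz, new_xz)

print(f"uncompressed: {raw_changed} of {len(old)} bytes differ")
print(f"compressed:   {compressed_changed} of {len(old_xz)} bytes differ")
```

The exact numbers depend on the input and the liblzma version, so none are quoted here. The point is only that a one-byte edit in the input is not a one-byte edit in the compressed stream, and a binary delta tool run on the compressed files has to encode everything that moved.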
So I ran a small experiment to check how much of an issue this is, using a few common snaps as well as one of ours. The results were pretty striking:
+------------------------+---------------+------------+------------+
| Snap (revisions)       | Content Delta | Snap Delta | Difference |
+------------------------+---------------+------------+------------+
| snapcraft (7205->7257) | 1.4MB         | 13MB       | 9x         |
| snapd (15183->15541)   | 18MB          | 37MB       | 2x         |
| firefox (1233->1262)   | 26MB          | 43MB       | 1.7x       |
| core18 (2349->2378)    | 980KB         | 6.1MB      | 6x         |
| ammp-edge (737->740)   | 160KB         | 4.8MB      | 30x        |
+------------------------+---------------+------------+------------+
The “Content Delta” in the table above is the size of the delta file generated between the uncompressed “pseudo files” corresponding to the two snap SquashFS images, i.e. it’s a measure of how much changed in the underlying content.
The “Snap Delta” is the size of the actual delta between the two snap files.
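For anyone who wants to check the “Difference” column, it is just the snap delta divided by the content delta. A quick Python sketch using the rounded figures from the table, and assuming 1 MB = 1000 KB for the unit conversion:

```python
# (content_delta_kb, snap_delta_kb), copied from the table above.
# Assumption: 1 MB = 1000 KB; the table values are already rounded.
deltas_kb = {
    "snapcraft": (1400, 13000),
    "snapd": (18000, 37000),
    "firefox": (26000, 43000),
    "core18": (980, 6100),
    "ammp-edge": (160, 4800),
}

# Ratio of the snap delta to the content delta for each snap.
ratios = {name: snap / content for name, (content, snap) in deltas_kb.items()}

# Print the largest discrepancy first.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    content, snap = deltas_kb[name]
    overhead_kb = snap - content
    print(f"{name:<10} {ratio:5.1f}x  (+{overhead_kb} KB over the content delta)")
```

Because the inputs are rounded, the ratios only match the table to about one significant figure.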
The difference between the two approaches is generally quite big. Even in the best cases the snap deltas are roughly 2x the size of the content delta, and the most striking discrepancies seem to arise where the content delta is small. For example, in the ammp-edge case we only changed a few lines of code between the two revisions - as reflected by the small content delta - yet the snap delta still ended up a hefty ~5MB.
I wanted to share this since I found it quite interesting, and also as food for discussion towards a potentially better approach.
As part of my experiment - and as potential inspiration towards a better way - I created a repo with two Bash scripts (essentially one-liners), which can be used to generate and apply deltas based on SquashFS content rather than compressed images. generate_delta.sh is what I used for the content deltas in the comparison table above.
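For readers who don't want to dig through the repo, the generate step can be sketched roughly as below. This is not the actual generate_delta.sh, just an approximation in Python. It assumes `unsquashfs` from squashfs-tools 4.5 or later (for the `-pf` pseudo-file output) and `xdelta3` as the delta tool. The function names and the file names `old.pseudo`, `new.pseudo` and `content.xdelta3` are made up for illustration.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def content_delta_commands(old_snap, new_snap, workdir):
    """Return the argv lists for a content-based delta, without running them."""
    work = Path(workdir)
    old_pf = work / "old.pseudo"      # hypothetical file names
    new_pf = work / "new.pseudo"
    delta = work / "content.xdelta3"
    return [
        # Dump each SquashFS image as an uncompressed pseudo file.
        ["unsquashfs", "-pf", str(old_pf), str(old_snap)],
        ["unsquashfs", "-pf", str(new_pf), str(new_snap)],
        # Encode the new pseudo file relative to the old one.
        ["xdelta3", "-e", "-f", "-s", str(old_pf), str(new_pf), str(delta)],
    ]


def run_content_delta(old_snap, new_snap, workdir):
    """Run the commands and return the size of the resulting delta in bytes."""
    for tool in ("unsquashfs", "xdelta3"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    Path(workdir).mkdir(parents=True, exist_ok=True)
    commands = content_delta_commands(old_snap, new_snap, workdir)
    for argv in commands:
        subprocess.run(argv, check=True)
    return Path(commands[-1][-1]).stat().st_size


# Usage: python content_delta.py OLD.snap NEW.snap WORKDIR
if __name__ == "__main__" and len(sys.argv) == 4:
    print(run_content_delta(*sys.argv[1:4]), "bytes")
```

Applying a delta would be the reverse: dump the old snap to a pseudo file, decode the delta against it, and rebuild a SquashFS image from the result. That step is not shown here.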