Is the only limitation here the review-tools checks that ensure the squashfs options are correct/expected?
Since there is no way to detect the compression level, there wouldn't be any other checks for that. As for changing the compression or using other options, I don't think the store does any additional checks on the squashfs superblock beyond running the review-tools, but perhaps @store should confirm.
Also, while not mentioned in this topic, I'll mention that we require `-no-fragments` because otherwise the mksquashfs operation is not deterministic.
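As an illustration, determinism for a given option set can be checked by squashing the same tree twice and comparing the results. This is only a sketch: the tiny `demo-root/` tree and the pinned `-fstime 0` timestamp are assumptions for the demo, not necessarily what snapcraft itself does.

```shell
# Guard so the sketch degrades gracefully where squashfs-tools is absent.
command -v mksquashfs >/dev/null 2>&1 || { echo "squashfs-tools not installed"; exit 0; }

# A tiny stand-in for a real snap root.
mkdir -p demo-root
echo "hello" > demo-root/hello.txt

# Squash the same tree twice with identical options. -no-fragments (plus a
# pinned filesystem timestamp) is what keeps the two images byte-identical.
rm -f a.snap b.snap
mksquashfs demo-root/ a.snap -noappend -no-fragments -all-root -no-xattrs \
    -comp xz -fstime 0 >/dev/null
mksquashfs demo-root/ b.snap -noappend -no-fragments -all-root -no-xattrs \
    -comp xz -fstime 0 >/dev/null

cmp -s a.snap b.snap && echo "deterministic for this tree"
```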
The store couldn't do the optimizing itself, because it isn't in the business of changing the bits that the publisher provides (an invariant in the design). However, @ijohnson is exploring what it might take to provide an alternative that a publisher could choose (and/or snapcraft could default to) that the store could serve. E.g., if gzip is ubiquitous and a significant performance improvement, the store could be updated for squashfs gzip deltas, the review-tools updated to introspect the snap and decide which resquash options to use, the increased download size deemed an acceptable trade-off, etc., etc., then we could allow the store to serve both formats.
Oh, perhaps you were saying the publisher uploads all of xz, zstd, gzip, etc and then snapd on the system sees what it supports and asks the store for the best supported format. Possible. Not sure we want to go there…
Personally, I would go for the route where the publisher uploads a format that moves less frequently (e.g. it could stay the current xz for a long time) but the store can re-compress to one of several different formats, with assertions that form a chain of trust that the re-compressed artifact is exactly the same after decompression as the original.
This provides the most flexibility to deliver improved formats over time, as well as to deliver the right format for a given device. There will be an ever-growing gulf between devices with extremely slow CPUs on a decent network and devices with extremely slow networks but decent CPUs.
With this flexibility, a store can offer a CPU-optimised image for one device while offering a bandwidth-optimised image for another. Devices also have the option of locally re-compressing and verifying the security (deterministic and reproducible recompression) via the assertion system.
I realize this is nowhere close to being deliverable but I wanted to state my view on this subject.
That would go against the store invariant that the exact bits that the publisher uploads are the exact bits that the device receives. If we go this route, we would need buy-in from all architects and various stakeholders.
I understand and agree that such agreement would be required.
This may sound a bit crazy, but is there something we could do locally based on what the system supports, at the time the snap is installed?
There’s precedent for this in “other” app stores, such as Google Play, where the developer uploads a signed image that the store then slices and dices to various formats suitable for differing devices. The store then distributes the appropriate package to the requesting device.
Following the Google Play analogy, Android devices also perform an optimisation process on the downloaded app package to improve performance specifically for that device. E.g. they might pre-JIT any Java-like bytecode (in Android's format, not official Java format, because copyright) to machine code tailored to the CPU in the device. This would be analogous to repackaging the snap image, once downloaded by snapd, into a format that is faster to decompress.
I could see a system option that, when set, repacks a snap after download and before it gets mounted. This would make both installation and refreshes take significantly longer, but perhaps that's an okay trade-off for better runtime performance.
It's not clear to me how we could automatically guess the best compression format, though. Perhaps the system option could choose which format to use, e.g. new Ubuntu images that support zstd could use that, old ones could do gzip, etc.
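To make the idea concrete, here is a rough sketch of what such a local repack could look like. Everything here is hypothetical: the tiny stand-in snap, the file names, and the choice of gzip as the target (used only because every squashfs-tools build supports it; zstd would be the interesting case where available). This is not what snapd actually does.

```shell
# Guard so the sketch degrades gracefully where squashfs-tools is absent.
command -v mksquashfs >/dev/null 2>&1 || { echo "squashfs-tools not installed"; exit 0; }

# Build a tiny xz image to stand in for the snap snapd just downloaded.
mkdir -p demo-root
echo "hello" > demo-root/hello.txt
rm -f downloaded.snap repacked.snap
rm -rf unpacked-root
mksquashfs demo-root/ downloaded.snap -noappend -no-fragments -all-root \
    -no-xattrs -comp xz >/dev/null

# Hypothetical post-download step: unpack, then re-pack in a format that is
# cheaper to decompress at runtime, before the image gets mounted.
unsquashfs -d unpacked-root downloaded.snap >/dev/null
mksquashfs unpacked-root repacked.snap -noappend -no-fragments -all-root \
    -no-xattrs -comp gzip >/dev/null

ls -l downloaded.snap repacked.snap
```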
Yes, we definitely could do that given the space and CPU horsepower.
We could even unpack the snap if that is the only way out.
After looking into this some more, I think this statement is wrong. For example I have created a snap filesystem with this mksquashfs:
```
$ mksquashfs mari0-root/ mari0-xz-ex.snap -noappend -no-fragments -all-root -no-xattrs -comp xz -Xdict-size 8192
Parallel mksquashfs: Using 32 processors
Creating 4.0 filesystem on mari0-xz-ex.snap, block size 131072.
[=========================================================================================================================-] 3375/3375 100%

Exportable Squashfs 4.0 filesystem, xz compressed, data block size 131072
	compressed data, compressed metadata, no fragments, no xattrs
	duplicates are removed
Filesystem size 47918.08 Kbytes (46.80 Mbytes)
	32.48% of uncompressed filesystem size (147546.15 Kbytes)
Inode table size 28644 bytes (27.97 Kbytes)
	26.47% of uncompressed inode table size (108221 bytes)
Directory table size 22576 bytes (22.05 Kbytes)
	39.15% of uncompressed directory table size (57663 bytes)
Number of duplicate files found 116
Number of inodes 2881
Number of files 2384
Number of symbolic links 145
Number of device nodes 0
Number of fifo nodes 0
Number of socket nodes 0
Number of directories 352
Number of ids (unique uids + gids) 1
Number of uids 1
	root (0)
Number of gids 1
	root (0)
```
and then I am able to see the compression options used:
```
$ unsquashfs -s mari0-xz-ex.snap
Found a valid SQUASHFS 4:0 superblock on mari0-xz-ex.snap.
Creation or last append time Wed Oct 30 11:54:07 2019
Filesystem size 47918.08 Kbytes (46.80 Mbytes)
Compression xz
	Dictionary size 8192
	No filters specified
Block size 131072
Filesystem is exportable via NFS
Inodes are compressed
Data is compressed
Fragments are not stored
Xattrs are not stored
Duplicates are removed
Number of fragments 0
Number of inodes 2881
Number of ids 1
```
I presume this means that the review-tools could learn how to handle this as well?
Side-note: at least for this snap, I think that the binary output is still deterministic when used with compression options.
Ah, I stand corrected: "3.1 Compression options - Compressors can optionally support compression specific options (e.g. dictionary size). If non-default compression options have been used, then these are stored here."
Thanks! I tested this with various sizes and found that anything other than defaults or explicit ‘100%’ caused the option to be stored in a way the review-tools could detect.
I did not test resquashing against a large number of snaps with non-default values (note that with fragments the output was only sometimes non-deterministic), but I suspect this would not be a problem (though we would have to test).
I still doubt the @store is doing this check, perhaps they could comment?
This is correct. The store relies on the review-tools for both the scan (for which we trust the review-tools' end result) and extraction (using the review-tools' `unpack-package` script). Other than that, the store does not do any other checks on the actual squashfs options or anything.
The only thing we do is look at the first few bytes of the snap file to ensure it looks like a squashfs.
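For reference, that check amounts to looking for the squashfs magic: a little-endian v4 superblock (which is what snaps are) starts with the four bytes `hsqs`. A minimal sketch of such a check; the helper name is made up, and the demo files are synthetic:

```shell
# Minimal "looks like a squashfs" check: a little-endian superblock begins
# with the magic bytes "hsqs" (0x73717368).
is_squashfs() {
    [ "$(head -c 4 "$1" 2>/dev/null)" = "hsqs" ]
}

# Demo against two synthetic files; real usage would be: is_squashfs foo.snap
printf 'hsqs....rest-of-superblock' > fake.snap
printf 'not a squashfs' > fake.txt
is_squashfs fake.snap && echo "fake.snap looks like a squashfs"
is_squashfs fake.txt || echo "fake.txt does not"
```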