One of the interesting features talked about is deduplication support for content within Snap packages. Snaps would be automatically deduplicated of common files shared between snaps based upon their file hashes. There would be de-duplication on the file-system layer, de-duplication on snap downloads (with server support), and perhaps de-duplication of mapped libraries from the linker. Deduplication is a big work item and likely will take a while to implement fully, but it’s an interesting goal nevertheless.

Phoronix, 8 May 2015
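The core idea in the quoted plan, deduplicating by file hash, can be sketched as a content-addressed store. Each distinct file body is stored once under its SHA-256 digest, and each package keeps only a manifest mapping its paths to digests. This is a hypothetical illustration: the `ContentStore` and `install` names are made up, and it is similar in spirit to OSTree's object store rather than anything snapd implements.

```python
import hashlib

class ContentStore:
    """Hypothetical content-addressed store: each unique blob is kept once, keyed by its SHA-256."""

    def __init__(self):
        self.blobs = {}  # hex digest -> file contents

    def add(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # setdefault keeps the first copy; identical content maps to the same key
        self.blobs.setdefault(digest, data)
        return digest

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self.blobs.values())

def install(store, files):
    """Add one package's files to the store; return its manifest (path -> digest)."""
    return {path: store.add(data) for path, data in files.items()}

store = ContentStore()
libfoo = b"\x7fELF" + b"\x00" * 1000  # stand-in for a library bundled by two packages
app_a = {"lib/libfoo.so": libfoo, "bin/a": b"app a"}
app_b = {"lib/libfoo.so": libfoo, "bin/b": b"app b!"}
manifest_a = install(store, app_a)
manifest_b = install(store, app_b)

# Bytes the two packages would take with no sharing vs. what the store holds
logical = sum(len(d) for files in (app_a, app_b) for d in files.values())
print(logical, store.stored_bytes())
```

Both manifests point `lib/libfoo.so` at the same digest, so the library's bytes are counted once in the store. Because snaps ship as single squashfs images, a real system would have to unpack or remount them to share files this way, which is part of the cost discussed further down the thread.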

Did this happen, or is it still happening?

Deduplication at the file level across snaps is not happening. If you put many copies of one file into one snap, then yes, those are de-duplicated. If you need to share larger amounts of content among a set of snaps, please use the content interface. Lastly, we have base snaps, which let one change the root filesystem (to contain any set of extra libraries), but this feature is still a little immature across the stack (snapcraft and snapd).


Is there any plan for cross-package deduplication in the future? Not at the block level, as is already available inside snaps, but as a versioning feature based on checksums? Flatpak provides this through OSTree for the entire local repository. It could be a sizeable benefit: even RPM and DEB suffer from duplication with apps such as Electron-based ones, and it would probably be even more beneficial for packaging formats like Snap and Flatpak, where duplicate content across packages is more likely.


Snaps are one file (a squashfs image), whereas flatpaks are expanded into nested directories of many files (the OSTree system). I am not an engineer, but hypothetically (to the extent I understand any of this), if you were to dedupe at the file level, it would have to happen after every snap install, update, refresh, or revert, and would require walking all of the installed software (hashing and comparing along the way), creating some set of content-sharing snaps, and rebuilding all of the installed snaps. On the standard snapd update schedule, this whole process would happen several times per day (the more software you have installed, and the closer to the newest versions you follow, the more likely you are to have updates). That is a lot of work to save a few hundred megabytes, in a world where cheap microSD cards hold dozens of gigabytes. It’s probably not worth the complexity, potential for bugs, and effort, if it’s possible at all.

The disk-space concerns are largely mitigated by the squashfs image approach: the packages remain transparently compressed, so most installed applications are about half the size of their conventional counterparts even with more bundled (sometimes the savings are larger, sometimes smaller).

Well, that is sad… On an RPM-based system I found more than 3 GB of duplicated files with fdupes, completely excluding personal files. How much of that is within the same package, and how much there would be in a packaging format where dependencies can be ignored, is still an open question.
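For a sense of how a figure like that 3 GB can be produced, here is a rough sketch of an fdupes-style scan: group regular files by size, hash only the sizes that collide, and total the bytes that keeping a single copy per group would free. It is an illustrative re-implementation with hypothetical function names, not fdupes itself; fdupes also confirms matches with a byte-by-byte comparison, which this sketch skips by trusting SHA-256.

```python
import hashlib
import os
import stat
from collections import defaultdict

def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(roots):
    """Return (groups, wasted): groups of identical files, and the bytes
    that keeping only one copy per group would free."""
    by_size = defaultdict(list)
    seen_inodes = set()
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    st = os.lstat(path)  # lstat: do not follow symlinks
                except OSError:
                    continue
                if not stat.S_ISREG(st.st_mode):
                    continue  # skip symlinks, devices, sockets, ...
                key = (st.st_dev, st.st_ino)
                if key in seen_inodes:
                    continue  # hard link to a file already counted
                seen_inodes.add(key)
                by_size[st.st_size].append(path)

    groups, wasted = [], 0
    for size, paths in by_size.items():
        if size == 0 or len(paths) < 2:
            continue  # empty files waste nothing; a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            try:
                by_hash[file_digest(path)].append(path)
            except OSError:
                continue  # unreadable (permissions, vanished): skip it
        for same in by_hash.values():
            if len(same) > 1:
                groups.append(sorted(same))
                wasted += size * (len(same) - 1)
    return groups, wasted
```

For example, `groups, wasted = find_duplicates(["/usr", "/opt"])` followed by `print(wasted / 1e9, "GB")` gives a figure comparable in spirit to the fdupes result. Grouping by size first means most files are never hashed at all.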

Anyway, I don’t see why it couldn’t:

  1. Maintain a hash database, so there is no whole-system rehash.
  2. Maybe at least try to automatically dedupe against known shareable snaps available in the repository, perhaps at build time for developers? Will developers be made aware of shareable content in the repository, apart from “search for it yourself”?
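The first suggestion amounts to an incremental index: remember each file’s hash and only rehash files that changed. Below is a minimal sketch, assuming a hypothetical JSON file as the database and trusting file size plus modification time to detect changes (the same quick check rsync uses by default). The `HashIndex` class is made up for illustration and does not reflect how snapd actually stores metadata.

```python
import hashlib
import json
import os

class HashIndex:
    """Hypothetical persistent index: path -> [size, mtime_ns, sha256].
    A file is rehashed only when its size or modification time changes."""

    def __init__(self, db_path):
        self.db_path = db_path
        self.entries = {}
        if os.path.exists(db_path):
            with open(db_path) as f:
                self.entries = json.load(f)
        self.rehashed = 0  # number of hashes computed, to show the cache working

    def digest(self, path):
        st = os.stat(path)
        cached = self.entries.get(path)
        if cached and cached[0] == st.st_size and cached[1] == st.st_mtime_ns:
            return cached[2]  # unchanged since the last scan: reuse the stored hash
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        digest = h.hexdigest()
        self.rehashed += 1
        self.entries[path] = [st.st_size, st.st_mtime_ns, digest]
        return digest

    def save(self):
        with open(self.db_path, "w") as f:
            json.dump(self.entries, f)
```

After an update, only the files that actually changed get rehashed, so the cost scales with the size of the update rather than with everything installed. The trade-off is that a change preserving both size and mtime goes unnoticed, which is why tools offering this shortcut usually also provide a full-checksum mode.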

I’m not asking for impossible solutions, as you can see, but I heard the same thing in the cell phone market and the experience actually sucked…

Your second model is already part of the system: so-called “content sharing” snaps, which are a per-project thing (check out Ken vanDine’s work on the GNOME desktop apps and platform for how it’s been used so far). Realistically, how important is 3 GB of disk space on the system side, even if snaps were the same size as the conventionally packaged debs? And honestly, snaps are already very space-efficient compared to debs. I tried a direct comparison of the VLC snap with the deb, but VLC is split into multiple debs; on something simpler to compare, like Skype or Spotify, the snap is less than half the size.

It is space that could be used for user content. I have plenty of stuff and not enough money, so I have to pick what won’t go on my SSDs; some things, like frameworks and SDKs, I simply gave up on and moved to HDDs. Ironically, today I still manage disk space and completely forget about my RAM. Anyway, thanks, that pretty much answered my question.