Building with build-snaps bottle-necking on fuse


#1

When building in an LXD container on an up-to-date Ubuntu 18.04 with the non-HWE kernel, I recently noticed that -jN builds bottleneck 100% on fuse, basically using up an entire core while everything else is perpetually waiting on input.

This makes build-snaps pretty much unusable for large projects, because builds such as KDE’s document viewer okular take more than 30 minutes when they should take less than 4!

I’ve seen this on a server with an SSD and a 3.5 GHz Xeon CPU, and it looks to be completely reproducible:


#2

From IRC, for the record:

  • their clouds can’t all offer hardware virtualization (so multipass is out)
  • using chroots would be a problem because some servers are persistent and some aren’t, and they’d have to keep track of which is which, and pass in snapd from the server and do a bunch of manual setup

The bottleneck on fuse is potentially because squashfuse (or snapfuse, which is just squashfuse) uses the non-async version of the FUSE API, in which case rewriting it to use the async one might let it use more cores. But rewriting it is hard.
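
To illustrate the idea (this is a toy model, not squashfuse code): the sketch below serves a batch of read requests first with a single handler and then with several. `handle_request`, `serve` and `REQUEST_COST` are made-up names, and `time.sleep` only stands in for whatever work each request really costs, so the numbers say nothing about real squashfuse performance.

```python
import time
from concurrent.futures import ThreadPoolExecutor

REQUEST_COST = 0.05  # seconds; assumed stand-in for the work done per read request


def handle_request(block: int) -> int:
    """Pretend to serve one read request for the given block."""
    time.sleep(REQUEST_COST)
    return block


def serve(blocks, workers: int) -> float:
    """Serve all requests using `workers` handler threads; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Consume the iterator so every request has completed before we stop the clock.
        list(pool.map(handle_request, blocks))
    return time.perf_counter() - start


if __name__ == "__main__":
    blocks = range(16)
    print(f"1 handler:  {serve(blocks, 1):.2f}s")
    print(f"8 handlers: {serve(blocks, 8):.2f}s")
```

With one handler every request waits for the previous one, which is roughly what a -jN build would see if all of its reads funnel through a single serving loop.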

@sitter is exploring whether running snapd with SNAPPY_SQUASHFS_UNPACK_FOR_TESTS=1 is a workable workaround for them. Update: wah-wah-waaaah.


#3

@pedronis I don’t know how urgent this is, but it might be relatively cheap to look into having SNAPPY_SQUASHFS_UNPACK_FOR_TESTS=1 work (AIUI it’s just a case of tweaking fuse detection to look at that var first, but there might be more bits of similar complexity in sanity).
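
A rough sketch of what "look at that var first" could mean, written in Python for brevity (snapd itself is Go, and this is not its actual code). `choose_snap_backend`, the `in_container` flag, the returned labels and the binary lookup are all hypothetical; only the variable name comes from this thread.

```python
import os
import shutil

UNPACK_ENV = "SNAPPY_SQUASHFS_UNPACK_FOR_TESTS"


def choose_snap_backend(env=None, in_container=False, which=shutil.which):
    """Return "unpack", "fuse" or "kernel" (hypothetical labels for illustration)."""
    env = os.environ if env is None else env

    # Check the override before any fuse detection runs.
    if env.get(UNPACK_ENV) == "1":
        return "unpack"

    if in_container:
        # Assumed detection step: is a squashfuse/snapfuse binary on PATH?
        if which("squashfuse") or which("snapfuse"):
            return "fuse"
        raise RuntimeError("no squashfuse available in this container")

    return "kernel"
```

For example, `choose_snap_backend({UNPACK_ENV: "1"}, in_container=True)` returns "unpack" without ever looking for fuse, which is the ordering being suggested here.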

Edit: the timeline for @sitter needing this in production is ~3 months.


#4

In our planning we discussed looking into what can be done about this. We might look at both shortish-term workarounds and longer-term proper solutions, also talking with the LXD team.