This is a follow-up post to Squashfs performance effect on snap startup time, this time looking at the effects of dynamic library caching on snap startup performance.
Dynamic libraries and the linker
First, some background information on what dynamic libraries are and how they fit into application startup. An application as defined by a snap specifies a command to run, which includes both the executable file and the arguments to it. That executable file, however, may not include all the code/objects/functions necessary to run, and may need other library files to be loaded for it to run successfully. These other files are generally called dynamic libraries, and they need to be located by a program called the dynamic linker. On Linux, the dynamic linker is called `ld.so`, and it has 5 mechanisms to locate the dynamic libraries needed by a given executable, searched in the following order.
The first mechanism is called `rpath`, and it's a setting baked into an executable at build time; it holds a single directory. Note that while it is typically set at build time, it can be modified later, but only if the executable is writable, so this setting cannot be modified at run-time for a snap. Additionally, setting `rpath` is considered deprecated and should be replaced with `runpath`, which, if set, is used in place of `rpath`, but is searched later in the order (as the third mechanism below).
The second mechanism is an environment variable defined at run-time called `$LD_LIBRARY_PATH`. Its expected value is a colon-separated list of one or more directories.
The third mechanism is the `runpath` setting of an executable. Unlike the `rpath` setting, this can be a list of directories.
The fourth mechanism is a cache file located at `/etc/ld.so.cache`, which is essentially a lookup table mapping the filename of a desired dynamic library to its known location.
The final mechanism is searching the default directories `/lib` and `/usr/lib` (and, on 64-bit platforms, `/lib64` and `/usr/lib64`).
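As a concrete illustration (using `/bin/ls` as a stand-in for any dynamically linked executable), the first and third mechanisms can be inspected with `readelf`, and the final resolution of each dependency with `ldd`:

```shell
# Show any rpath/runpath entries baked into an executable at build time
readelf -d /bin/ls | grep -E 'RPATH|RUNPATH' || echo "no rpath/runpath set"

# Show each dynamic library dependency and where the linker resolves it
ldd /bin/ls
```

On most distribution binaries the first command prints nothing (no `rpath`/`runpath` set), so resolution falls through to the cache and default directories.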
Status quo for snaps
Since snaps typically ship almost all of their own dynamic library dependencies inside the snap, and these dependencies show up at run-time under `/snap/$SNAP_NAME/current` (aka `$SNAP`), the dynamic linker will not find the libraries in `$SNAP` by default without using one of the first 4 mechanisms to instruct it to search there. The easiest method for snap packagers is usually to set `$LD_LIBRARY_PATH` in the snap environment to a list of all the relevant directories in `$SNAP` that contain dynamic libraries. While effective at resolving the location of dynamic libraries, this introduces overhead and latency when launching applications, because the dynamic linker must iterate over the entire list in `$LD_LIBRARY_PATH` for each dynamic library, and most of the candidate directories will not contain the library being searched for. For example, with the chromium snap, we have all of the following directories in `$LD_LIBRARY_PATH`:
/var/lib/snapd/lib/gl
/var/lib/snapd/lib/gl32
/var/lib/snapd/void
"" (the empty string - this is likely a bug in desktop-launch that an empty string shows up here)
/snap/chromium/949/lib
/snap/chromium/949/usr/lib
/snap/chromium/949/lib/x86_64-linux-gnu
/snap/chromium/949/usr/lib/x86_64-linux-gnu
/snap/chromium/949/usr/lib/x86_64-linux-gnu/dri
/var/lib/snapd/lib/gl
/snap/chromium/949/usr/lib/x86_64-linux-gnu/pulseaudio
In addition, the chrome binary has `rpath` set to the special linker-recognized value `$ORIGIN`, which is just the directory where the chrome binary itself exists. This in effect adds 1 more directory to the list that the linker needs to search, because `rpath` is always consulted before `$LD_LIBRARY_PATH`.
If we assume that, on average, a given dynamic library is found halfway through this list, the dynamic linker needs to check around 6 candidate paths per dynamic library. The chrome binary depends on a total of 95 dynamic libraries (excluding the special virtual library `linux-vdso.so.1`), which means the linker would have to consult around 570 candidate dynamic library paths at run-time - a significant overhead. We could try to optimize this by ordering the entries in `$LD_LIBRARY_PATH` so that directories containing the most dynamic libraries come first and those with the fewest come last, improving the likelihood that the linker finds the right dynamic library early, but the linker would still iterate over this list for some dynamic libraries.
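This search overhead can be observed directly: with `LD_DEBUG=libs`, the glibc linker logs a `trying file=` line for every candidate path it tests. The sketch below uses `/bin/true` as a stand-in binary and two arbitrary extra directories to show how padding `$LD_LIBRARY_PATH` inflates the number of probes:

```shell
# Count candidate paths ("trying file=" lines) the linker tests when
# launching /bin/true, with and without extra $LD_LIBRARY_PATH entries.
baseline=$(LD_DEBUG=libs /bin/true 2>&1 | grep -c 'trying file=' || true)
padded=$(LD_LIBRARY_PATH=/tmp:/var/tmp LD_DEBUG=libs /bin/true 2>&1 \
         | grep -c 'trying file=' || true)
echo "candidate paths tried: $baseline without, $padded with extra search dirs"
```

Each extra directory in `$LD_LIBRARY_PATH` adds failed probes for every library, which is exactly the per-launch cost the chromium numbers above estimate.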
A better way - use the cache
A better method is to set up the linker's cache to point to the right directory for every dynamic library the snap needs, and remove all entries in `$LD_LIBRARY_PATH` and `rpath`, so that the only method the linker uses to find a given dynamic library is consulting the cache, which immediately produces the correct library. This is how most non-snap applications are set up: after an application is installed from a Debian package, the linker cache is updated with all of the new dynamic libraries that were installed into the standard `/lib` and `/usr/lib` directories.
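On a classic Debian/Ubuntu system the cache is maintained by `ldconfig`, and its current contents can be listed with `ldconfig -p` (each entry maps a library name to one resolved path):

```shell
# Print the first few entries of /etc/ld.so.cache, one "name => path" per line
(ldconfig -p 2>/dev/null || /sbin/ldconfig -p) | head -n 5

# Rebuilding the cache after installing new libraries (package managers
# normally do this automatically in their post-install scripts):
#   sudo ldconfig
```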
We can do this for snaps with a couple of caveats and workarounds. The first thing to note is that the linker cache is located at `/etc/ld.so.cache`, which for strict snaps is shared with the host, unlike most other files under `/usr`, `/lib`, etc. This is problematic: the snap should not modify the host's cache, because the host's cache would then contain snap-specific dynamic libraries. At best such a cache would be incomplete and miss other libraries on the host file-system; at worst it would break many applications due to things like libc incompatibilities - for example, where the host is an Ubuntu 19.10 based rootfs and the snap's rootfs is Ubuntu 16.04 based, the paths referenced from the snap's cache would point at Ubuntu 16.04 libraries, while non-snap applications installed on the host need their Ubuntu 19.10 dependencies. Thus we need a snap-private dynamic linker cache, specific to each snap and not shared, even though the file's path in the snap's run-time rootfs is shared with the host.
The second problem we run into is which directories to use when building our snap-specific dynamic linker cache. By default, the program `ldconfig`, which builds the cache, only looks at a few standard library paths such as `/lib`, `/usr/lib`, etc. for dynamic libraries to include in the cache, which will not include our snap's dynamic libraries located in `$SNAP`. As such, we need to provide these snap-specific library directories to `ldconfig` as arguments when building the cache - but then how do we know which directories to include? We could naively provide every directory in `$SNAP` to `ldconfig`, however this would be inefficient in the best case and wrong in the worst case, because there could be conflicting dynamic libraries in the snap, and the cache maps each library name to exactly one path, so conflicting copies would overwrite each other and the wrong one could win. It's unlikely that a snap would ship multiple versions of the same dynamic library, but it can happen, for example when a snap uses a content snap like the gnome run-time environment, where a particular patched version of some dependency is in the snap and another version of that dependency is in the run-time content snap.
The third problem is that if we remove all entries in `$LD_LIBRARY_PATH`, we introduce a failure mode: if some of the directories in `$LD_LIBRARY_PATH` are not read-only paths within `$SNAP`, the libraries in them could disappear or change at some time other than install or refresh of the snap (which is when we would recalculate the cache). Applications could then break because the cache is stale and the linker doesn't know where else to look for their dependencies. For example, this could happen if a content interface is used to share libraries from another snap into this snap, and the libraries move around within the other snap while still remaining within the directories listed in `$LD_LIBRARY_PATH`. The linker would still find those libraries via `$LD_LIBRARY_PATH`, just at a different path, but if we remove `$LD_LIBRARY_PATH` and rely on the cache, it would not find them anymore. Another example: host graphics drivers could be present in `/var/lib/snapd/lib/gl` when the cache is built and preferred over software libraries shipped inside the snap (because the host drivers are hardware backed), but later be removed, leaving the cache pointing at libraries that no longer exist.
For the first problem, we are lucky: we can use a snap feature called layouts to mount a private file at `/etc/ld.so.cache` that is only visible to the snap.
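Assuming the cache is generated into `$SNAP_DATA` by an install hook (as done later in this post), a layout along these lines makes it appear at the standard path for this snap only (a sketch - the exact source path is an assumption):

```yaml
# snapcraft.yaml fragment: bind a snap-writable file over /etc/ld.so.cache
# so the linker inside the snap consults a private cache
layout:
  /etc/ld.so.cache:
    bind-file: $SNAP_DATA/etc/ld.so.cache
```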
A trick we can use for the second problem is to take advantage of the fact that many existing snap applications ship with wrapper scripts which set up a correct `$LD_LIBRARY_PATH`, and so we can use that `$LD_LIBRARY_PATH` value to build a correct cache, then undefine the environment variable so that at run-time the dynamic linker just uses the cache instead. This isn't ideal, because it still means we spend time during install (or at run-time) setting up `$LD_LIBRARY_PATH` only to undefine it later, but it works for now, and we can likely afford to pay the cost of calculating `$LD_LIBRARY_PATH` once per install or refresh rather than on every launch as we do today.
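A minimal sketch of an install hook along these lines is below. The paths are assumptions (the real hook is linked in the appendix), and `$SNAP_DATA` falls back to a demo directory so the sketch can be run outside a snap:

```shell
#!/bin/sh
# Sketch: build a snap-private linker cache from the directories already
# listed in $LD_LIBRARY_PATH by the snap's wrapper scripts.
set -e

SNAP_DATA="${SNAP_DATA:-/tmp/snap-cache-demo}"   # fallback for outside a snap
CACHE="$SNAP_DATA/etc/ld.so.cache"
mkdir -p "$(dirname "$CACHE")"

# Split the colon-separated $LD_LIBRARY_PATH into ldconfig arguments,
# skipping empty entries (see the chromium list above) and missing dirs.
dirs=""
IFS=:
for d in ${LD_LIBRARY_PATH:-}; do
    [ -n "$d" ] && [ -d "$d" ] && dirs="$dirs $d"
done
unset IFS

# -C writes the cache to the given file instead of /etc/ld.so.cache;
# -X avoids updating library symlinks (the snap's dirs are read-only).
LDCONFIG=$(command -v ldconfig || echo /sbin/ldconfig)
# shellcheck disable=SC2086
"$LDCONFIG" -X -C "$CACHE" $dirs
echo "built cache at $CACHE"
```

With the layout above, the file written to `$SNAP_DATA/etc/ld.so.cache` is what the snap's processes see at `/etc/ld.so.cache`.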
For the last problem, we have a few options. The first is to simply try very hard to ensure this never happens, through careful management of dependencies in other snaps, and to make clear that removing graphics libraries may require re-installation of the snap application. Another option is to run a much lighter-weight wrapper script before the actual application which checks that all of the dynamic libraries are resolvable with the cache, and if they are not, re-generates the cache using `$LD_LIBRARY_PATH` so that things work again. This introduces a small constant overhead for the check, but ensures that things always work; the cost of verifying the cache is probably still smaller than the current cost of searching everything in `$LD_LIBRARY_PATH`. There is an example script attached at the end which may be adapted towards this purpose.
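Such a check could be sketched as follows; `FINAL_BINARY` matches the convention of the appendix script (it defaults to `/bin/ls` here purely for demonstration), and the rebuild step is a hypothetical placeholder:

```shell
#!/bin/sh
# Check whether every dependency of the target binary resolves with
# $LD_LIBRARY_PATH emptied out (i.e. via the cache and default dirs alone).
: "${FINAL_BINARY:=/bin/ls}"   # stand-in default for demonstration

if LD_LIBRARY_PATH= ldd "$FINAL_BINARY" | grep -q 'not found'; then
    echo "cache is stale, rebuilding"
    # "$SNAP/bin/build-cache"   # hypothetical: re-run the install hook logic
else
    # Everything resolves from the cache: drop the search path entirely.
    unset LD_LIBRARY_PATH
    echo "cache ok, launching without LD_LIBRARY_PATH"
fi
# exec "$FINAL_BINARY" "$@"     # hand off to the real application
```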
Performance effects
I have implemented a minimal version of what is necessary to build a snap-private linker cache using `$LD_LIBRARY_PATH`, as an install hook that runs when the snap is initially installed. This hook should also run during refresh, which would be as simple as running the same script from the post-refresh hook as well. I did not implement or measure the check described for the third problem above, but I have an example script which could be adapted to do so, attached at the end of this post. Note that this script passes the directories from `$LD_LIBRARY_PATH` as arguments to the program `ldconfig`, which is what actually builds the cache. `ldconfig` will include in the cache all libraries it finds in those directories, so even dynamic libraries that the snap does not currently use (but may use later, or that a user may use manually with something else) are still present in the cache, as long as they exist in one of the directories in `$LD_LIBRARY_PATH`.
Here we can observe the effect of using the cache on the chromium snap, a speedup of about 500 milliseconds.
To measure this, I installed the snap, with the install hook building the cache in `$SNAP_DATA` and then moving it to `/etc/ld.so.cache`, which is set up using a layout. After installing the snap, the snap's mount namespace is discarded and all data in `$SNAP_USER_DATA` is cleared out, which re-triggers things like the desktop-launch helper caching icons, etc., and avoids any time that chromium may spend importing profiles. I confirmed that chromium does in fact use only the libraries from the cache by independently setting `LD_DEBUG=libs` right before running the chrome binary, so that the dynamic linker prints out which paths it searches and where it finds each library; all of the libraries the snap normally uses were found directly in the cache on the first try. This was done 10 times for statistical significance.
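With `LD_DEBUG=libs` the cache hits are visible directly in the linker's log: a dependency satisfied from the cache appears after a `search cache=` line, with no per-directory `trying file=` probing beforehand (shown here against `/bin/true` as a stand-in for the chrome binary):

```shell
# Libraries resolved from /etc/ld.so.cache show up under "search cache=" hits
LD_DEBUG=libs /bin/true 2>&1 | grep 'cache' | head -n 4
```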
Here is the performance effect of doing this on other snaps:
Note that I did not use box-and-whisker charts for this graph, mainly because with a new method of waiting for the window to display using `xdotool`, the time is much less variable (the mean of the 10 runs was chosen for each snap type). Previously, we were using `wmctrl` to find the name of the window from a shell script, but switching to `xdotool` with the `--sync` option is much more reliable in waiting for the window to appear.
Proposals
Given the performance effect of doing this, I would recommend that we design some way to make it easier for snaps to opt in to having an install/post-refresh hook build their `ld.so.cache`.
There are a few ways to do this; I've broken them up by which part is involved, and ordered them by which could be done first:
Snap authors
Snap authors could adopt the patterns set out here, including:
- Use the install/post-refresh hooks provided here to build a correct `ld.so.cache`
- Add the cache correctness snippet to the desktop-launch helper (or to some other helper)
- Measure the effects of doing this more widely than just the snaps I have tested here
Snapcraft
Snapcraft could grow support for generating "install hook wrappers" which build the cache at install time and are automatically inserted into the snap via some opt-in setting in the snapcraft.yaml.
Snapcraft could also potentially add code to its pre-existing wrapper scripts to perform the cache correctness check and regenerate the cache as needed.
Snapd
Snapd could be changed to always set up a snap-private `/etc/ld.so.cache` file, which would remove the need for snaps to use layouts for this.
Snapd could also potentially be changed to build the cache automatically at install/refresh time using all the directories in the snap, but as discussed above, there are issues with this since snapd won't know for certain which directories should be ignored and which should not. This is also complicated by content snaps, where we would probably also need to rebuild the cache when interfaces are connected.
Both of the snapd tasks could be done after the snapcraft ones, as they would not break any work done in the previous tasks.
There is another possible choice here: snapd could gain a specification of some sort in which snaps specify which directories should be used when building the cache (likely in the snap.yaml, or perhaps alternatively through config dirs in meta/...), and snapd would then use that to build the cache when installing the snap. (Thanks to @cjp256 for the suggestion.)
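One hypothetical shape for such a specification (the field names here are invented purely for illustration, not an existing snapd feature):

```yaml
# snap.yaml sketch: a hypothetical stanza listing the directories snapd
# would feed to ldconfig when (re)building this snap's private cache
linker-cache:
  directories:
    - $SNAP/lib
    - $SNAP/usr/lib/x86_64-linux-gnu
```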
Appendix
Here is a link to the install hook used to generate the cache. Note that in addition to adding this install hook, one should add the `opengl` plug to the install hook in the snapcraft.yaml so that the install hook can read the libraries from `/var/lib/snapd/lib/gl`.
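The plug declaration for the hook looks something like this snapcraft.yaml fragment (a sketch of the hook/plug wiring described above):

```yaml
# snapcraft.yaml fragment: grant the install hook the opengl interface so
# it can read host graphics libraries under /var/lib/snapd/lib/gl
hooks:
  install:
    plugs:
      - opengl
```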
Here is an example wrapper script which can be used to check whether all of the dynamic libraries are satisfied by the cache: if they are, it unsets `$LD_LIBRARY_PATH`; otherwise it re-generates the cache using the above script. It expects an environment variable `FINAL_BINARY` to be defined as the executable file to check with `ldd`, to see whether any of the dependencies are unresolvable with the cache.