Call chains lose LD_LIBRARY_PATH and child programs fail as a result

I hit this when the libvirt service (part of microstack) tries to spawn a dnsmasq process for a network, and again when dnsmasq in turn calls libvirt_leaseshelper.

$ cat > default.xml <<EOF
  <forward mode='nat'>
      <port start='1024' end='65535'/>
  <bridge name='virbr0' stp='on' delay='0'/>
  <ip address='' netmask=''>
      <range start='' end=''/>
$ sudo uvtool-checkbox.virsh --connect qemu:///system net-define default.xml
ubuntu@focal-snaptest:~$ sudo microstack.virsh --connect qemu:///system net-start default
error: Failed to start network default
error: internal error: Child process (VIR_BRIDGE_NAME=virbr0 /snap/microstack/current/usr/sbin/dnsmasq --conf-file=/var/snap/microstack/common/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/snap/microstack/current/usr/libexec/libvirt_leaseshelper) unexpected exit status 127: /snap/microstack/current/usr/sbin/dnsmasq: error while loading shared libraries: cannot open shared object file: No such file or directory

There is a workaround in the snapcraft yaml that is supposed to make library lookup work for spawned subprocesses:

      # Libraries under /snap/$SNAPCRAFT_PROJECT_NAME/current/usr/lib/x86_64-linux-gnu are not added to the
      # runpath by default. This is OK for parent processes which get LD_LIBRARY_PATH set properly but not
      # for the child processes they spawn since the environment variables are not passed down to children by default after execve(2).
      # `readelf -d /snap/microstack/current/usr/libexec/virt-aa-helper` should return something like:
      # (RUNPATH)            Library runpath: [/snap/microstack/current/usr/lib:/snap/microstack/current/usr/lib/x86_64-linux-gnu:...]
      - LDFLAGS: '$LDFLAGS -Wl,-rpath=/snap/$SNAPCRAFT_PROJECT_NAME/current/usr/lib -Wl,-rpath=/snap/$SNAPCRAFT_PROJECT_NAME/current/usr/lib/x86_64-linux-gnu -Wl,-rpath=/snap/$SNAPCRAFT_PROJECT_NAME/current/lib -Wl,-rpath=/lib/x86_64-linux-gnu -Wl,-rpath=/lib/'    

And it indeed carries an extended runpath into the binaries that are built with it.

  $ readelf -d /snap/microstack/current/usr/libexec/virt-login-shell-helper  | grep runpath
   0x000000000000001d (RUNPATH)            Library runpath: [/snap/microstack/current/usr/lib:/snap/microstack/current/usr/lib/x86_64-linux-gnu:/snap/microstack/current/lib:/lib/x86_64-linux-gnu:/lib/]

But on the one hand it doesn’t work even with this in place, and on the other hand there are binaries such as dnsmasq that come from a .deb and so are never linked with these flags. I don’t want to rebuild
every binary that my service might call just to inject rpaths into it.
(And, as mentioned, runpath doesn’t always work anyway.)

The call chain looks like this:

  -> dnsmasq
    -> /snap/microstack/current/usr/libexec/libvirt_leaseshelper

To be clear, the lib is there in the snap:

ubuntu@focal-snaptest:~$ sudo find /snap/microstack/ -name '**'

I found this to be similar to what happens when calling things through sudo.
The real /usr/bin/sudo triggers the same issue:

ubuntu@focal-snaptest:/home/ubuntu$ /usr/bin/sudo /snap/microstack/current/usr/libexec/libvirt_leaseshelper
/snap/microstack/current/usr/libexec/libvirt_leaseshelper: error while loading shared libraries: cannot open shared object file: No such file or directory

I also found that microstack ships a wrapper that stands in for sudo to avoid exactly that.

With that wrapper mapped to bin/sudo, the call works:

ubuntu@focal-snaptest:/home/ubuntu$ /snap/microstack/233/bin/sudo /snap/microstack/current/usr/libexec/libvirt_leaseshelper
/snap/microstack/current/usr/libexec/libvirt_leaseshelper: try --help for more detail

So whether the call goes through the real or the fake sudo decides whether the issue is triggered.
How can I avoid it for the call path that I actually have?
libvirt -spawns-> dnsmasq -calls-> leasehelper

The lack of the LD_LIBRARY_PATH is exactly what makes those two differ even for the binary that has the runpath set.

ubuntu@focal-snaptest:/home/ubuntu$ sudo env | grep LD_LIBRARY_PATH
ubuntu@focal-snaptest:/home/ubuntu$ /usr/bin/sudo env | grep LD_LIBRARY_PATH

And indeed, if I pass the path manually through sudo, it works.

ubuntu@focal-snaptest:/home/ubuntu$ /usr/bin/sudo LD_LIBRARY_PATH=/var/lib/snapd/lib/gl:/var/lib/snapd/lib/gl32:/var/lib/snapd/void:/snap/microstack/233/lib:/snap/microstack/233/usr/lib:/snap/microstack/233/lib/x86_64-linux-gnu:/snap/microstack/233/usr/lib/x86_64-linux-gnu /snap/microstack/current/usr/libexec/libvirt_leaseshelper
/snap/microstack/current/usr/libexec/libvirt_leaseshelper: try --help for more details

Having understood the above, I’ve worked around it for the time being with a wrapper like this:

  LD_LIBRARY_PATH="%%LDPATH%%" exec "$0.orig" "$@"

In override-build of the overlay part I replace the placeholder:

  sed -i -e 's?%%LDPATH%%?/snap/$SNAPCRAFT_PROJECT_NAME/current/usr/lib:/snap/$SNAPCRAFT_PROJECT_NAME/current/usr/lib/x86_64-linux-gnu:/snap/$SNAPCRAFT_PROJECT_NAME/current/lib:/lib/x86_64-linux-gnu:/lib:\$LD_LIBRARY_PATH?' bin/ldpathwrapper

Then I use organize to rename each affected binary:

  usr/sbin/dnsmasq: usr/sbin/dnsmasq.orig

And link each original name to the wrapper in override-build:

  ln --force --relative --symbolic bin/ldpathwrapper "${bin}";
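
As a rough sketch of how the wrapper, organize and link steps fit together, the following can be run in a scratch directory. Everything in it is made up for the demo (the `tool` binary, the `/demo/...` library paths, the temp directory); it is not the microstack layout. It also uses `${LD_LIBRARY_PATH:+:...}` instead of a plain `:$LD_LIBRARY_PATH` suffix, on the assumption that a trailing empty path element is unwanted, since the dynamic loader treats an empty element as the current directory.

```shell
#!/bin/sh
# Self-contained demo in a scratch directory; nothing here touches a real snap.
set -eu
demo=$(mktemp -d)
mkdir -p "$demo/bin" "$demo/usr/sbin"

# Stand-in for a binary that came from a .deb and was already renamed to
# *.orig (the job of the "organize" step). It only reports what it received.
cat > "$demo/usr/sbin/tool.orig" <<'EOF'
#!/bin/sh
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH args=$*"
EOF
chmod +x "$demo/usr/sbin/tool.orig"

# The shared wrapper. $0 is the path it was invoked through, so one file can
# serve every wrapped binary. /demo/lib and /demo/usr/lib are placeholders.
cat > "$demo/bin/ldpathwrapper" <<'EOF'
#!/bin/sh
LD_LIBRARY_PATH="/demo/lib:/demo/usr/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" exec "$0.orig" "$@"
EOF
chmod +x "$demo/bin/ldpathwrapper"

# Link the original name to the wrapper, as done in override-build.
(cd "$demo" && ln --force --relative --symbolic bin/ldpathwrapper usr/sbin/tool)

# Call it with an empty environment, the way a parent that drops
# LD_LIBRARY_PATH would.
out=$(env -i "$demo/usr/sbin/tool" --version "two words")
echo "$out"
rm -rf "$demo"
```

Because the wrapper dispatches on `$0`, adding another affected binary only needs one more organize entry and one more symlink, not another wrapper file.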

It isn’t perfect yet, and my understanding of this might still be incomplete.
But, slightly frustrated, I’m almost tempted to apply this to ALL binaries in my snap, as it is
painful to run into them one by one.

After the above debugging I have searched the forum and found

To me they all seem related, but none answers the puzzle I’m at which is:

"how to ensure globally (not case by case) that call chains will not miss libs
due to having LD_LIBRARY_PATH stripped"?

I’m expecting that I’m just not seeing the whole picture. There must be a better way than
using openvswitch (which is what microstack does). Something global that just
makes spawned subprocesses consider the extra paths correctly?

P.S. Worst case, there is no better fix, but then at least this discussion
will serve others who hit the same problem as a document that can be found with search-foo.

Is this in the case where microstack is a classic snap or when it is strictly confined?

The snap itself is classified as “strict”, but the above case happens when you install it with --devmode, as recommended by the documentation.

FYI, here is a copy of a discussion that I had with @dmitriis about this from the microstack POV:

[12:38] <dmitriis> @paelzer yes, I remember this one. I recall that environment variables are not automatically inherited across execve unless you pass them explicitly
[12:38] <dmitriis> which is an issue for LD_LIBRARY_PATH
[12:39] <dmitriis> so the attempted workaround was to include this into generated ELFs
[12:40] <dmitriis> @paelzer this was also causing issues for applying apparmor profiles to instances created by MicroStack because libvirt uses a helper that gets spawned as a child process
[12:41] <cpaelzer> yes sounds like the same issue
[12:41] <dmitriis> @paelzer AFAIR, I haven't figured out how to make those helper binaries to also use RPATH
[12:41] <dmitriis> and so apparmor for instances is still disabled in MicroStack
[12:41] <cpaelzer> I've started to place wrappers in the snap that re-apply the LD_LIBRARY_PATH, but it doesn't seem to ultimately fix it
[12:42] <cpaelzer> maybe I haven't got all of them yet
[12:43] <cpaelzer> thanks dmitriis, I just wanted to make sure I'm not missing something that was already discovered
[12:43] <dmitriis> @paelzer Right. Based on what's in your post it affects multiple helpers, so it's better if we fixed it for good.

It might also be worth mentioning that for microstack’s main use case this is nowadays avoided because Open vSwitch/OVN provides the DNS resolution. But we can’t be sure whether more helpers break under the covers, e.g. as you see with apparmor.
Others (e.g. myself) might also face this in other use cases; I only started with microstack because it is an existing example.

So I’d really be interested in a proper solution for this.

My workaround wasn’t bad; I found that I had just missed one path.
So the replacement now looks like this:

sed -i -e 's?%%LDPATH%%?/snap/$SNAPCRAFT_PROJECT_NAME/current/lib:&?' bin/ldpathwrapper;
sed -i -e 's?%%LDPATH%%?/snap/$SNAPCRAFT_PROJECT_NAME/current/usr/lib:&?' bin/ldpathwrapper;
sed -i -e 's?%%LDPATH%%?/snap/$SNAPCRAFT_PROJECT_NAME/current/lib/$SNAPCRAFT_ARCH_TRIPLET:&?' bin/ldpathwrapper;
sed -i -e 's?%%LDPATH%%?/snap/$SNAPCRAFT_PROJECT_NAME/current/usr/lib/$SNAPCRAFT_ARCH_TRIPLET:&?' bin/ldpathwrapper;
sed -i -e 's?%%LDPATH%%?\$LD_LIBRARY_PATH?' bin/ldpathwrapper;

With that I got all the libvirt helpers, dnsmasq and virsh (which was only affected if called from other contexts) working.

Using a single helper is also good because it avoids a proliferation of such wrapper files.
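
To see how the chained sed calls build up the final path list, here is a small sketch that runs the same substitutions on a throwaway file. The `name` and `triplet` variables are hypothetical stand-ins for `$SNAPCRAFT_PROJECT_NAME` and `$SNAPCRAFT_ARCH_TRIPLET`, and the loop is just a compact way of writing the first four sed lines; the `&` in each replacement re-emits the matched placeholder, so every call appends one path in front of it until the last call swaps the placeholder for a literal `$LD_LIBRARY_PATH`.

```shell
#!/bin/sh
set -eu
name=demo-snap            # hypothetical stand-in for $SNAPCRAFT_PROJECT_NAME
triplet=x86_64-linux-gnu  # hypothetical stand-in for $SNAPCRAFT_ARCH_TRIPLET

# Throwaway copy of the wrapper line containing the placeholder.
f=$(mktemp)
printf '%s\n' 'LD_LIBRARY_PATH="%%LDPATH%%" exec "$0.orig" "$@"' > "$f"

# Each pass inserts one directory and keeps the placeholder (&) after it.
for dir in lib usr/lib "lib/$triplet" "usr/lib/$triplet"; do
    sed -i -e "s?%%LDPATH%%?/snap/$name/current/$dir:&?" "$f"
done

# Final pass: replace the placeholder with a literal $LD_LIBRARY_PATH.
sed -i -e 's?%%LDPATH%%?\$LD_LIBRARY_PATH?' "$f"

result=$(cat "$f")
echo "$result"
rm -f "$f"
```

The directories end up in the order in which the sed calls run, which matters because the loader searches LD_LIBRARY_PATH from left to right.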

So -> immediate issue solved \o/
But long term I’d still be interested in whether snap/snapcraft could provide a generic fix that makes those indirect calls without env work.
Maybe a patched version of that would load LD_LIBRARY_PATH from a place that snapd can keep constant for the snap?

I’ve added myself to this Friday’s snap-clinic call (mostly before I completed my workaround); there we can talk about potential long-term fixes.

FYI - anything self-built that is called from a hook like install seems to be affected as well - same workaround helps.

@cpaelzer @ijohnson

I think I understand where the issue with using -rpath during linking could arise.

However, this is not directly related to the original post’s problem, since there the problem is with the binaries that are not built for the snap in question and which are exec-ed from a forked process without LD_LIBRARY_PATH passed as an argument to execve(2):

  • dnsmasq: the issue is that there is no DT_RUNPATH in its ELF because we are not building it. Therefore, it cannot find while is available from the core snap;
  • virt-aa-helper: forks and execs $SNAP/sbin/apparmor_parser which is also not built for the snap via snapcraft. The fork/exec chain is like this: libvirtd -> virt-aa-helper -> apparmor_parser. So neither RUNPATH nor LD_LIBRARY_PATH for libvirtd helps here.
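
The effect of an exec without LD_LIBRARY_PATH in the environment can be reproduced outside of any snap. The sketch below uses `env -i` to stand in for a parent that hands its child a fresh environment; which mechanism actually drops the variable in the libvirtd chains above is not established here, and `/demo/lib` is a placeholder value.

```shell
#!/bin/sh
set -eu

# Ordinary spawn: the child sees the variable set for it by the parent.
inherited=$(LD_LIBRARY_PATH=/demo/lib /bin/sh -c 'echo "${LD_LIBRARY_PATH:-<unset>}"')

# Spawn through a cleared environment, similar in effect to a parent that
# passes its own envp to execve(2), or to sudo resetting the environment.
stripped=$(LD_LIBRARY_PATH=/demo/lib env -i /bin/sh -c 'echo "${LD_LIBRARY_PATH:-<unset>}"')

echo "inherited: $inherited"
echo "stripped:  $stripped"
```

Once the variable is gone at one hop, every process further down the chain starts without it unless something (such as a wrapper) sets it again.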

Now for the -rpath-related topic.

Apparently at some point the use of --enable-new-dtags became the default in Debian and Ubuntu.

This resulted in a behavior change for the -rpath flag passed to the linker: instead of DT_RPATH we now get DT_RUNPATH present in the resulting ELFs. The effect of that is very subtle unless you have the right context.

DT_RUNPATH does not affect transitive dependencies (see [1] - [5] below), which would be OK if we only relied on libraries built in the snap with the provided ldflags or if the right versions of libraries were always available in the core snap.

However, some dependencies are included from debs, which means that they do not have DT_RUNPATH specified.
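
For binaries that are built inside the snap, one option that might be worth trying is to ask the linker for the old-style tag with `--disable-new-dtags`, so that `-rpath` produces DT_RPATH again. This is only a sketch under assumptions: it has not been tested against the microstack build, `/opt/demo/lib` is a placeholder path, and it does nothing for binaries taken from debs such as dnsmasq. It skips itself when no C compiler or readelf is available.

```shell
#!/bin/sh
set -eu
tag="skipped"
if command -v cc >/dev/null 2>&1 && command -v readelf >/dev/null 2>&1; then
    work=$(mktemp -d)
    printf 'int main(void) { return 0; }\n' > "$work/demo.c"
    # /opt/demo/lib is a placeholder; the directory need not exist to link.
    if cc "$work/demo.c" -o "$work/demo" \
        -Wl,--disable-new-dtags -Wl,-rpath=/opt/demo/lib 2>/dev/null; then
        # Report which of the two tags ended up in the dynamic section.
        tag=$(readelf -d "$work/demo" | grep -Eo '\((RPATH|RUNPATH)\)' | head -n 1)
    fi
    rm -rf "$work"
fi
echo "dynamic tag: $tag"
```

Keep in mind that DT_RPATH is deprecated and that the loader ignores it for any object that also carries DT_RUNPATH, so this trades one set of caveats for another.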

To illustrate with the libvirt_leaseshelper example:

The only direct dependencies of the helper binary are:

 0x0000000000000001 (NEEDED)             Shared library: []
 0x0000000000000001 (NEEDED)             Shared library: []
 0x0000000000000001 (NEEDED)             Shared library: []

But depends on which, in turn, depends on and some other libraries:

    730113: [0];  needed by /snap/microstack/x1/usr/lib/ [0]
    730113: [0];  generating link map
    730113:	  dynamic: 0x00007f6ce9b4d1c0  base: 0x00007f6ce9ac0000   size: 0x000000000008e628
    730113:	    entry: 0x00007f6ce9acead0  phdr: 0x00007f6ce9ac0040  phnum:                 11
# ...
    730113: [0];  needed by /snap/microstack/x1/usr/lib/x86_64-linux-gnu/ [0]
    730113: [0];  generating link map
    730113:	  dynamic: 0x00007f6ce92fae00  base: 0x00007f6ce92d3000   size: 0x00000000000280f0
    730113:	    entry: 0x00007f6ce92d81e0  phdr: 0x00007f6ce92d3040  phnum:                 11

which does not have RUNPATH specified in the ELF since we have not built it in the snap and included it from debs:

$ sudo readelf -d /snap/microstack/x1/usr/lib/x86_64-linux-gnu/
Dynamic section at offset 0x8c1c0 contains 42 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: []
 0x0000000000000001 (NEEDED)             Shared library: []
 0x0000000000000001 (NEEDED)             Shared library: []
 0x0000000000000001 (NEEDED)             Shared library: []
 0x0000000000000001 (NEEDED)             Shared library: []
 0x0000000000000001 (NEEDED)             Shared library: []
 0x0000000000000001 (NEEDED)             Shared library: []
 0x0000000000000001 (NEEDED)             Shared library: []
 0x0000000000000001 (NEEDED)             Shared library: []
 0x0000000000000001 (NEEDED)             Shared library: []
 0x0000000000000001 (NEEDED)             Shared library: []
 0x0000000000000001 (NEEDED)             Shared library: []
 0x0000000000000001 (NEEDED)             Shared library: []
 0x0000000000000001 (NEEDED)             Shared library: []
 0x000000000000000e (SONAME)             Library soname: []
# ... but no RUNPATH here

To summarize, there are actually 2 problems from what I can see:

  1. Binaries included in the snap which are exec-ed by child processes without LD_LIBRARY_PATH being passed to them from the parent;
  2. Shared libraries included in the snap from debs (possibly with transitive dependencies) and used from child processes that do not have LD_LIBRARY_PATH set.

[1] "The set of directories specified by a given DT_RUNPATH entry is used to find only the immediate dependencies of the executable or shared object containing the DT_RUNPATH entry. That is, it is used only for those dependencies contained in the DT_NEEDED entries of the dynamic structure containing the DT_RUNPATH entry, itself. One object’s DT_RUNPATH entry does not affect the search for any other object’s dependencies."

[2] "o Using the directories specified in the DT_RPATH dynamic section attribute of the binary if present and DT_RUNPATH attribute does not exist. Use of DT_RPATH is deprecated.

o Using the directories specified in the DT_RUNPATH dynamic section attribute of the binary if present. Such directories are searched only to find those objects required by DT_NEEDED (direct dependencies) entries and do not apply to those objects’ children, which must themselves have their own DT_RUNPATH entries. This is unlike DT_RPATH, which is applied to searches for all children in the dependency tree."

[3];a=blob;f=elf/dl-load.c;h=650e4edc35e5e582652c1167f4275a93e8c33120;hb=HEAD#l2033 _dl_map_object
[4];a=blob;f=sysdeps/generic/ldsodefs.h;h=9c15259236adab43aeaee66fa97743952d7f2589;hb=HEAD#l949 (_dl_map_object header comments)
[5];a=blob;f=elf/dl-load.c;h=650e4edc35e5e582652c1167f4275a93e8c33120;hb=HEAD#l2177 “Look at the RUNPATH information for this binary”