Classic snaps failing on Ubuntu 17.10

This is what I grabbed from the command-*.wrapper that was executed:

 sergiusens  ~  cat /snap/teleconsole/current/command-teleconsole.wrapper 
#!/bin/sh
export LD_LIBRARY_PATH="/snap/core/current/lib:/snap/core/current/usr/lib:/snap/core/current/lib/x86_64-linux-gnu:/snap/core/current/usr/lib/x86_64-linux-gnu"
export PATH="$SNAP/usr/sbin:$SNAP/usr/bin:$SNAP/sbin:$SNAP/bin:$PATH"

LD_LIBRARY_PATH=$SNAP_LIBRARY_PATH:$LD_LIBRARY_PATH
exec "$SNAP/teleconsole" "$@"

I stand by my statement - the two LD_LIBRARY_PATHs you listed against the ldd command differ in more than just the item you said you removed.

@sergiusens You’re saying that the only way to make it work is to recompile, while at the same time saying that once you change a dynamic path to a C library the software breaks. Wouldn’t that imply that the path is being dynamically defined, and that the library on the system of choice happens to satisfy the requirements of the pre-built binary while the one in core doesn’t?

In the interest of actually understanding the problem:

  1. Which specific libc works
  2. Which specific libc breaks
  3. Why would ldd not respect LD_LIBRARY_PATH
  4. Why would that libc not work if bundled in the snap
  5. Bonus question: which change in libc caused the issue

Please do not recommend using the system libraries in classic snaps. There’s no such thing as system libraries if your project is supposed to run on a dozen Linux distributions.

Oh, well you put emphasis on the adding part, that is just a typo :slight_smile:

 sergiusens  ~  snap run --shell teleconsole
 sergiusens  ~  LD_LIBRARY_PATH="/snap/core/current/lib:/snap/core/current/usr/lib:/snap/core/current/usr/lib/x86_64-linux-gnu" ldd $SNAP/teleconsole
	linux-vdso.so.1 =>  (0x00007ffe359fc000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fd5345da000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd5341fa000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fd5347f9000)
 sergiusens  ~  LD_LIBRARY_PATH="/snap/core/current/lib:/snap/core/current/usr/lib:/snap/core/current/lib/x86_64-linux-gnu:/snap/core/current/usr/lib/x86_64-linux-gnu" ldd $SNAP/teleconsole
Segmentation fault

It’s late here so I will just take a crack at this one today.

ldd segfaults because it is respecting LD_LIBRARY_PATH, just like ls does in the examples above. This is the other problem with classic confinement: we should not be using LD_LIBRARY_PATH at all (which is what we do in one of those PRs linked in that bug), as it leaks into the environment of all its child processes.

If we do not use LD_LIBRARY_PATH, then how do we use libraries from the Core snap or from within our own filesystem?

rpath; here’s a detailed read: https://new.zygoon.pl/post/state-of-classic-confinement/

@sergiusens That’s quite clearly not enough, per this thread. You’re recommending to people that they simply link with the system library, while at the same time saying this will break down on 14.04. This will only get worse.

Let’s please design a proper method for linking to the right libraries, at least in situations like the one above that clearly could be solved by loading a proper libc.

Hi, I’m not working today so haven’t read in detail, but could the problem
be that the go tooling does not respect LDFLAGS? I have to take special
measures in the go snap to get the dynamic binaries to link correctly and I
guess snapcraft’s go plugin needs to too.

@mwhudson The problem is that whatever ships in the classic snap needs to link with libraries contained in the snap itself, while any external ELF binaries executed by the classic snap continue linking with their usual content outside of the snap. In that sense I don’t think dynamic Go binaries would be special: they’re still following the usual rules of ld-linux.

It doesn’t sound too hard to solve, in principle, but it will require some creativity in terms of establishing that boundary in just the right way so that internal links to internal and external links to external. It may involve patching RPATH on existing binaries (or RUNPATH, but RPATH sounds more appropriate here as it’s transitive), or alternatively using our own ld that does some work before handing off onto the real one. This might end up being a nicer option if we find a way to convince binaries inside the snap to use a different interpreter without patching INTERP onto the binary.

I think I have a correct solution, using the correct ld-linux and libraries from core, essentially matching ld-linux and the libc required to run. Will post later in the night.

There’s actually a very cheap preliminary solution here: just replace the ELF files with a wrapper that explicitly calls the loader with a library path in the command line.

What’s the interpreter?

$ readelf -p .interp teleconsole | sed -n 's,.*/lib,/lib,p'
/lib64/ld-linux-x86-64.so.2
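
For tooling that needs the same answer without shelling out to readelf, the interpreter can also be read with Go's standard debug/elf package. This is only a minimal sketch: the names readInterp and trimInterp are made up for illustration, and the file to inspect is taken from the command line.

```go
package main

import (
	"bytes"
	"debug/elf"
	"fmt"
	"io"
	"os"
)

// trimInterp strips the trailing NUL terminator stored in a PT_INTERP segment.
func trimInterp(raw []byte) string {
	return string(bytes.TrimRight(raw, "\x00"))
}

// readInterp returns the program interpreter recorded in the ELF file at
// path, or "" if the file has no PT_INTERP segment (for example a static binary).
func readInterp(path string) (string, error) {
	f, err := elf.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	for _, p := range f.Progs {
		if p.Type != elf.PT_INTERP {
			continue
		}
		raw, err := io.ReadAll(p.Open())
		if err != nil {
			return "", err
		}
		return trimInterp(raw), nil
	}
	return "", nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Println("usage: interp <elf-file>")
		return
	}
	interp, err := readInterp(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Println(interp)
}
```

Run against the teleconsole binary from this thread, it should print the same path that readelf reports.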

How will it resolve its libs?

$ /snap/core/current/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 --library-path /snap/core/current/lib/x86_64-linux-gnu --list ./teleconsole
        linux-vdso.so.1 =>  (0x00007ffc59945000)
        libpthread.so.0 => /snap/core/current/lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fef4be5e000)
        libc.so.6 => /snap/core/current/lib/x86_64-linux-gnu/libc.so.6 (0x00007fef4ba94000)
        /lib64/ld-linux-x86-64.so.2 => /snap/core/current/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 (0x000055fe1ab74000)

Does it work?

$ /snap/core/current/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 --library-path /snap/core/current/lib/x86_64-linux-gnu ./teleconsole
Starting local SSH server on localhost...
(...)

Yes, it does.

Would be worth an extra check to make sure --library-path is transitive, but it would be awkward for it to not be.
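
The wrapper idea could be sketched in Go roughly as follows. The loader and library paths are the ones used in the commands above; the helper name loaderArgv and the hardcoded ./teleconsole target are assumptions for illustration, not something snapcraft generates. When the core loader is not present, the sketch only prints the command it would have run.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"syscall"
)

// loaderArgv builds the argument vector for running binary through an
// explicit dynamic loader with an explicit library search path.
func loaderArgv(loader string, libDirs []string, binary string, args []string) []string {
	argv := []string{loader, "--library-path", strings.Join(libDirs, ":"), binary}
	return append(argv, args...)
}

func main() {
	const loader = "/snap/core/current/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2"
	libDirs := []string{"/snap/core/current/lib/x86_64-linux-gnu"}

	// Forward the wrapper's own arguments to the wrapped binary.
	argv := loaderArgv(loader, libDirs, "./teleconsole", os.Args[1:])

	// Dry run when the core snap's loader is not installed on this machine.
	if _, err := os.Stat(loader); err != nil {
		fmt.Println(strings.Join(argv, " "))
		return
	}

	// Exec replaces this process and only returns on failure.
	if err := syscall.Exec(loader, argv, os.Environ()); err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
}
```

Because the search path is passed on the loader's command line rather than through the environment, nothing is exported to child processes the way LD_LIBRARY_PATH is.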

That is exactly what I was playing with without having the time to write it up (kid’s bedtime).
Whether --library-path is transitive or not is interesting, because in some cases you would want it to be and in others you wouldn’t. Teleconsole creates a shell on the current system, and you wouldn’t want that shell to be affected by the library path. If you needed to exec something within the snap, on the other hand, you might want to take advantage of a pre-existing --library-path setting rather than prepend everything with ld-linux; an example of this is gtk libraries spawning processes to load and retrieve information from other assets.

The case of teleconsole is also simple. Electron is a bit more complicated, as the ld-linux ... call would need to be added to the final call on the actual electron binary in which an electron application is wrapped.

About doing it all automatically: @zyga-snapd did some initial research on this a while ago (briefly explained in his blog post I mentioned earlier) and concluded that there is no easy way to patch INTERP without modifying the kernel (this is the half of the work I keep mentioning is missing every 3 months, though we could have promoted helping people do it manually).
About patching ELF files with RPATH: we also looked into this; it is a lot of interesting work we would need to do to get this going.
About RUNPATH versus RPATH, from _dl_map_object in elf/dl-load.c:

Unless loading object has RUNPATH:
    RPATH of the loading object,
        then the RPATH of its loader (unless it has a RUNPATH), ...,
        until the end of the chain, which is either the executable
        or an object loaded by dlopen
    Unless executable has RUNPATH:
        RPATH of the executable
LD_LIBRARY_PATH
RUNPATH of the loading object
ld.so.cache
default dirs

which is why we use RPATH, to not leak the RUNPATH into whatever is called.
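
To check which of the two tags a given binary actually carries, Go's debug/elf package can read both from the dynamic section. The sketch below is illustrative only: effectivePath is a hypothetical helper that encodes just the first rule of the list above (an object's RPATH is ignored when that object has a RUNPATH), not the full search order.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
)

// effectivePath reports which tag the loader would honour for an object
// carrying the given entries: a RUNPATH, when present, makes the loader
// ignore that object's RPATH.
func effectivePath(rpath, runpath []string) (string, []string) {
	switch {
	case len(runpath) > 0:
		return "RUNPATH", runpath
	case len(rpath) > 0:
		return "RPATH", rpath
	default:
		return "none", nil
	}
}

// inspect reads DT_RPATH and DT_RUNPATH from the ELF file at path.
func inspect(path string) (string, []string, error) {
	f, err := elf.Open(path)
	if err != nil {
		return "", nil, err
	}
	defer f.Close()

	rpath, err := f.DynString(elf.DT_RPATH)
	if err != nil {
		return "", nil, err
	}
	runpath, err := f.DynString(elf.DT_RUNPATH)
	if err != nil {
		return "", nil, err
	}
	tag, entries := effectivePath(rpath, runpath)
	return tag, entries, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Println("usage: rpathcheck <elf-file>")
		return
	}
	tag, entries, err := inspect(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Println(tag, entries)
}
```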

By the way, thanks for taking an interest in this!

You want it transitive in all cases. When a different binary is executed, that’s not about transitivity anymore, as the process memory will be completely replaced and the linking procedure starts over again. That’s why all binaries need to be replaced by wrappers, not just the commands referenced by applications.

About patching the interpreter, there’s no reason to patch the kernel for that:

/tmp $ cat myld.go
package main

func main() {
        println("Hello there!")
}
/tmp $ CGO_ENABLED=0 go build myld.go
/tmp $ patchelf --set-interpreter /tmp/myld teleconsole
/tmp $ ./teleconsole
Hello there!

This is probably the right way to go. We can chain load the real ld-linux here:

/tmp $ cat myld.go
package main

import (
        "os"
        "syscall"
)

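func main() {
        // The real loader to chain into.
        const ld = "/lib64/ld-linux-x86-64.so.2"
        // Replace this process with the real loader, pointing it at the core snap's libraries.
        err := syscall.Exec(ld, []string{ld, "--list", "--library-path", "/snap/core/current/lib/x86_64-linux-gnu", "/tmp/teleconsole"}, os.Environ())
        // Exec only returns if it failed.
        if err != nil {
                println("error: " + err.Error())
                os.Exit(1)
        }
}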

Note I used --list above to demonstrate the idea below.

Then, using teleconsole with the patched interpreter:

 /tmp $ ./teleconsole
        linux-vdso.so.1 =>  (0x00007ffdac6db000)
        libpthread.so.0 => /snap/core/current/lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f082ed49000)
        libc.so.6 => /snap/core/current/lib/x86_64-linux-gnu/libc.so.6 (0x00007f082e97f000)
        /tmp/myld => /lib64/ld-linux-x86-64.so.2 (0x000055e1bccb3000)

This has the disadvantage that we patch all binaries, but it feels like a more polished approach, without any wrappers dangling around, no moves, and executions based on argv[0] should still work correctly.

We’ll need some data inside the ELF so we can tell what the original interpreter was. Or perhaps an individual ld for every interpreter required. Those can be tiny little programs (not Go) so perhaps simpler and not unreasonable.

That may also be better than just patching the ELF’s RPATH, because although RPATH would be simpler, the ELF would still point to an ld-linux interpreter outside the snap which will be different and may not even exist depending on local naming conventions adopted by the Linux distribution at hand.

Gut feeling is that this is two or three days of work… a week at most. What do you think?

That seems reasonable, and thanks for patchelf, by the way. Almost a year ago @zyga-snapd’s and my Google-fu only found chrpath, which had an important notice under BUGS in the manpage.

We already have logic to crawl the snap and find ELF binaries, so this should be rather trivial work indeed (again, thanks for patchelf, I wasn’t looking forward to writing logic to modify those headers).

With regards to myld, it might be best if each new base snap is required to provide a fixed entry point so we can patchelf with interp of something like /snap/<base>/current/lib/snap-ld-linux.

Let me expand on that last part as the original idea in your proposal was to have snapcraft create this little shim. So here’s why I would like it to be part of the base snap:

  • the entry point is clearly defined.
  • the base snap knows exactly what ld-linux to call.
  • if --library-path is the reason to keep it in snapcraft, can I suggest that LD_LIBRARY_PATH be popped from the environment and used as the --library-path argument.
  • the base snap could leverage this snap-ld-linux to patchelf things from the base snap itself such as /usr/bin/python3

There’s no reason to mix this logic across snapd and snapcraft tying their exact implementation together and forcing every single base to ship with these custom loaders, and making classic work or not depending on whether the base author was aware of such edge cases. Snapcraft will need to patch the interpreters, and it knows exactly which interpreter to call because it has the old one at hand.

We must not touch LD_LIBRARY_PATH, or it will break the user’s environment in unrelated ways. We can build the default path dynamically very easily based on which libraries each ELF file is linked with. Make a set of all of them, search inside the snap for these names, build a path that resolves all of them, and inject that into the custom ld.
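
A rough sketch of that path-building step in Go, under stated assumptions: listFiles and pickDirs are hypothetical helper names, libraries are matched by file name only, and nothing here verifies that every needed name was found or that the match has the right architecture.

```go
package main

import (
	"debug/elf"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// listFiles returns every non-directory path below root.
func listFiles(root string) ([]string, error) {
	var files []string
	err := filepath.WalkDir(root, func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.IsDir() {
			files = append(files, p)
		}
		return nil
	})
	return files, err
}

// pickDirs returns, in first-seen order and without duplicates, the
// directories of the files whose base name is one of the needed libraries.
func pickDirs(files, needed []string) []string {
	want := make(map[string]bool, len(needed))
	for _, n := range needed {
		want[n] = true
	}
	seen := make(map[string]bool)
	var dirs []string
	for _, f := range files {
		if !want[filepath.Base(f)] {
			continue
		}
		dir := filepath.Dir(f)
		if !seen[dir] {
			seen[dir] = true
			dirs = append(dirs, dir)
		}
	}
	return dirs
}

func main() {
	if len(os.Args) != 3 {
		fmt.Println("usage: libpath <snap-root> <elf-file>")
		return
	}
	f, err := elf.Open(os.Args[2])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	defer f.Close()

	// ImportedLibraries lists the DT_NEEDED entries of the binary.
	needed, err := f.ImportedLibraries()
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	files, err := listFiles(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Println(strings.Join(pickDirs(files, needed), ":"))
}
```

The printed value is in the colon-separated form that --library-path expects, so it could be baked into the custom ld as a constant.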

We can name the custom ld as $SNAP/lib/snap-<original name>, so we can make sense of it, and define the real ld as a constant inside the code.

One note on this one:

Indeed we may need to do something on our bases as well to fix their binaries. But note that the issue in this topic was raised precisely because the snap was calling out to binaries in the system instead of inside the base snap. My guess is that this is typical (/usr/bin/python3 is not inside the base for a classic snap).

@sergiusens Can we move this forward and put it in the agenda? Every classic snap today is sort of broken because of this, and changes in the upcoming glibc will make this issue a deal breaker. The sooner we fix that in snapcraft, the fewer broken snaps we’ll have.

It is on my top things to do; just yesterday we closed the day discussing the code design for this with the team.
