Classic snaps failing on Ubuntu 17.10

There’s actually a very cheap preliminary solution here: just replace the ELF files with a wrapper that explicitly calls the loader with a library path in the command line.

What’s the interpreter?

$ readelf -p .interp teleconsole | sed -n 's,.*/lib,/lib,p'
/lib64/ld-linux-x86-64.so.2
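
For anyone who would rather do the same check programmatically, here is a minimal sketch using Go's standard debug/elf package. The helper names (cString, interpreter) are made up for this example; only the debug/elf calls are real API.

```go
package main

import (
	"bytes"
	"debug/elf"
	"fmt"
	"os"
)

// cString returns the bytes up to the first NUL as a string.
// (Hypothetical helper: .interp holds a NUL-terminated path.)
func cString(b []byte) string {
	if i := bytes.IndexByte(b, 0); i >= 0 {
		b = b[:i]
	}
	return string(b)
}

// interpreter returns the contents of the .interp section of the ELF file
// at path, or "" if the file has no such section (e.g. a static binary).
func interpreter(path string) (string, error) {
	f, err := elf.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	sec := f.Section(".interp")
	if sec == nil {
		return "", nil
	}
	data, err := sec.Data()
	if err != nil {
		return "", err
	}
	return cString(data), nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: interp <elf-file>")
		return
	}
	interp, err := interpreter(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Println(interp)
}
```

Run against the teleconsole binary, this should print the same path that readelf reports.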

How will it resolve its libs?

$ /snap/core/current/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 --library-path /snap/core/current/lib/x86_64-linux-gnu --list ./teleconsole
        linux-vdso.so.1 =>  (0x00007ffc59945000)
        libpthread.so.0 => /snap/core/current/lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fef4be5e000)
        libc.so.6 => /snap/core/current/lib/x86_64-linux-gnu/libc.so.6 (0x00007fef4ba94000)
        /lib64/ld-linux-x86-64.so.2 => /snap/core/current/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 (0x000055fe1ab74000)

Does it work?

$ /snap/core/current/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 --library-path /snap/core/current/lib/x86_64-linux-gnu ./teleconsole
Starting local SSH server on localhost...
(...)

Yes, it does.

Would be worth an extra check to make sure --library-path is transitive, but it would be awkward for it to not be.

That is exactly what I was playing with without having the time to write it up (kid’s bedtime).
Whether --library-path is transitive is interesting, because in some cases you would want it to be and in others you wouldn’t. teleconsole, for example, creates a shell on the current system, and you wouldn’t want that shell to be affected by the library path. If you needed to exec something within the snap, though, you might want to take advantage of a pre-existing --library-path setting rather than prepend everything with ld-linux. One such case is GTK libraries spawning processes to load and retrieve information from other assets.

The case of teleconsole is also simple. Electron is a bit more complicated, as the ld-linux ... call would need to be added to the final call on the actual Electron binary that an Electron application is wrapped in.

About doing it all automatically, @zyga-snapd did some initial research on this a while ago (briefly explained in his blog post I mentioned earlier) and concluded that there is no easy way to patch INTERP without modifying the kernel (this is the half of the work I mention is missing every 3 months, but we could have promoted helping people do it manually).
About patching ELF files with RPATH: we also looked into this, and it would take a lot of (interesting) work to get going.
About RUNPATH versus RPATH, from _dl_map_object in elf/dl-load.c:

Unless loading object has RUNPATH:
    RPATH of the loading object,
        then the RPATH of its loader (unless it has a RUNPATH), ...,
        until the end of the chain, which is either the executable
        or an object loaded by dlopen
    Unless executable has RUNPATH:
        RPATH of the executable
LD_LIBRARY_PATH
RUNPATH of the loading object
ld.so.cache
default dirs

which is why we use RPATH, to not leak the RUNPATH into whatever is called.
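
To see which of the two tags a given binary carries, something like the following sketch could help. It reads DT_RPATH and DT_RUNPATH with Go's debug/elf package. The searchOrder function is only a simplified model of the order quoted above (its name and labels are mine), not a reimplementation of what glibc does.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
)

// searchOrder is a simplified model of the lookup order quoted above.
// hasRunpath says whether the loading object carries DT_RUNPATH, in which
// case the RPATH chain is skipped entirely.
func searchOrder(hasRunpath bool) []string {
	var order []string
	if !hasRunpath {
		order = append(order, "RPATH chain")
	}
	order = append(order, "LD_LIBRARY_PATH")
	if hasRunpath {
		order = append(order, "RUNPATH")
	}
	return append(order, "ld.so.cache", "default dirs")
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: rpath <elf-file>")
		return
	}
	f, err := elf.Open(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	defer f.Close()

	// DynString returns the string values of the given dynamic tag.
	rpath, err := f.DynString(elf.DT_RPATH)
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	runpath, err := f.DynString(elf.DT_RUNPATH)
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}

	fmt.Println("RPATH:  ", rpath)
	fmt.Println("RUNPATH:", runpath)
	fmt.Println("search order:", searchOrder(len(runpath) > 0))
}
```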

By the way, thanks for taking an interest in this!

You want it transitive in all cases. When a different binary is executed, that’s not about transitivity anymore, as the process memory will be completely replaced and the linking procedure starts over again. That’s why all binaries need to be replaced by wrappers, not just the commands referenced by applications.

About patching the interpreter, there’s no reason to patch the kernel for that:

/tmp $ cat myld.go
package main

func main() {
        println("Hello there!")
}
/tmp $ CGO_ENABLED=0 go build myld.go
/tmp $ patchelf --set-interpreter /tmp/myld teleconsole
/tmp $ ./teleconsole
Hello there!

This is probably the right way to go. We can chain load the real ld-linux here:

/tmp $ cat myld.go
package main

import (
        "os"
        "syscall"
)

func main() {
        const ld = "/lib64/ld-linux-x86-64.so.2"
        err := syscall.Exec(ld, []string{ld, "--list", "--library-path", "/snap/core/current/lib/x86_64-linux-gnu", "/tmp/teleconsole"}, os.Environ())
        if err != nil {
                println("error: " + err.Error())
                os.Exit(1)
        }
}

Note I used --list above to demonstrate the idea below.

Then, running teleconsole with the patched interpreter:

/tmp $ ./teleconsole
        linux-vdso.so.1 =>  (0x00007ffdac6db000)
        libpthread.so.0 => /snap/core/current/lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f082ed49000)
        libc.so.6 => /snap/core/current/lib/x86_64-linux-gnu/libc.so.6 (0x00007f082e97f000)
        /tmp/myld => /lib64/ld-linux-x86-64.so.2 (0x000055e1bccb3000)

This has the disadvantage that we patch all binaries, but it feels like a more polished approach, without any wrappers dangling around, no moves, and executions based on argv[0] should still work correctly.

We’ll need some data inside the ELF so we can tell what the original interpreter was. Or perhaps an individual ld for every interpreter required. Those can be tiny little programs (not Go) so perhaps simpler and not unreasonable.

That may also be better than just patching the ELF’s RPATH, because although RPATH would be simpler, the ELF would still point to an ld-linux interpreter outside the snap which will be different and may not even exist depending on local naming conventions adopted by the Linux distribution at hand.

Gut feeling is that this is two or three days of work… a week at most. What do you think?

That seems reasonable. Thanks for patchelf, by the way. Almost a year ago, @zyga-snapd’s and my Google-fu only found chrpath, which had an important notice under BUGS in the manpage.

We already have logic to crawl the snap and find ELF binaries, so this should be rather trivial work indeed (again, thanks for patchelf, I wasn’t looking forward to writing logic to modify those headers).
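
For readers following along, here is a rough idea of what such a crawl looks like. This is an illustrative Go sketch, not the logic snapcraft actually uses; findELF and isELF are hypothetical names, and the check only looks at the four magic bytes at the start of each regular file.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"io/fs"
	"os"
	"path/filepath"
)

// elfMagic is the four-byte signature every ELF file starts with.
var elfMagic = []byte{0x7f, 'E', 'L', 'F'}

// isELF reports whether header begins with the ELF magic bytes.
func isELF(header []byte) bool {
	return bytes.HasPrefix(header, elfMagic)
}

// findELF walks root and returns the paths of regular files that start
// with the ELF magic. Symlinks and other special files are skipped.
func findELF(root string) ([]string, error) {
	var found []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.Type().IsRegular() {
			return nil
		}
		f, err := os.Open(path)
		if err != nil {
			return err
		}
		defer f.Close()

		header := make([]byte, len(elfMagic))
		if _, err := io.ReadFull(f, header); err != nil {
			return nil // too short or unreadable: treat as not ELF
		}
		if isELF(header) {
			found = append(found, path)
		}
		return nil
	})
	return found, err
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: findelf <snap-dir>")
		return
	}
	paths, err := findELF(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	for _, p := range paths {
		fmt.Println(p)
	}
}
```

Each path found this way would then be a candidate for patchelf --set-interpreter, as shown earlier in the thread.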

With regards to myld, it might be best if each new base snap is required to provide a fixed entry point so we can patchelf with interp of something like /snap/<base>/current/lib/snap-ld-linux.

Let me expand on that last part as the original idea in your proposal was to have snapcraft create this little shim. So here’s why I would like it to be part of the base snap:

  • the entry point is clearly defined.
  • the base snap knows exactly what ld-linux to call.
  • if --library-path is the reason to keep it in snapcraft, can I suggest that LD_LIBRARY_PATH be popped from the environment and used as the --library-path argument.
  • the base snap could leverage this snap-ld-linux to patchelf things from the base snap itself such as /usr/bin/python3

There’s no reason to mix this logic across snapd and snapcraft tying their exact implementation together and forcing every single base to ship with these custom loaders, and making classic work or not depending on whether the base author was aware of such edge cases. Snapcraft will need to patch the interpreters, and it knows exactly which interpreter to call because it has the old one at hand.

We must not touch LD_LIBRARY_PATH, or it will break the user’s environment in unrelated ways. We can build the default path dynamically very easily based on which libraries each ELF file is linked with. Make a set of all of them, search inside the snap for these names, build a path that resolves all of them, and inject that into the custom ld.
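
A minimal sketch of that last step, under some assumptions: the list of needed names would come from each ELF file's DT_NEEDED entries (debug/elf exposes these through ImportedLibraries), and the index of library name to directories would come from a scan of the snap. The function name libraryPath and the index layout are invented for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// libraryPath builds a colon-separated search path that resolves every
// name in needed. index maps a library file name to the directories that
// contain it, in order of preference. Names with no entry in index are
// returned in missing so the caller can decide what to do about them.
func libraryPath(needed []string, index map[string][]string) (path string, missing []string) {
	seenLib := map[string]bool{}
	seenDir := map[string]bool{}
	var dirs []string

	for _, name := range needed {
		if seenLib[name] {
			continue // several binaries usually need the same library
		}
		seenLib[name] = true

		candidates := index[name]
		if len(candidates) == 0 {
			missing = append(missing, name)
			continue
		}
		if dir := candidates[0]; !seenDir[dir] {
			seenDir[dir] = true
			dirs = append(dirs, dir)
		}
	}
	return strings.Join(dirs, ":"), missing
}

func main() {
	// Example data only: the names mirror the teleconsole output above,
	// and libexample.so.1 is a made-up library that is not found.
	needed := []string{"libpthread.so.0", "libc.so.6", "libexample.so.1"}
	index := map[string][]string{
		"libpthread.so.0": {"/snap/core/current/lib/x86_64-linux-gnu"},
		"libc.so.6":       {"/snap/core/current/lib/x86_64-linux-gnu"},
	}
	path, missing := libraryPath(needed, index)
	fmt.Println("library path:", path)
	fmt.Println("unresolved:  ", missing)
}
```

Picking the first candidate directory is the simplest possible policy. A real implementation would have to decide how to rank the snap's own directories against the base's.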

We can name the custom ld as $SNAP/lib/snap-<original name>, so we can make sense of it, and define the real ld as a constant inside the code.
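
As a sketch of how the naming and the constant might fit together: the snap directory below is an example value, and shimPath and chainArgv are hypothetical helpers. A real shim would hand the resulting argv to syscall.Exec, as in the myld.go example earlier in the thread, and would probably not be written in Go.

```go
package main

import (
	"fmt"
	"path"
)

// realLD is the interpreter the ELF pointed at before it was patched.
// In the scheme described above it is baked into the shim as a constant.
const realLD = "/lib64/ld-linux-x86-64.so.2"

// shimPath returns where the custom ld for originalInterp would live
// inside the snap: <snapDir>/lib/snap-<original name>.
func shimPath(snapDir, originalInterp string) string {
	return path.Join(snapDir, "lib", "snap-"+path.Base(originalInterp))
}

// chainArgv builds the argument vector used to chain-load the real
// loader: the loader itself, the library path, then the program and its
// own arguments unchanged.
func chainArgv(ld, libraryPath, program string, args []string) []string {
	argv := []string{ld, "--library-path", libraryPath, program}
	return append(argv, args...)
}

func main() {
	// Example values for illustration only.
	fmt.Println(shimPath("/snap/teleconsole/current", realLD))
	fmt.Println(chainArgv(realLD,
		"/snap/core/current/lib/x86_64-linux-gnu",
		"/tmp/teleconsole", []string{"-h"}))
}
```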

One note on this one:

Indeed we may need to do something on our bases as well to fix their binaries. But note that the issue in this topic was raised precisely because the snap was calling out to binaries in the system instead of inside the base snap. My guess is that this is typical (/usr/bin/python3 is not inside the base for a classic snap).

@sergiusens Can we move this forward and put it on the agenda? Every classic snap today is sort of broken because of this, and changes in the upcoming glibc will make this issue a deal breaker. The sooner we fix that in snapcraft, the fewer broken snaps we’ll have.

It is at the top of my to-do list. Just yesterday we closed the day discussing the code design for this with the team.

Any update on this?

To make sure I understand, I think the plan here is to change the process of building a classic snap so that instead of having LDFLAGS set during the build step, some later step will patch any ELF binaries in the snap to have a custom interpreter. This interpreter will call the interpreter from the core^Wbase snap with a --library-path so that any dynamic libraries are resolved from either the snap being built or the base snap.

This sounds like a good change to me and will let me simplify the go snap’s build plugin a whole bunch. When can I get it? :slight_smile:

Being worked on now, just a bit complicated due to the current sprint.

I think this is why the snapcraft snap doesn’t work on trusty as well, LP: #1723208.

Any news on this issue?
I’m still having issues with the ubuntu-make snap.
It just returns a segmentation fault.

Is there any update on this? I am running Ubuntu 17.10 and would like to install Android Studio. To do this I need the latest version of Ubuntu Make, which can be installed through snap. However, I am getting stuck here:

charlie@thinkpad:~$ snap install ubuntu-make --classic
ubuntu-make master from 'didrocks' installed
charlie@thinkpad:~$ ubuntu-make.umake android
Segmentation fault (core dumped)

If anybody could help me out with this I would greatly appreciate it!

snapcraft#1635 was merged last week, although not released yet, which is the fix for bug 1723208.

You can try it by installing snapcraft --edge --classic or, if you’ve already installed it, by switching to the edge channel with snap refresh snapcraft --edge --classic.

Be sure you’re using the snap if you also have the apt package installed.

$ which snapcraft
/snap/bin/snapcraft

6 posts were split to a new topic: Creating a snap for ubuntu-make

Hey folks, quick update here on the progress we’ve made.

A couple of short-term fixes have landed that make progress on this issue, but don’t completely fix it:

  • LP: #1723208 has been fixed in master. It turned out not to be caused by this issue, but rather by a leaking LD_LIBRARY_PATH in the snapcraft snap itself.
  • One of the issues here is that Snapcraft doesn’t exclude libraries from the snap unless the build host is a release that corresponds to a core snap, i.e. currently only Xenial. On releases where this isn’t the case, snapcraft took the step of at least ensuring libc.so.6 didn’t make it into the snap, but that of course ignores everything else in libc, so that measure never worked. PR: #1632 landed in master fixing this issue by making sure every lib in the libc package was excluded.

Another fix in the pipeline is PR: #1636, where we get a little smarter about determining the host OS. As soon as that fix lands and Snapcraft (a classic snap that we’re using to prove this out) is rebuilt with it and the other two, we will have a snap that runs on Trusty.

That, of course, goes the wrong direction for some of you (older releases rather than newer ones). We’re making progress on that as well, but the problems there take longer to solve (see earlier in this thread). It is a top priority, but the poor soul doing this work just had two sprints and presented at a conference, heh. Expect to hear more soon.

@sergiusens How’s the change that patches the interpreter on classic snaps going?

Getting there. We want to release 2.34 first, as it has become huge, and the changes for this require some code changes for which we want a solid baseline to come back to. So my estimate is that, if adt starts working on our release branch for 2.34, this will happen 1 week from now.

@sergiusens Any news about this issue?