laney@raleigh> snap run flokk-contacts
/snap/flokk-contacts/11/flokk-contacts: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by /var/lib/snapd/lib/gl/libEGL.so.1)
and the snap doesn’t run.
The problem seems to be that libEGL.so.1 is bind mounted into the snap’s namespace under /var/lib/snapd/lib/gl/, but its dependencies are not.
If indeed there are new dependencies from the host that we need to be bind mounting as well, we can certainly add those to the list; it would be good to know what the full list is.
Ah yes sorry I had missed that detail. For libc6 itself, no we definitely do not want to bring in libc6 from the host into the mount namespace.
Thinking back on our solution to this problem, I’m actually a bit surprised we didn’t run into it sooner. I’m not sure that we can continue to bind mount drivers/graphics libraries from the host into the namespace if those drivers/libraries have dependencies which cannot be satisfied by the base snap or the snap itself. I will discuss with the team, but this is quite unfortunate…
@laney can you check if the nvidia libs from your host that are being bind mounted into the snap’s mount namespace also have this dependency on the newer libc6? In a dev impish container it looks like libEGL.so only has a symbol dependency on fstat64 from the newer libc6:
Hi, yes @ijohnson, I can confirm exactly the same output as you. I’d guess that it picks up this dependency when built against glibc >= 2.33. I rebuilt the impish version on focal and it got Depends: libc6 (>= 2.14) instead of 2.33 there, which seems to back that up.
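For anyone wanting to run the same check on their own libraries, the glibc version tags can be pulled straight out of the dynamic symbol table. A minimal sketch, using an abridged, hypothetical `objdump -T` sample in place of a real library (on an affected host you would pipe in the real output instead):

```shell
# Abridged, hypothetical `objdump -T` output for an impish-built libEGL;
# on a real host you would instead run something like:
#   objdump -T /var/lib/snapd/lib/gl/libEGL.so.1
sample='0000000000000000      DF *UND*  0000000000000000 (GLIBC_2.2.5) malloc
0000000000000000      DF *UND*  0000000000000000 (GLIBC_2.33) fstat64'

# Pull out the distinct glibc version tags the library references;
# any tag newer than the base snap's libc is a problem.
versions=$(printf '%s\n' "$sample" | grep -o 'GLIBC_[0-9.]*' | sort -u)
printf '%s\n' "$versions"
```

A core20 base ships glibc 2.31, so a `GLIBC_2.33` tag in that list is exactly the mismatch reported at the top of the thread.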
I guess the general problem is that we’re bind mounting but not necessarily bringing over the deps (in an ldd sense) of the things we are bind mounting. It feels like this approach is kind of dodgy in the situation we’re in. We could maybe mitigate it by always building the stuff for the earliest supported series and then binary-copying upwards. But perhaps a re-evaluation is required and something where we bring in a copy of the same drivers from the store, built against the right toolchain, would be more robust?
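The “deps in an ldd sense” can be enumerated mechanically. A rough sketch of the idea, using /bin/sh as a stand-in for a bind-mounted driver library (the awk filter and the stand-in path are illustrative; this is not what snapd itself does):

```shell
# List the resolved shared-object paths a binary or library pulls in.
# ldd resolves against the *host*, which is exactly the mismatch being
# discussed: every path printed here would need to be satisfiable inside
# the snap's mount namespace for the bind-mounted library to load.
deps=$(ldd /bin/sh | awk '$2 == "=>" && $3 ~ /^\// { print $3 }' | sort -u)
printf '%s\n' "$deps"
```

On an affected host you would point this at the libraries under /var/lib/snapd/lib/gl/ instead.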
Yes, I agree this approach is dodgy indeed in light of this problem. I wasn’t originally involved in the decision to do it this way, but I think the reason it was deemed okay was that no new dependencies like this had been introduced, and so “it just worked”. That of course is not a justification on its own, but it probably made more sense at the time than simply not supporting it.
So I also had a look at the nvidia libraries at driver versions 460 and 470 (by installing libnvidia-gl-460, then uninstalling that one and installing libnvidia-gl-470-server in my impish container), and all those libraries from NVIDIA that we repackage as Debian packages (IIRC this is how it works) seem to be okay: none of them have new libc6 dependencies. So it’s just the things we actually build in the archive, which suggests the sky may not be falling right at this minute.
FWIW we worked out something that allowed us to separate the graphics userspace into a separate snap:
It is a basis for resolving the issue at hand, but Nvidia has one more complicating bit: the userspace is bound to the kernel module. So an nvidia-core20 snap would need to carry all the supported versions of the userspace and mount the appropriate one, or there would need to be tracks (¯\_(ツ)_/¯) for all the supported versions. In any case, snapd has to learn how to deal with that.
These were @zyga’s thoughts about this kind of problem from 2 years back:
I don’t think any of those ideas actually got implemented though.
For the Nvidia case in particular, we might be able to get by in the short term making the last compatible user space available to snaps built on top of 16.04 libraries. But the Nvidia drivers have historically had a close coupling between the kernel and user space portions: there’s no guarantee how long that would continue to work as the host system moves ever forward.
Is there any work going on in this area? That’s a very pressing problem, isn’t it? Snaps with older bases cannot satisfy this dependency and just refuse to launch.
Having gained a better understanding after reading the above threads, I do concur: isn’t this a Very Big Problem? Sounds like this could prevent us from shipping an EGL-using snap entirely.
As a desperate measure, I added the graphics-core20 interface and bundled the EGL libs from the system and it seems to work. Probably by accident, but hopefully that helps someone.
What is this? EDIT: Ignore this question, I see the reference above.
My workaround is to coerce LD_LIBRARY_PATH in a wrapper that is last in the command-chain. The wrapper pushes ${SNAP}/usr/lib/${SNAP_LAUNCHER_ARCH_TRIPLET} to the front of LD_LIBRARY_PATH.
The OBS snap now starts on my NVIDIA systems, but I am not sure how thin the ice is upon which I skate.
I’d like to thank @alan_g and other members of the Mir team for working on graphics-core20.
I’ve added graphics-core20 to a local branch of the OBS Studio snap, the diff is included below as it might be useful to others.
One thing to note is that I had to apply the environment variable coercion in an existing wrapper I use to launch OBS, as the variables didn’t take effect when added to environment: stanzas in the snapcraft.yaml. The wrapper is also the last script in the command-chain. Here’s the diff:
diff --git a/snap/local/obs-wrapper b/snap/local/obs-wrapper
index 06ceb43..da2e38d 100755
--- a/snap/local/obs-wrapper
+++ b/snap/local/obs-wrapper
@@ -33,5 +33,7 @@ if [[ ${@} == *"usr/bin/obs"* ]]; then
fi
fi
+export LD_LIBRARY_PATH="${SNAP}/usr/lib/${SNAP_LAUNCHER_ARCH_TRIPLET}:${LD_LIBRARY_PATH}"
+
unset SESSION_MANAGER
exec "${@}"
My question for @alan_g, @ijohnson and @jamesh is: which of the above approaches is the most robust?
I’m not sure what the graphics-core20 interface has to do with the Nvidia/Impish problems. Admittedly, libEGL.so is involved in both but that’s no different to e.g. having it included in the snap. (Which it may well be already.)
I assume some other script in the command-chain or snapd is making the change. AIUI snapd prepends the host Nvidia driver path so that binaries from there are found first. But unfortunately, with a core20 based snap these are incompatible with the base core20 libc.
Adding Mesa drivers might work for some cases, but I doubt that, for example, hardware decoding of video will work.
None of this sounds particularly robust, but if it works for you, then great!
I’ve no deep knowledge of when and how snapd injects host GL binaries into the environment, but wouldn’t stripping them from LD_LIBRARY_PATH/LIBGL_DRIVERS_PATH/LIBVA_DRIVERS_PATH/__EGL_VENDOR_LIBRARY_DIRS in your wrapper script be simpler and as effective?
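For what it’s worth, that stripping could be a small function in the wrapper script. A minimal sketch, assuming the host driver directories snapd injects all live under /var/lib/snapd/lib (the function name and the sample path value are hypothetical):

```shell
# Drop any /var/lib/snapd/lib/* entries (host-injected driver dirs) from a
# colon-separated search path, keeping everything else in order.
strip_snapd_gl() {
    old_ifs=$IFS; IFS=:
    new=
    for dir in $1; do
        case $dir in
            /var/lib/snapd/lib/*) ;;            # drop host driver dirs
            *) new=${new:+$new:}$dir ;;
        esac
    done
    IFS=$old_ifs
    printf '%s\n' "$new"
}

# Hypothetical example value, as snapd might have assembled it:
cleaned=$(strip_snapd_gl "/var/lib/snapd/lib/gl:/snap/obs-studio/current/usr/lib:/var/lib/snapd/lib/gl32")
echo "$cleaned"    # only /snap/obs-studio/current/usr/lib survives
```

The same filter could be applied to LIBGL_DRIVERS_PATH, LIBVA_DRIVERS_PATH and __EGL_VENDOR_LIBRARY_DIRS before the exec at the end of the wrapper.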
I don’t have strong opinions here, except to say that snapd shouldn’t do anything with LD_LIBRARY_PATH for apps at runtime, other than cleaning the value from the host environment before executing snap-confine. That is, if you do LD_LIBRARY_PATH=foo snap run foobar, then foobar when executed will not see that LD_LIBRARY_PATH value, unlike something like DISPLAY etc.:
$ LD_LIBRARY_PATH=foo snap run --shell hello-world -c 'echo $LD_LIBRARY_PATH'

$ LD_LIBRARY_PATH2=foo snap run --shell hello-world -c 'echo $LD_LIBRARY_PATH2'
foo