EGL-using snaps on impish seem to be broken when using the Nvidia proprietary driver

Hi, yes @ijohnson, I can confirm exactly the same output as you. I’d guess that it picks up this dependency when built against libc6 >= 2.33. I rebuilt the impish version on focal and it got Depends: libc6 (>= 2.14) instead of 2.33 there, which seems to back that up.

I guess the general problem is that we’re bind mounting but not necessarily bringing over the deps (in an ldd sense) of the things we are bind mounting. It feels like this approach is kind of dodgy in the situation we’re in. We could maybe mitigate it by always building the stuff for the earliest supported series and then binary-copying upwards. But perhaps a re-evaluation is required and something where we bring in a copy of the same drivers from the store, built against the right toolchain, would be more robust?

Yes, I agree this approach is indeed dodgy in light of this problem. I wasn’t originally involved in the decision to do it this way, but I think it was deemed okay because no new dependencies were being introduced like this, and so “it just worked”. That of course is not a justification on its own, but it probably made more sense at the time than just not supporting it.

So I also had a look at the NVIDIA libraries for driver versions 460 and 470 (by installing libnvidia-gl-460, then uninstalling it and installing libnvidia-gl-470-server in my impish container). All the library files from NVIDIA that we repackage as Debian packages (IIRC this is how it works) seem to be okay: none of them have new libc6 dependencies. So it’s just the things we actually build in the archive, which suggests the sky may not be falling right at this minute.

460:

root@407cd95c6adc:/# for f in $(ls /usr/lib/x86_64-linux-gnu/lib*nvidia*); do echo "$f:"; objdump -T $f | grep -Po 'GLIBC_\K[0-9.]+' | sort --version-sort | uniq; done
/usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.0:
2.2.5
2.3.2
2.3.3
2.4
/usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.460.80:
2.2.5
2.3.2
2.3.3
2.4
/usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.1:
2.2.5
/usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.460.80:
2.2.5
/usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.2:
2.2.5
/usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.460.80:
2.2.5
/usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.0:
2.2.5
/usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.460.80:
2.2.5
/usr/lib/x86_64-linux-gnu/libnvidia-cbl.so.460.80:
2.2.5
/usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.460.80:
2.2.5
2.3
2.3.3
2.7
2.9
2.10
/usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.460.80:
2.2.5
2.3
2.3.3
2.7
2.9
2.10
/usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.460.80:
2.2.5
2.3
2.7
/usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.460.80:
2.2.5
2.3
2.3.2
2.7
/usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.1:
2.2.5
2.3
2.3.4
2.7
/usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.460.80:
2.2.5
2.3
2.3.4
2.7
/usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.460.80:
2.2.5
2.3
2.3.2
2.7
/usr/lib/x86_64-linux-gnu/libnvidia-tls.so.460.80:
2.2.5

470:

for f in $(ls /usr/lib/x86_64-linux-gnu/lib*nvidia*); do echo "$f:"; objdump -T $f | grep -Po 'GLIBC_\K[0-9.]+' | sort --version-sort | uniq; done
/usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.0:
2.2.5
2.3.2
2.3.3
2.4
/usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.470.57.02:
2.2.5
2.3.2
2.3.3
2.4
/usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.1:
2.2.5
/usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.470.57.02:
2.2.5
/usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.2:
2.2.5
/usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.470.57.02:
2.2.5
/usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.0:
2.2.5
/usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.470.57.02:
2.2.5
/usr/lib/x86_64-linux-gnu/libnvidia-cbl.so.470.57.02:
2.2.5
2.3
2.3.2
2.3.3
2.4
2.7
/usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.470.57.02:
2.2.5
2.3
2.3.3
2.7
2.9
2.10
/usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.470.57.02:
2.2.5
2.3
2.3.3
2.7
2.9
2.10
/usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.470.57.02:
2.2.5
2.3
2.7
/usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.470.57.02:
2.2.5
2.3
2.3.2
2.7
/usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.1:
2.2.5
2.3
2.3.4
2.7
/usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.470.57.02:
2.2.5
2.3
2.3.4
2.7
/usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.470.57.02:
2.2.5
2.3
2.3.2
2.7
/usr/lib/x86_64-linux-gnu/libnvidia-tls.so.470.57.02:
2.2.5
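
For anyone wanting to automate the check above, here is a rough sketch (not something used in this thread) that reduces the `objdump -T` output to the single highest GLIBC symbol version and compares it with what a base provides. The function names `max_glibc_version` and `version_at_most` are made up for illustration, and it assumes a `sort` that supports `-V`.

```shell
# Reads `objdump -T` output on stdin and prints the highest GLIBC_x.y
# symbol version mentioned (prints nothing if there is none).
max_glibc_version() {
  sed -n 's/.*GLIBC_\([0-9][0-9.]*\).*/\1/p' | sort -V | tail -n 1
}

# Succeeds when version $1 is less than or equal to version $2.
version_at_most() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Possible usage (the library path is only an example):
#   required=$(objdump -T /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.0 | max_glibc_version)
#   version_at_most "$required" 2.31 || echo "needs glibc $required"
```

The 2.31 in the usage comment is the glibc version shipped in Ubuntu 20.04, which is what a core20 base would offer; substitute the version of whichever base you care about.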

FWIW we worked out something that allowed us to separate the graphics userspace into a separate snap:

It is a basis for resolving the issue at hand, but Nvidia has one more complicating bit: the userspace is bound to the kernel module. So an nvidia-core20 snap would need to carry all the supported versions of the userspace and mount the appropriate one, or there would need to be tracks (¯\_(ツ)_/¯) for all the supported versions. In any case, snapd has to learn how to deal with that.

These were @zyga’s thoughts about this kind of problem from 2 years back:

I don’t think any of those ideas actually got implemented though.

For the Nvidia case in particular, we might be able to get by in the short term making the last compatible user space available to snaps built on top of 16.04 libraries. But the Nvidia drivers have historically had a close coupling between the kernel and user space portions: there’s no guarantee how long that would continue to work as the host system moves ever forward.

Is there any work going on in this area? That’s a very pressing problem, isn’t it? Snaps with older bases cannot satisfy this dependency and just refuse to launch.

See e.g. How to set LD_LIBRARY_PATH properly?

Having gained a better understanding after reading the above threads, I do concur: isn’t this a Very Big Problem? Sounds like this could prevent us from shipping an EGL-using snap entirely.

As a desperate measure, I added the graphics-core20 interface and bundled the EGL libs from the system and it seems to work. Probably by accident, but hopefully that helps someone.

What is this? - EDIT Ignore this question. I see the reference above :point_up:

My workaround is to coerce LD_LIBRARY_PATH in a wrapper that is last in the command-chain. The wrapper pushes ${SNAP}/usr/lib/${SNAP_LAUNCHER_ARCH_TRIPLET} to the front of LD_LIBRARY_PATH.
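
A minimal sketch of that kind of prepend, assuming `SNAP` and `SNAP_LAUNCHER_ARCH_TRIPLET` are already set by the time the wrapper runs. The helper name `prepend_path` is hypothetical. The only thing it adds over a plain string concatenation is that it avoids a trailing colon when the existing value is empty, since the dynamic loader treats an empty entry as the current directory.

```shell
# prepend_path DIR LIST
# Prints LIST with DIR placed first; no trailing ':' if LIST is empty.
prepend_path() {
  printf '%s\n' "$1${2:+:$2}"
}

# Possible use in a wrapper, just before the final exec:
#   LD_LIBRARY_PATH="$(prepend_path "${SNAP}/usr/lib/${SNAP_LAUNCHER_ARCH_TRIPLET}" "${LD_LIBRARY_PATH}")"
#   export LD_LIBRARY_PATH
```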

The OBS snap now starts on my NVIDIA systems, but I am not sure how thin the ice is upon which I skate :ice_skate:

Yeah, I think that ultimately has the same effect. I also have the same feeling: this feels like a potential source of trouble.

I’d like to thank @alan_g and other members of the Mir team for working on graphics-core20 :heart:

I’ve added graphics-core20 to a local branch of the OBS Studio snap, the diff is included below as it might be useful to others.

One thing to note is that I had to put the environment variable coercion in an existing wrapper I use to launch OBS, as the variables didn’t take effect when added to environment: stanzas in the snapcraft.yaml. The wrapper is also the last script in the command-chain. Here’s the diff:

diff --git a/snap/local/obs-wrapper b/snap/local/obs-wrapper
index 06ceb43..9936fc5 100755
--- a/snap/local/obs-wrapper
+++ b/snap/local/obs-wrapper
@@ -33,5 +33,11 @@ if [[ ${@} == *"usr/bin/obs"* ]]; then
   fi
 fi
 
+# Support for graphics-core20
+export LD_LIBRARY_PATH="${SNAP}/graphics/lib:${LD_LIBRARY_PATH}"
+export LIBGL_DRIVERS_PATH="${SNAP}/graphics/dri"
+export LIBVA_DRIVERS_PATH="${SNAP}/graphics/dri"
+export __EGL_VENDOR_LIBRARY_DIRS="${SNAP}/graphics/glvnd/egl_vendor.d"
+
 unset SESSION_MANAGER
 exec "${@}"
diff --git a/snap/snapcraft.yaml b/snap/snapcraft.yaml
index d017bac..10df27b 100644
--- a/snap/snapcraft.yaml
+++ b/snap/snapcraft.yaml
@@ -9,6 +9,12 @@ architectures:
 compression: lzo
 
 plugs:
+  # Support for graphics-core20
+  # https://discourse.ubuntu.com/t/the-graphics-core20-snap-interface/23000
+  graphics-core20:
+    interface: content
+    target: $SNAP/graphics
+    default-provider: mesa-core20
   # Support for common GTK themes
   # https://forum.snapcraft.io/t/how-to-use-the-system-gtk-theme-via-the-gtk-common-themes-snap/6235
   gtk-3-themes:
@@ -43,8 +49,12 @@ layout:
     symlink: $SNAP/usr/lib/$SNAPCRAFT_ARCH_TRIPLET/libvulkan_radeon.so
   /usr/share/alsa:
     symlink: $SNAP/usr/share/alsa
-  /usr/share/libdrm/amdgpu.ids:
-    symlink: $SNAP/usr/share/libdrm/amdgpu.ids
+  # Used by mesa-core20 for app specific workarounds
+  /usr/share/drirc.d:
+    bind: $SNAP/graphics/drirc.d
+    # Needed by mesa-core20 on AMD GPUs
+  /usr/share/libdrm:
+    bind: $SNAP/graphics/libdrm
   /usr/share/obs:
     symlink: $SNAP/usr/share/obs
   /usr/share/X11:
@@ -1159,6 +1169,8 @@ parts:
 
   cleanup:
     plugin: nil
+    build-snaps:
+      - mesa-core20
     after:
       - aom
       - cef
@@ -1180,11 +1192,14 @@ parts:
         usr/share/GConf \
         usr/share/apport \
         usr/share/bug \
+        usr/share/drirc.d \
         usr/share/fonts \
+        usr/share/glvnd \
         usr/share/icons/Adwaita \
         usr/share/icons/Humanity* \
         usr/share/icons/LoginIcons \
         usr/share/icons/ubuntu-mono-* \
+        usr/share/libdrm \
         usr/share/lintian \
         usr/share/man \
         usr/share/pkgconfig; do
@@ -1195,3 +1210,8 @@ parts:
       rm -rf ${SNAPCRAFT_PRIME}/usr/share/doc/*/examples || true
       rm ${SNAPCRAFT_PRIME}/usr/share/doc/*/README* 2>/dev/null || true
       find ${SNAPCRAFT_PRIME}/usr -type d -empty -delete || true
+
+      # graphics-core20 cleanup
+      cd /snap/mesa-core20/current/egl/lib
+      find . -type f,l -exec rm -f $SNAPCRAFT_PRIME/usr/lib/${SNAPCRAFT_ARCH_TRIPLET}/{} \;
+      rm -fr "$SNAPCRAFT_PRIME/usr/lib/${SNAPCRAFT_ARCH_TRIPLET}/dri"

I was also able to get the OBS Studio snap running simply by pushing ${SNAP}/usr/lib/${SNAP_LAUNCHER_ARCH_TRIPLET} to the front of LD_LIBRARY_PATH. Here’s the diff:

diff --git a/snap/local/obs-wrapper b/snap/local/obs-wrapper
index 06ceb43..da2e38d 100755
--- a/snap/local/obs-wrapper
+++ b/snap/local/obs-wrapper
@@ -33,5 +33,7 @@ if [[ ${@} == *"usr/bin/obs"* ]]; then
   fi
 fi
 
+export LD_LIBRARY_PATH="${SNAP}/usr/lib/${SNAP_LAUNCHER_ARCH_TRIPLET}:${LD_LIBRARY_PATH}"
+
 unset SESSION_MANAGER
 exec "${@}"

My question for @alan_g @ijohnson and @jamesh is which of the above approaches is the most robust?

I’m not sure what the graphics-core20 interface has to do with the Nvidia/Impish problems. Admittedly, libEGL.so is involved in both but that’s no different to e.g. having it included in the snap. (Which it may well be already.)

I assume some other script in the command-chain or snapd is making the change. AIUI snapd prepends the host Nvidia driver path so that binaries from there are found first. But unfortunately, with a core20 based snap these are incompatible with the base core20 libc.

Adding Mesa drivers might work for some cases, but I doubt that, for example, hardware decoding of video will be working.

None of this sounds particularly robust, but if it works for you, then great!

I’ve no deep knowledge of when and how snapd injects host GL binaries into the environment, but wouldn’t stripping them from LD_LIBRARY_PATH/LIBGL_DRIVERS_PATH/LIBVA_DRIVERS_PATH/__EGL_VENDOR_LIBRARY_DIRS in your wrapper script be simpler and as effective?
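
To make the idea concrete, here is a rough sketch of such a filter, not a tested recipe. It drops every entry under /var/lib/snapd/lib/ (plus /var/lib/snapd/void and empty entries) from a colon-separated list. The function name `strip_host_gl_paths` is hypothetical, and the choice of which prefixes count as "host GL" is an assumption based on the paths quoted in this thread.

```shell
# strip_host_gl_paths LIST
# Prints the colon-separated LIST without host GL entries injected
# from /var/lib/snapd, and without empty entries.
strip_host_gl_paths() {
  _rest="$1"
  _out=""
  while [ -n "$_rest" ]; do
    _entry="${_rest%%:*}"            # text before the first colon
    case "$_rest" in
      *:*) _rest="${_rest#*:}" ;;    # drop that entry and its colon
      *)   _rest="" ;;               # that was the last entry
    esac
    case "$_entry" in
      ""|/var/lib/snapd/lib/*|/var/lib/snapd/void) continue ;;
    esac
    _out="${_out:+$_out:}$_entry"
  done
  printf '%s\n' "$_out"
}

# Possible use in a wrapper:
#   LD_LIBRARY_PATH="$(strip_host_gl_paths "$LD_LIBRARY_PATH")"
#   export LD_LIBRARY_PATH
```

The same function could be applied to the other variables mentioned. Bear in mind it removes every host GL entry, so it only makes sense where the host drivers should not be used at all.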

I don’t have strong opinions here, except to say that snapd shouldn’t do anything with LD_LIBRARY_PATH for apps at runtime, other than cleaning the value from the host before executing snap-confine. I.e. when doing LD_LIBRARY_PATH=foo snap run foobar, foobar will not see the LD_LIBRARY_PATH value when executed, unlike something like DISPLAY etc.

$ LD_LIBRARY_PATH=foo snap run --shell hello-world -c 'echo $LD_LIBRARY_PATH'

$ LD_LIBRARY_PATH2=foo snap run --shell hello-world -c 'echo $LD_LIBRARY_PATH2'
foo

Historically it’s been snapcraft-desktop-helpers that set up the environment:

It’s likely part of most desktop snaps.

I don’t know where it comes from, but something is prefixing /var/lib/snapd/lib/gl:/var/lib/snapd/lib/gl32:/var/lib/snapd/void: to LD_LIBRARY_PATH even in snaps that avoid snapcraft-desktop-helpers.

E.g.

$ snap run --shell ubuntu-frame -c "echo \$LD_LIBRARY_PATH"
...
/var/lib/snapd/lib/gl:/var/lib/snapd/lib/gl32:/var/lib/snapd/void:/snap/ubuntu-frame/489/graphics/lib:/snap/ubuntu-frame/489/usr/lib:/snap/ubuntu-frame/489/usr/lib/x86_64-linux-gnu

I don’t see it anywhere in the snap itself:

$ grep -R gl32 /snap/ubuntu-frame/current/
$

[update]

After poking around the snapd code it comes from SNAP_LIBRARY_PATH via snapcraft-runner:

$ grep -R SNAP_LIBRARY_PATH /snap/ubuntu-frame/current/
/snap/ubuntu-frame/current/snap/command-chain/snapcraft-runner:export LD_LIBRARY_PATH="$SNAP_LIBRARY_PATH${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

@Wimpress starting from your original snap recipe, does adding this to your wrapper script help:

export LD_LIBRARY_PATH=${LD_LIBRARY_PATH#$SNAP_LIBRARY_PATH:}

(It is probably better not to do this on systems where the Nvidia drivers work, but a useful data point.)
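
In case the `#` expansion is unfamiliar, here is a small sketch of what that line does, wrapped in a function so it can be tried outside a snap. The function name `strip_snap_library_prefix` is hypothetical. Quoting the inner variable is a small deviation from the one-liner above: it makes the shell treat the value as a literal string rather than as a glob pattern.

```shell
# strip_snap_library_prefix LD_PATH SNAP_LIB_PATH
# Prints LD_PATH with a leading "SNAP_LIB_PATH:" removed.
# If LD_PATH does not start with that prefix, it is printed unchanged.
strip_snap_library_prefix() {
  _result=${1#"$2":}
  printf '%s\n' "$_result"
}

# Possible use in a wrapper:
#   LD_LIBRARY_PATH="$(strip_snap_library_prefix "$LD_LIBRARY_PATH" "$SNAP_LIBRARY_PATH")"
#   export LD_LIBRARY_PATH
```

Only the leading copy is removed. Any later occurrence of the same directories further along the list is left alone.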

Yes, I am using the helpers from the GNOME extension.

The opengl interface in snapd does that:

Note that while snapd creates those files / mounts, it doesn’t set any env vars related to those

Upgraded my system at home to 21.10, which is also NVIDIA only.

Built the OBS Studio snap (actually a test version called obs-demo) locally from my original yaml and added export LD_LIBRARY_PATH=${LD_LIBRARY_PATH#$SNAP_LIBRARY_PATH:} to the wrapper just before the exec.

OBS Studio is working.

I grabbed the LD_LIBRARY_PATH before and after adding that export to the wrapper. This is the result. What is removed from the Before is the leading /var/lib/snapd/lib/gl:/var/lib/snapd/lib/gl32:/var/lib/snapd/void: prefix:

  • Before: /var/lib/snapd/lib/gl:/var/lib/snapd/lib/gl32:/var/lib/snapd/void:/snap/obs-demo/x18/opt/qt515/lib::/snap/obs-demo/x18/lib:/snap/obs-demo/x18/usr/lib:/snap/obs-demo/x18/lib/x86_64-linux-gnu:/snap/obs-demo/x18/usr/lib/x86_64-linux-gnu:/snap/obs-demo/x18/lib/x86_64-linux-gnu:/snap/obs-demo/x18/usr/lib/x86_64-linux-gnu:/snap/obs-demo/x18/usr/lib:/snap/obs-demo/x18/lib:/snap/obs-demo/x18/usr/lib/x86_64-linux-gnu/dri:/var/lib/snapd/lib/gl:/var/lib/snapd/lib/gl/vdpau:/snap/obs-demo/x18/usr/lib/x86_64-linux-gnu/pulseaudio

  • After: /snap/obs-demo/x18/opt/qt515/lib::/snap/obs-demo/x18/lib:/snap/obs-demo/x18/usr/lib:/snap/obs-demo/x18/lib/x86_64-linux-gnu:/snap/obs-demo/x18/usr/lib/x86_64-linux-gnu:/snap/obs-demo/x18/lib/x86_64-linux-gnu:/snap/obs-demo/x18/usr/lib/x86_64-linux-gnu:/snap/obs-demo/x18/usr/lib:/snap/obs-demo/x18/lib:/snap/obs-demo/x18/usr/lib/x86_64-linux-gnu/dri:/var/lib/snapd/lib/gl:/var/lib/snapd/lib/gl/vdpau:/snap/obs-demo/x18/usr/lib/x86_64-linux-gnu/pulseaudio

The NVIDIA drivers still work and that is due to lines 94 - 107 of the desktop-exports from the GNOME extension in Snapcraft:

@ijohnson What is /var/lib/snapd/void?
