Nvidia acceleration on chrome and firefox

Something I totally didn’t notice:

[Mar21 06:56] traps: glxinfo[29277] general protection ip:7ffff7559fb4 sp:7fffffffde78 error:0 in libc-2.23.so[7ffff7445000+1c0000]
[Mar21 07:05] traps: glxinfo[30174] general protection ip:7ffff7559fb4 sp:7fffffffde78 error:0 in libc-2.23.so[7ffff7445000+1c0000]
[Mar21 07:20] traps: glxinfo[31424] general protection ip:7ffff7559fb4 sp:7fffffffde78 error:0 in libc-2.23.so[7ffff7445000+1c0000]
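
For anyone reading these trap lines: the bracketed part after the library name is the start address and size of the mapping, so subtracting that start address from `ip` gives the offset of the faulting instruction inside libc. A minimal sketch of the arithmetic, using only the values from the log above (the function name `fault_offset` is made up for illustration):

```shell
# Values copied from the kernel log lines above.
ip=0x7ffff7559fb4      # "ip:" field, the faulting instruction pointer
base=0x7ffff7445000    # start of the libc-2.23.so mapping (the part before "+")

# Hypothetical helper: print the offset of an address inside a mapping, in hex.
fault_offset() {
    printf '0x%x\n' "$(( $1 - $2 ))"
}

fault_offset "$ip" "$base"   # prints 0x114fb4
```

The result is below the mapping size of 0x1c0000, so the address does fall inside libc. With debug symbols for that libc installed, an offset like this can usually be resolved with a tool such as addr2line.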

Small update. Per @niemeyer’s advice to try a simpler approach, I’ve set up a xenial chroot, dumped the libglvnd I built before and copied over nvidia drivers that came from Arch packages. I was able to reproduce the segfault without much trouble. The backtrace:

(gdb) bt
#0  0x00007ffff7559fb4 in pthread_mutex_lock (mutex=0x7ffff71e5180 <dispatchLock>) at forward.c:192
#1  0x00007ffff6f5eddb in mt_mutex_lock (mutex=0x7ffff71e5180 <dispatchLock>) at glvnd_pthread.c:317
#2  0x00007ffff6f23f77 in LockDispatch () at GLdispatch.c:144
#3  0x00007ffff6f24115 in __glDispatchNewVendorID () at GLdispatch.c:198
#4  0x00007ffff7212607 in __glXLookupVendorByName (vendorName=0x618ad0 "nvidia") at libglxmapping.c:442
#5  0x00007ffff7213811 in __glXLookupVendorByScreen (dpy=0x60aab0, screen=0) at libglxmapping.c:574
#6  0x00007ffff7213966 in __glXGetDynDispatch (dpy=0x60aab0, screen=0) at libglxmapping.c:608
#7  0x00007ffff7209563 in glXChooseVisual (dpy=0x60aab0, screen=0, attrib_list=0x609200) at libglx.c:215
#8  0x00007ffff7b89d58 in glXChooseVisual (dpy=0x60aab0, screen=0, attribList=0x609200) at g_libglglxwrapper.c:183
#9  0x0000000000401741 in ?? ()
#10 0x00007ffff7465830 in __libc_start_main (main=0x401630, argc=1, argv=0x7fffffffe608, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe5f8) at ../csu/libc-start.c:291
#11 0x0000000000401ea9 in ?? ()

The upside is that at least I can install the usual debugging tools now and try to dig deeper.

Turns out nvidia ships a couple of libraries that may fiddle with TLS, or at least that’s what the name libnvidia-tls.so* suggests. There are two copies of the library (at least on Arch), one under /usr/lib and another under /usr/lib/tls:

lrwxrwxrwx 1 root root    23 03-14 07:07 /usr/lib/libnvidia-tls.so -> libnvidia-tls.so.390.42
-rwxr-xr-x 1 root root 13080 03-14 07:07 /usr/lib/libnvidia-tls.so.390.42
lrwxrwxrwx 1 root root    23 03-14 07:07 /usr/lib/tls/libnvidia-tls.so -> libnvidia-tls.so.390.42
-rwxr-xr-x 1 root root 14480 03-14 07:07 /usr/lib/tls/libnvidia-tls.so.390.42

The libraries under tls have a different checksum than those one level up. Since the location should not matter for ld.so and we prepend the whole /var/lib/snapd/lib/gl path, I ignored those files. But copying over /usr/lib/tls magically fixed the problem: no more segfaults, glxinfo works, and so does ohmygiraffe.
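
A quick way to confirm that the two copies really differ is a byte-for-byte comparison. This is only a sketch: the helper name `same_content` is made up, and the paths are the ones from the Arch listing above, so they will differ for other driver versions or distros.

```shell
# Hypothetical helper: report whether two files have identical content.
same_content() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
    fi
}

# Paths taken from the Arch listing above; adjust the version suffix as needed.
a=/usr/lib/libnvidia-tls.so.390.42
b=/usr/lib/tls/libnvidia-tls.so.390.42

# Only compare when both files exist, so this is harmless on other systems.
if [ -e "$a" ] && [ -e "$b" ]; then
    same_content "$a" "$b"
fi
```

Running sha256sum on both files would show the same thing, with the checksums visible.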

I’ve opened a PR with updated snap-confine globs:
https://github.com/snapcore/snapd/pull/4901

It’d be great if someone on Ubuntu, Debian, Solus, Fedora or another distro could check that the PR does not break things for them.

If you can tell me how to try it, I can help with testing :slight_smile:

You can install snapd-git from AUR, it builds the latest master.

Another PR, this time to make sure that we preserve the original layout of the nvidia libs:
https://github.com/snapcore/snapd/pull/4902

Instead of mixed up libraries:

.:
total 0
lrwxrwxrwx 1 root maciek 23 Mar 22 08:41 libnvidia-tls.so -> libnvidia-tls.so.390.42
lrwxrwxrwx 1 root maciek 57 Mar 22 08:41 libnvidia-tls.so.390.42 -> /var/lib/snapd/hostfs/usr/lib/tls/libnvidia-tls.so.390.42

We should get the mirrored structure:

.:
total 0
lrwxrwxrwx 1 root maciek  23 Mar 22 09:43 libnvidia-tls.so -> libnvidia-tls.so.390.42
lrwxrwxrwx 1 root maciek  53 Mar 22 09:43 libnvidia-tls.so.390.42 -> /var/lib/snapd/hostfs/usr/lib/libnvidia-tls.so.390.42
drwxr-xr-x 2 root maciek  80 Mar 22 09:43 tls

./tls:
total 0
lrwxrwxrwx 1 root maciek 23 Mar 22 09:43 libnvidia-tls.so -> libnvidia-tls.so.390.42
lrwxrwxrwx 1 root maciek 57 Mar 22 09:43 libnvidia-tls.so.390.42 -> /var/lib/snapd/hostfs/usr/lib/tls/libnvidia-tls.so.390.42

It’s resolved :slight_smile: Thanks, was facing this issue for a while

Edited: This fixed most of the applications (vlc, ppsspp, games, etc.) :slight_smile:

On both Arch and Manjaro with Nvidia 380 or 390 series drivers, snaps that use hardware acceleration are crashing:
Console output

Spotify somewhat works, but it takes a long time to load and hardware acceleration is disabled.
Ohmygiraffe doesn’t work at all.

snap run --gdb doesn’t give a stack trace and coredumpctl doesn’t show anything really useful.

Note: Solus with the same driver versions works fine.

ping @mborzecki

Possibly related Wine bug about Nvidia drivers: https://bugs.winehq.org/show_bug.cgi?id=43530

@niemeyer can you merge this topic into Nvidia acceleration on chrome and firefox?

@mborzecki Sure, that’s done.

Hello, using ubuntu 18.04, snap version 16-2.32+git622.ab40e67 and nvidia 390 driver.

I have issues with spotify and some other snaps. The issue looks like this:

snap run spotify
failed to create prefix path: /tmp/snap.rootfs_zTBDGl/var/lib/snapd/lib/vulkan/icd.d: Permission denied

snap run flare-rpg
failed to create prefix path: /tmp/snap.rootfs_smuH38/var/lib/snapd/lib/vulkan/icd.d: Permission denied

But skype and atom work.

If I switch to the intel card, all snaps work. I’m not sure if I’m reporting my issue in the correct thread; if not, please let me know.

You should be able to update with snap refresh --beta core. See this topic for details: All the snaps stopped working.

I believe I’m experiencing the same problem on openSUSE Tumbleweed: spotify fails to start because it fails to create the GL context.

[roman:~] % snap run spotify
/home/roman/Downloads was removed, reassigning DOWNLOAD to homedir
Gtk-Message: Failed to load module "canberra-gtk-module"
ATTENTION: default value of option force_s3tc_enable overridden by environment.
libGL error: No matching fbConfigs or visuals found
libGL error: failed to load driver: swrast
[1]    14195 trace trap (core dumped)  snap run spotify
[roman:~] 133 % [0615/223908.654302:ERROR:gl_context_glx.cc(227)] Couldn't make context current with X drawable.
[0615/223908.654322:ERROR:gpu_info_collector.cc(62)] gl::GLContext::MakeCurrent() failed

I did snap refresh --beta core, but that didn’t help.

However, running it as follows:

LD_LIBRARY_PATH=/snap/spotify/16/usr/lib/x86_64-linux-gnu /snap/spotify/16/usr/bin/spotify

… works.

Have you installed the drivers manually or used the TW package from nvidia?

I’ve used the packages.

[roman:~] % LANG=en_US zypper info nvidia-glG04
Information for package nvidia-glG04:
-------------------------------------
Repository     : NVIDIA                                         
Name           : nvidia-glG04                                   
Version        : 390.67-8.1                                     
Arch           : x86_64                                         
Vendor         : obs://build.suse.de/Proprietary:X11:Drivers    
Installed Size : 132.1 MiB                                      
Installed      : Yes                                            
Status         : up-to-date                                     
Source package : x11-video-nvidiaG04-390.67-8.1.nosrc           
Summary        : NVIDIA OpenGL libraries for OpenGL acceleration
Description    :                                                
    This package provides the NVIDIA OpenGL libraries to allow OpenGL
    acceleration under the closed-source NVIDIA drivers.

I have a spare drive. I’ll try to install TW there and see what happens.

Meanwhile, could you paste the output of rpm -ql nvidia-glG04?

Here’s the output:

[roman:~] % rpm -ql nvidia-glG04                    
/etc/vulkan
/etc/vulkan/icd.d
/etc/vulkan/icd.d/nvidia_icd.json
/usr/lib/libEGL_nvidia.so.0
/usr/lib/libEGL_nvidia.so.390.67
/usr/lib/libGLESv1_CM_nvidia.so.1
/usr/lib/libGLESv1_CM_nvidia.so.390.67
/usr/lib/libGLESv2_nvidia.so.2
/usr/lib/libGLESv2_nvidia.so.390.67
/usr/lib/libGLX_nvidia.so.0
/usr/lib/libGLX_nvidia.so.390.67
/usr/lib/libnvidia-eglcore.so.390.67
/usr/lib/libnvidia-glcore.so.390.67
/usr/lib/libnvidia-glsi.so.390.67
/usr/lib/libnvidia-ifr.so.1
/usr/lib/libnvidia-ifr.so.390.67
/usr/lib/libnvidia-tls.so.390.67
/usr/lib/tls
/usr/lib/tls/libnvidia-tls.so.390.67
/usr/lib64/libEGL_nvidia.so.0
/usr/lib64/libEGL_nvidia.so.390.67
/usr/lib64/libGLESv1_CM_nvidia.so.1
/usr/lib64/libGLESv1_CM_nvidia.so.390.67
/usr/lib64/libGLESv2_nvidia.so.2
/usr/lib64/libGLESv2_nvidia.so.390.67
/usr/lib64/libGLX_nvidia.so.0
/usr/lib64/libGLX_nvidia.so.390.67
/usr/lib64/libnvidia-egl-wayland.so.1
/usr/lib64/libnvidia-egl-wayland.so.1.0.2
/usr/lib64/libnvidia-eglcore.so.390.67
/usr/lib64/libnvidia-fbc.so.1
/usr/lib64/libnvidia-fbc.so.390.67
/usr/lib64/libnvidia-glcore.so.390.67
/usr/lib64/libnvidia-glsi.so.390.67
/usr/lib64/libnvidia-ifr.so.1
/usr/lib64/libnvidia-ifr.so.390.67
/usr/lib64/libnvidia-tls.so.390.67
/usr/lib64/tls
/usr/lib64/tls/libnvidia-tls.so.390.67
/usr/lib64/xorg/modules/extensions
/usr/lib64/xorg/modules/extensions/nvidia
/usr/lib64/xorg/modules/extensions/nvidia/nvidia-libglx.so
/usr/share/egl
/usr/share/egl/egl_external_platform.d
/usr/share/egl/egl_external_platform.d/10_nvidia_wayland.json
/usr/share/glvnd
/usr/share/glvnd/egl_vendor.d
/usr/share/glvnd/egl_vendor.d/10_nvidia.json

Finally found some time to install TW. I’m using the latest package available from this repo https://build.opensuse.org/package/show/home:zyga:branches:system:snappy/snapd which is 2.33.1-13.1 at the moment. I’m using the same version of the nvidia driver as you are. So far I have seen no issues: spotify (1.0.80.474.gef6b503e-7, rev 16), ohmygiraffe (1.1.0a, rev 3) and my gl debugging snap all work fine.

If the problem persists, can you do:

$ snap install --edge graphics-debug-tools-bboozzoo
$ SNAPD_DEBUG=1 SNAP_CONFINE_DEBUG=1 snap run \
   graphics-debug-tools-bboozzoo.glxinfo

and post the log.

With your version of snap, it worked. Thanks.
My previous snap version came from here: https://docs.snapcraft.io/core/install-opensuse.