Nvidia GL libs access broken on Ubuntu 18.04

Yes, you are right. We would need runtime detection to properly support re-exec.

No, because there’s just one version: the one in the core snap.

There are two snap-confine binaries that must be considered with re-exec: the one in the snap and the one in the deb. The snap will always have the 16.04 non-glvnd build options. What I was saying is that 18.04 could have different build options for glvnd, but I corrected myself, since that wouldn’t work with re-exec: the snap has the 16.04 non-glvnd build options. Even if we only ever used the snap-confine from the core snap, it would have to handle systems with and without glvnd, so I was agreeing with you and @morphis that the static build options may need to go away… Am I missing something?

No, this is exactly right.

I think we will make snap-confine capable of doing both and just give it hints at runtime as to which system to use. One of my desires would be to migrate the opengl support to the current layout/content/mount code so that snapd would get to decide.

Since I’m doing all the nvidia fixes atm, I’ve opened a PR with an RFC:
https://github.com/snapcore/snapd/pull/4908
The idea is to autodetect if the host is using /usr/lib/<arch-triplet> and try to do the right thing.

FWIW ohmygiraffe works now.

I’ve updated the PR with more elaborate detection. The Nvidia driver packages in xenial (actually, all versions before bionic) do have /usr/lib/<arch-triplet>, but the libraries are under /usr/lib/nvidia. In bionic, all libraries are under /usr/lib/<arch-triplet>.

The way things work now:

  • at configure time, pass --with-host-arch-triplet=x86_64-linux-gnu (optionally --with-host-arch-32bit-triplet=i386-linux-gnu) - this is picked up automatically in debian/rules and autogen.sh
  • use a well-known nvidia library as a canary, in this case libnvidia-glcore.so.<driver-major>.<driver-minor>, e.g. libnvidia-glcore.so.390.42 on my system, where 390.42 is the driver package version
  • at runtime
    • grab the current driver version by poking /sys (we had this code before)
    • check if /usr/lib/<arch-triplet>/libnvidia-glcore.so.%d.%d exists, if so symlink all the files like we do in biarch case
      • additionally check if /usr/lib/<arch-32bit-triplet>/libnvidia-glcore... exists, if so also symlink it at the proper location
    • when symlinking, the prefix path is preserved, e.g.
      /usr/lib/<arch-triplet>/libnvidia-tls.so..  -> /var/lib/snapd/lib/gl/<arch-triplet>/libnvidia-tls.so..
      /usr/lib/<arch-triplet>/tls/libnvidia-tls.so..  -> /var/lib/snapd/lib/gl/<arch-triplet>/tls/libnvidia-tls.so..
      
    • if the canary lib does not exist, falls back to bind mounting /usr/lib/nvidia-<driver-major> as we did before
  • the apparmor profile was updated to allow snap-confine to create the prefix path as needed
  • the opengl interface was updated to allow access to /var/lib/snapd/lib/gl/<arch-triplet>/tls/libnvidia-tls.so.., as this was causing unexplained segfaults that were already debugged in the other nvidia thread
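
For readers following along, here is a rough sketch of the runtime decision described in the list above. It is not the code from the PR (snap-confine is written in C); it is a Python illustration only. The function names, the host_root parameter and the "libnvidia-*.so*" glob are my own assumptions, and the real list of libraries that get symlinked is longer. The driver version string would come from /sys, as mentioned above.

```python
import re
from pathlib import Path


def parse_driver_version(text):
    """Parse a '<major>.<minor>' driver version string such as '390.42'."""
    match = re.match(r"^(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised driver version: {text!r}")
    return int(match.group(1)), int(match.group(2))


def plan_nvidia_setup(host_root, triplet, version, gl_dir="var/lib/snapd/lib/gl"):
    """Decide how to expose the host libraries (hypothetical helper).

    Returns ('symlink', [(source, target), ...]) when the canary library
    exists under /usr/lib/<triplet>, otherwise ('bind-mount', source_dir).
    """
    major, minor = version
    usr_lib = Path(host_root) / "usr/lib"
    lib_dir = usr_lib / triplet
    canary = lib_dir / f"libnvidia-glcore.so.{major}.{minor}"
    if not canary.exists():
        # No canary: fall back to the per-driver directory, as before.
        return "bind-mount", usr_lib / f"nvidia-{major}"

    links = []
    # Assumption: a simple glob stands in for the real list of libraries.
    for source in sorted(lib_dir.rglob("libnvidia-*.so*")):
        # Keep the path below /usr/lib so that tls/ stays tls/.
        relative = source.relative_to(usr_lib)
        links.append((source, Path(host_root) / gl_dir / relative))
    return "symlink", links
```

The 32-bit case would be a second call with the 32-bit triplet. Returning a plan instead of creating the symlinks directly keeps the sketch easy to try against a scratch directory without touching the real system.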

Ohmygiraffe works with nvidia and confinement now.

I’ve tested @mborzecki’s PR on my 16.04 laptop, running three games (ohmygiraffe, minecraft, and supertuxkart), and it works fine with both the nvidia card and the intel card as long as I discard the mount namespaces of opengl-using things when I switch PRIME.

In other words, it works as expected: it’s either not a regression or an outright improvement (I hadn’t realised I needed to discard namespaces when switching, so I haven’t tested that without this patch; before, it looked like things just didn’t work with intel).

The glxinfo in graphics-debug-tools-bboozzoo is still not finding the intel driver, though, which might be a bug in it (or in what we’re doing), but it’s still not a regression.

What are you stating isn’t a regression? You tested on an unaffected system, because the problem occurs on 18.04. Therefore your statement that “it works fine” is incorrect with regard to the bug being discussed here.

Hmm, I guess there is no Vulkan support here…
Arch Linux

➜  ~ graphics-debug-tools-bboozzoo.vulkaninfo
===========
VULKAN INFO
===========

Vulkan API Version: 1.0.61

Cannot create Vulkan instance.
/build/vulkan-YO5iw0/vulkan-1.0.61.1+dfsg1/demos/vulkaninfo.c:704: failed with VK_ERROR_INCOMPATIBLE_DRIVER

Doesn’t he mean the patch for 18.04 doesn’t cause a regression in 16.04?

Maybe. English is only my first language, so I’m likely to have misread :-p Words I do good readie writer :smiley:

I’m aware. FWIW, it’s the same on Arch too, but I haven’t tried to debug it further yet.

The proposed fix for 18.04 does not cause a regression on 16.04.
Chill.

Hello, I’m using Ubuntu 18.04, snap version 16-2.32+git622.ab40e67 and the nvidia 390 driver.

I have issues with spotify and some other snaps. The issue looks like this:

snap run spotify
failed to create prefix path: /tmp/snap.rootfs_zTBDGl/var/lib/snapd/lib/vulkan/icd.d: Permission denied

snap run flare-rpg
failed to create prefix path: /tmp/snap.rootfs_smuH38/var/lib/snapd/lib/vulkan/icd.d: Permission denied

But skype and atom work.

If I switch to the intel card, all snaps work.

2.32.2 with the fixes is in beta now, can you try snap refresh --beta core and check if opengl works for you?

Works for me with my ubuntu-18.04 with nvidia-340 driver. Thanks for the fix!

@mborzecki It would be great if we could get Vulkan support in the near future.

I’m afraid vulkan support is subject to further research. I’ve played a bit with it and got to a state where vulkaninfo properly reports the Nvidia ICD. The problem was that ICD discovery is somewhat awkward: while you can pass a path to a specific ICD in VK_ICD_FILENAMES, I have not found a way to override the ICD search paths. I hope that we will be able to sort it out with proper layout adjustments.
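
To illustrate the VK_ICD_FILENAMES route mentioned above, here is a minimal Python sketch that collects the ICD manifests from one directory and puts them in the environment for a child process. The helper name and the idea of scanning a single directory for *.json files are assumptions of mine for illustration; this is not how snapd handles it.

```python
import os
from pathlib import Path


def icd_environment(icd_dir, base_env=None):
    """Return an environment with VK_ICD_FILENAMES listing the manifests in icd_dir.

    Hypothetical helper: the Vulkan loader reads a separator-delimited list
    of ICD manifest (JSON) paths from VK_ICD_FILENAMES.
    """
    env = dict(os.environ if base_env is None else base_env)
    manifests = sorted(str(path) for path in Path(icd_dir).glob("*.json"))
    if manifests:
        # os.pathsep is ':' on Linux, which is what the loader expects there.
        env["VK_ICD_FILENAMES"] = os.pathsep.join(manifests)
    return env
```

The resulting dictionary can be passed as env= to subprocess.run when launching vulkaninfo. This only names explicit manifests; it does not change the loader's search paths, which is the limitation described above.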

Hi @mborzecki . Sorry to dig up an old thread.

I’ve had a bug report about a snap I’ve made which fails to run under 18.04, after an update from 17.10. I can’t reproduce the issue but the snap is complaining about gl libraries. I was wondering if it is significant that the reporter is running a laptop nvidia card? Would hybrid graphics/optimus be supported with your fix? Do you have any suggestions about what I could do to debug this?

@mcphail left a comment asking for more logs

IIRC @chipaca ran some snaps on a system with integrated nvidia card, albeit this might have been a 16.04 system at the time. As far as the snaps are concerned, the current setup will first probe if the NVIDIA driver is loaded and then verify if matching NVIDIA libraries are available in /usr/lib/<arch-tuple>/. That’s all there is to it.
If I correctly understand what PRIME does, you should be able to select your nvidia card as the sink and then set DRI_PRIME=<idx> to use the card. Given how messy (and useless) it has been for Linux users, I’d recommend to first get it working outside of snaps.
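
As a small sketch of the DRI_PRIME suggestion, assuming the variable behaves as described above, the environment for a test run outside of snaps could be prepared like this. The helper name is made up for illustration.

```python
import os


def prime_environment(gpu_index, base_env=None):
    """Return a copy of the environment with DRI_PRIME set to gpu_index."""
    env = dict(os.environ if base_env is None else base_env)
    env["DRI_PRIME"] = str(gpu_index)
    return env
```

For example, subprocess.run(["glxinfo"], env=prime_environment(1)) would run glxinfo with DRI_PRIME=1, and comparing its output against a run without the variable shows whether the other card is actually picked up.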
