None of my snaps can find my GPU after upgrading to the Nvidia 545 drivers

After upgrading from nvidia-driver-535 to nvidia-driver-545, suddenly none of my snaps seem to be able to use my GPU anymore. Firefox reports

[GFX1-]: glxtest: libEGL initialize failed

on startup, and seems to be falling back to software rendering, and SuperTuxKart seems to just crash. When I run non-snap versions of these apps, they seem to work with the GPU just fine.

I’m on Ubuntu 23.10.

I also have nvidia, and I updated to 545 today, but there was no such issue

Have you rebooted since updating the driver?

Yeah, I’ve rebooted a couple times. Doesn’t seem to have made a difference.

Looks similar to this long standing issue with firefox snap GPU acceleration on wayland: Snaps don't detect NVIDIA driver in Wayland.

That’s interesting, because I’m fairly certain I’m not using Wayland. Could be a similar root cause I guess, though.

There is a similar Firefox report at https://bugzilla.mozilla.org/show_bug.cgi?id=1876614 which leads to this AppArmor denial:

apparmor="DENIED" operation="symlink" class="file" profile="snap.firefox.firefox" name="/dev/char/195:255" pid=9680 comm="glxtest" requested_mask="c" denied_mask="c" fsuid=1000 ouid=1000

The reporter stated that going back to 535 fixes the issue

Let me repeat what @popey said in a previous post: did you reboot afterwards?

I ask because I experienced a “hardware undetected” problem too (under different, repeating circumstances though). Rebooting would resolve it, except I didn’t want to reboot each and every time it happened. I found it was caused by the affected snap’s namespace being out of date with respect to the current hardware configuration; removing the namespace (which is recreated the next time the snap app is launched) solved the problem.

More precisely, you would have to:

  1. Quit the affected snaps
  2. Identify affected snap namespaces that are mounted: mount | grep nsfs
  3. Unmount affected snap namespaces (in the case of Firefox: umount /run/snapd/ns/firefox.mnt)

If the problem is namespace-related, that should do it (but again, as rebooting would). If that’s not the problem, it may very well be AppArmor, as @seb128 mentioned.
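
The steps above could be scripted roughly as follows. This is only a sketch: the helper name list_snap_namespaces is made up for illustration, and it assumes the namespaces appear in the mount output as /run/snapd/ns/<snap>.mnt, as in the Firefox example. The umount itself is left as a comment because it needs root and the snap must be closed first.

```shell
#!/bin/sh
# Hypothetical helper (not part of snapd): read `mount` output on stdin and
# print the name of each snap that has a preserved namespace (type nsfs)
# under /run/snapd/ns/, one name per line.
list_snap_namespaces() {
    grep nsfs \
      | sed -n 's|.* on /run/snapd/ns/\([^ ]*\)\.mnt .*|\1|p' \
      | sort -u
}

# Usage, after quitting the affected snaps:
#   mount | list_snap_namespaces
#   sudo umount /run/snapd/ns/firefox.mnt
```

The namespace is recreated the next time the snap is started, so nothing needs to be remounted by hand.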

I tried rebooting again just to be safe, and there is no change in behavior.

I checked my syslog after starting Firefox, and I seem to have apparmor messages consistent with that bugzilla thread that @seb128 posted:

kernel: [  189.669164] audit: type=1400 audit(1706672274.082:157): apparmor="DENIED" operation="symlink" class="file" profile="snap.firefox.firefox" name="/dev/char/195:255" pid=7911 comm="glxtest" requested_mask="c" denied_mask="c" fsuid=1000 ouid=1000
kernel: [  189.669702] audit: type=1400 audit(1706672274.082:158): apparmor="DENIED" operation="symlink" class="file" profile="snap.firefox.firefox" name="/dev/char/195:0" pid=7911 comm="glxtest" requested_mask="c" denied_mask="c" fsuid=1000 ouid=1000
kernel: [  189.669773] audit: type=1400 audit(1706672274.082:159): apparmor="DENIED" operation="symlink" class="file" profile="snap.firefox.firefox" name="/dev/char/195:0" pid=7911 comm="glxtest" requested_mask="c" denied_mask="c" fsuid=1000 ouid=1000
kernel: [  189.669879] audit: type=1400 audit(1706672274.082:160): apparmor="DENIED" operation="symlink" class="file" profile="snap.firefox.firefox" name="/dev/char/195:0" pid=7911 comm="glxtest" requested_mask="c" denied_mask="c" fsuid=1000 ouid=1000
kernel: [  189.669978] audit: type=1400 audit(1706672274.082:161): apparmor="DENIED" operation="symlink" class="file" profile="snap.firefox.firefox" name="/dev/char/195:0" pid=7911 comm="glxtest" requested_mask="c" denied_mask="c" fsuid=1000 ouid=1000
firefox_firefox.desktop[7811]: [GFX1-]: glxtest: libEGL initialize failed
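
For anyone wanting to check their own logs for the same denials, a small filter along these lines might help. It is a sketch under assumptions: the function name snap_denials is hypothetical, and it relies on the key="value" layout seen in the audit lines above.

```shell
#!/bin/sh
# Hypothetical helper: read log lines on stdin and print one
# "profile operation name" line per distinct AppArmor denial that was
# logged against a snap profile.
snap_denials() {
    grep 'apparmor="DENIED"' \
      | sed -n 's/.*operation="\([^"]*\)".*profile="\(snap\.[^"]*\)".*name="\([^"]*\)".*/\2 \1 \3/p' \
      | sort -u
}

# Usage, e.g. against the kernel log or the syslog file:
#   journalctl -k | snap_denials
#   snap_denials < /var/log/syslog
```

Repeated denials for the same path collapse into a single line, which makes it easier to see which device nodes are involved.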

The other snaps I’ve tried that seem to have GPU issues manifest differently, however. SuperTuxKart has no obviously related apparmor messages, but fails with this error in the stdout:

libGL error: No matching fbConfigs or visuals found
libGL error: failed to load driver: swrast
X Error:  GLXBadContext
  Request Major code 152 (GLX)
  Request Minor code 6 ()
  Error Serial #117
  Current Serial #116

I’ve also noticed that the firmware-updater snap appears to be failing in a similar way to SuperTuxKart. That causes this audit message to be logged:

audit: type=1400 audit(1706672570.176:175): apparmor="DENIED" operation="open" class="file" profile="snap.firmware-updater.firmware-updater-app" name="/proc/sys/vm/max_map_count" pid=10254 comm="firmware-update" requested_mask="r" denied_mask="r" fsuid=1000 ouid=0

But the more relevant errors that I see in the syslog would probably be these:

firmware-update[10254]: The program 'firmware-updater' received an X Window System error.#012This probably reflects a bug in the program.#012The error was 'BadValue'.#012  (Details: serial 581 error_code 2 request_code 152 (GLX) minor_code 22)#012  (Note to programmers: normally, X errors are reported asynchronously;#012   that is, you will receive the error a while after causing it.#012   To debug your program, run it with the GDK_SYNCHRONIZE environment#012   variable to change this behavior. You can then get a meaningful#012   backtrace from your debugger if you break on the gdk_x_error() function.)
kernel: [  485.847667] traps: firmware-update[10254] trap int3 ip:7f3318e2658f sp:7fff74b88880 error:0 in libglib-2.0.so.0.7800.0[7f3318dd6000+ae000]
systemd[2972]: snap.firmware-updater.firmware-updater-app.service: Main process exited, code=dumped, status=5/TRAP
systemd[2972]: snap.firmware-updater.firmware-updater-app.service: Failed with result 'core-dump'.
systemd[2972]: snap.firmware-updater.firmware-updater-app.service: Scheduled restart job, restart counter is at 1.

Since this all seems to have started for me when I installed the Nvidia 545 drivers, I assume that downgrading to 535 would work, but I don’t really want to reboot again at the moment, so I’ll just make do with software-rendered Firefox for now. I might check that later tonight if there’s a good opportunity.

I can confirm that downgrading to nvidia-driver-535 fixes all 3 mentioned snaps. This would definitely seem to suggest that there’s some bad interaction between snaps and the Nvidia 545 driver, but I don’t know enough about the internals of either to point to exactly what the problem is.

I have the right hardware and know the internals enough to look. I will post something soon.

looking at the 545 release of the package, it seems @xnox changed the handling of install locations in the course of updating to a new debhelper version.

my guess would be that some bits are not in places where snapd (or the opengl interface) expects to find them anymore due to these changes …

in theory it should all just still work. 545 is a new feature branch, but it claims to be backwards compatible with 535, 525, 515…

Note that as it is marked as a new feature branch, usually it should not be offered as default one to install but rather 535 should still be offered by default.

Thank you for testing this, and I am sorry you are experiencing integration issues.

Just in case, mind the Nvidia 550 beta drivers just published last week :slight_smile:

Is there a launchpad issue I could link to on Bugzilla?

Indeed, Software & Updates, in the Additional Drivers tab, shows 535 still being the tested, recommended one.

That’s one advantage of not installing directly from apt in this specific case. Moreover, Additional Drivers checks that the driver is still compatible with the installed hardware, through the Modaliases entry. Of course the latter isn’t relevant here, since the driver still works outside of snap confinement. Imho, some AppArmor profile needs a change.

@alissy I found Bug #2051298, “GPU not available in snap”, filed against the nvidia-graphics-drivers-545 package in Ubuntu on Launchpad. It seems to be basically the same issue I reported in this thread, but it doesn’t seem to have gotten any attention yet.

I decided to give the 545 drivers another try after installing the most recent snapd update (now at 2.61.3+23.10), and it seems like everything is working properly now.