Classic confinement request: gpu-burn

Hi,

I’m snapping an open source GPU stress testing program (GPU Burn) that we use for the Certification tests in Checkbox, and I have a question about classic confinement. The program uses nvidia-smi to monitor the GPUs’ temperatures, and nvidia-smi is shipped by the NVIDIA drivers metapackage in the Ubuntu repositories rather than as its own package. I don’t think the snap should package all of the drivers, since it should be able to access them through the opengl interface.
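To illustrate the dependency, the kind of query involved is along these lines (this is a generic nvidia-smi invocation, not necessarily the exact one gpu_burn uses):

```
# Poll per-GPU temperatures; gpu_burn's actual invocation may differ.
nvidia-smi --query-gpu=index,temperature.gpu --format=csv,noheader
```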

Would this be a fair case for classic confinement? Or do you know of a different way that I could give the snap access to that executable?


For more context, I am currently trying to run the snap, and AppArmor denies access to NVIDIA-related devices, specifically these on my system:

/dev/char/507:0 -> ../nvidia-uvm
/dev/char/507:1 -> ../nvidia-uvm-tools
/dev/char/195:0 -> ../nvidia0

I have added the opengl plug and the GLib dependencies as documented in “Adding OpenGL/GPU support to a snap” in the Snapcraft documentation, but the snap still does not run properly.
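Roughly, the relevant part of the app definition looks like this (a simplified sketch; the app name and command path are placeholders rather than the exact contents of my snapcraft.yaml):

```
# Simplified sketch; app name and command path are placeholders.
apps:
  gpu-burn:
    command: bin/gpu_burn
    plugs:
      - opengl
```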

Fortunately, you don’t have a choice on that point: NVIDIA drivers are intrinsically linked to the kernel in use. You can’t replace the userspace without also replacing the kernel space, so you can’t provide a generic NVIDIA platform in the same way that already happens for Mesa.

snapd brings some of the libraries in directly from the host, whereas Flatpak tends to ship the userspace in its entirety as a separate SDK. Both have problems to some degree when the host driver is updated: even though Flatpak does ship the drivers, it has the same issue that the userspace bits need to be kept in sync with the kernel bits. Both platforms have code paths dedicated just to NVIDIA, and both break from time to time on newer drivers.

With that in mind, no, you can’t bundle the NVIDIA driver; it won’t work. However, you may be able to bundle nvidia-smi itself (subject to licensing, I guess) and rely on it integrating with the host.

Unfortunately, I’m not knowledgeable enough about NVIDIA and graphics to say much more on that specifically, but I would advise running sudo snap install snappy-debug and then starting snappy-debug in a terminal. In a second terminal, run your snap and check whether the debug tool can give you any pointers.

The debug tool works by looking for sandbox errors and trying to provide hints to solve them. Keep in mind that not every problem needs solving to get a fully functional app, and a lot of the errors are noise, but it should hopefully point you towards which subsystem might be interfering and how to correct it.
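In practice, the workflow is something like the following (gpu-burn here is just a placeholder for whatever command your snap actually exposes):

```
# Terminal 1: install and start the debug tool so it can watch for sandbox denials
sudo snap install snappy-debug
snappy-debug

# Terminal 2: run your snap and reproduce the failure
gpu-burn    # placeholder for your snap's actual command
```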

It might also be worth adding extensions: [gnome] to your app definition. Whilst it is primarily intended for GUI applications, and I’m thinking yours might be a CLI app, it does a lot of the GPU magic you’d expect GUI apps to need and works fine for CLI apps too. You’ll have a much more “normal”-looking environment in your snap with it than without.

Should you choose to use it, it replaces your snapd-glib bit entirely. To be honest, I’d highly recommend it even if your app is CLI, because it’s still GPU orientated and the extensions are the intended go-to way to set up GPU support for most use cases.
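As a rough sketch (again, the app name and command path are placeholders):

```
# Sketch only: the gnome extension wires up the desktop/GPU plumbing for you.
apps:
  gpu-burn:
    command: bin/gpu_burn
    extensions: [gnome]
    plugs:
      - opengl
```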

Edit: Regarding classic, you’d likely find it solves your problems instantly. I don’t think it’s impossible that you could get it, but I also expect this can be done in strict confinement, and we encourage people to pursue strict first even when it’s difficult, because the technical quality is often better. You’ll be asked to justify that strict can’t do it, which we can prove either way by pushing a bit further.

Hi James, thanks for your quick reply! I have been using snappy-debug, and while it gave me some noise, it did help me find the files I referenced in this post. I think I’ll give the gnome extension a try, but (after chatting a little with someone from the snapd team) I think there may also be some issues on the snapd side of things, so I’ve created a bug on Launchpad.

It turns out this actually was an issue on my end. The program was being compiled for a specific CUDA compute architecture and run on a different one. I’ve patched the code in the snap, and it seems to work just fine after connecting the opengl and hardware-observe interfaces.
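For anyone landing here later, the manual connections were along these lines (assuming the snap is named gpu-burn and the interfaces aren’t auto-connected):

```
sudo snap connect gpu-burn:opengl
sudo snap connect gpu-burn:hardware-observe
```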

I’ve also bundled the nvidia-smi binary blob into the snap, which should be fine to do.
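For reference, one way to stage a prebuilt binary like that is with the dump plugin; this is just a rough sketch, and the source directory here is hypothetical:

```
# Rough sketch: stage a prebuilt nvidia-smi binary into the snap.
parts:
  nvidia-smi:
    plugin: dump
    source: nvidia-smi-bits/          # hypothetical local directory holding the binary
    organize:
      nvidia-smi: usr/bin/nvidia-smi  # install it under usr/bin inside the snap
```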

Therefore, I think classic confinement is not needed.