I know snap-confine does some fancy magic to make the nvidia driver available to snaps when running on classic Ubuntu. However, what if I want to use CUDA for image processing on a robot using Ubuntu Core? How might I go about that?
I think CUDA requires some extra libraries and probably won't work out of the box. Do you have any snaps that can exercise CUDA hardware that could be used to prototype this?
I unfortunately don't have any hardware capable of CUDA, so no :-( .
I made some preparations, so if I can get hold of some nvidia hardware (ideally something that uses recent drivers), I can give it a try. I now have an Intel motherboard with some RAM and a disk on my desk that I can install various systems on.
It was brought to my attention that AWS provides instances with access to CUDA-capable GPUs. Perhaps that would be a good testing ground for this? Google Cloud does as well.
I doubt that, because 1) they run a random kernel, and 2) they're extremely expensive compared to a 50-100 card that I can use forever. I think we are at a stage where we need to tinker and experiment more. Those AWS instances only give you CUDA if you use their specific kernel.
Ah, that would indeed be problematic. Well, I look forward to hearing about your tinkering, then! Thanks for looking into this.
This is still required. Any progress on this front?
We have a few blender snaps around. Blender can utilize CUDA, so it might provide a good test snap for enablement. blender-tpaw is strictly confined, and I can't access CUDA. blender is classically confined, and I can access CUDA (yes, in the past year I got some CUDA hardware).
Hmmm, there was some CUDA work done lately (Problem with confined nvenc / cuda ffmpeg snap, https://github.com/snapcore/snapd/pull/5189). Did you test with 2.33 or 2.34? What specifically do you mean by "can't access CUDA"?
@jdstrand you're magic! blender-tpaw actually does seem to work with CUDA on 2.33 (refreshed to the beta core snap). I didn't see that other thread, good catch.
Also, "works" being defined as "shows up as an option here":
Hi @jdstrand. I refreshed the core snap as @kyrofa suggested in this thread, but I couldn't get CUDA running when using devmode confinement. Running the same program outside the snap, or in a snap with classic confinement, I get this:
```
OpenCV version: 2.4.10.1
CUDA runtime version: 6050
CUDA driver API version: 6050
CUDA devices: 1
```
However, when in devmode, this is the output:
```
OpenCV version: 2.4.10.1
CUDA runtime version: 0
CUDA driver API version: 0
CUDA devices: -1
OpenCV Error: Gpu API call (CUDA driver version is insufficient for CUDA runtime version) in getDevice, file /hdd/buildbot/slave_jetson_tk1_2/52-O4T-L4T/opencv/modules/dynamicuda/include/opencv2/dynamicuda/dynamicuda.hpp, line 664
terminate called after throwing an instance of 'cv::Exception'
what(): /hdd/buildbot/slave_jetson_tk1_2/52-O4T-L4T/opencv/modules/dynamicuda/include/opencv2/dynamicuda/dynamicuda.hpp:664: error: (-217) CUDA driver version is insufficient for CUDA runtime version in function getDevice
```
I'm not sure if it is necessary, but I also included (and connected) some plugs to interfaces like opengl, hardware-observe and system-observe, as well as others not related to this. Am I missing something needed to enable CUDA support?
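For reference, this is roughly the shape of the snapcraft.yaml I'm describing; the app name and command here are hypothetical placeholders, only the plugs list reflects what I actually have:

```yaml
apps:
  cuda-app:                 # hypothetical app name
    command: bin/cuda-app   # hypothetical command path
    plugs:
      - opengl              # access to GPU device nodes and GL/CUDA userspace libraries
      - hardware-observe
      - system-observe
```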
This looks like there is a mismatch of some sort between your system and the snap. I've not worked with CUDA much myself, but I understand that the libraries and the kernel driver must be in sync. IIRC, @mborzecki looked at this(?) once before; perhaps he has some additional information.
Which version of the CUDA runtime do you use? Is it included in the snap?
Hi @mborzecki, thanks for pointing that out. It seems that the CUDA packages weren't being included correctly. Now that I've fixed that, at least the API version is shown correctly; however, it seems there's still something missing at runtime.
```
OpenCV version: 2.4.10.1
CUDA runtime version: 0
CUDA driver API version: 6050
OpenCV Error: Gpu API call (unknown error) in getCudaEnabledDeviceCount, file /hdd/buildbot/slave_jetson_tk1_2/52-O4T-L4T/opencv/modules/dynamicuda/include/opencv2/dynamicuda/dynamicuda.hpp, line 652
terminate called after throwing an instance of 'cv::Exception'
what(): /hdd/buildbot/slave_jetson_tk1_2/52-O4T-L4T/opencv/modules/dynamicuda/include/opencv2/dynamicuda/dynamicuda.hpp:652: error: (-217) unknown error in function getCudaEnabledDeviceCount
```
I'm setting the environment as follows:
```yaml
environment:
  LD_LIBRARY_PATH: $SNAP/usr/local/cuda-6.5/lib:$SNAP/usr/lib/arm-linux-gnueabihf/:$LD_LIBRARY_PATH
  PATH: $SNAP/usr/local/cuda-6.5/bin:$PATH
```
I've been taking a closer look at the opengl interface and I think I might have found the problem. In order to allow access to nvidia devices, this interface opens the /dev/nvidia* device nodes in r/w mode. That's right for desktop systems; however, I am running the snap on a TK1 board. On this board those devices don't exist: they are actually mapped as /dev/nvhost* and /dev/nvmap. Could you add these to the interface?
Can you paste the log of AppArmor denials? Does installing the snap in devmode allow it to run?
It doesn't seem so; I already installed it in devmode.
Regarding AppArmor, I tried this the way I used to, but got nothing:

```shell
sudo snap install snappy-debug
sudo snap connect snappy-debug:log-observe
/snap/bin/snappy-debug.security scanlog
```
It's been a while since I last did this kind of debugging, so I'm probably missing something… I'll try to get you an error log next Monday!
Hi @mborzecki, I found out why I wasn't getting anything from snappy-debug, nor with sudo grep audit /var/log/syslog. As I pointed out in the other post, I'm running L4T with a customized kernel, and it seems that kernel doesn't include the AppArmor module by default. This is what I was getting:
```
Jun 18 11:28:49 UBUNTU-TK1 snapd[530]: AppArmor status: apparmor not enabled
```
I recompiled the kernel with the AppArmor module options enabled. The situation improved, but not as much as I expected:
```
Jun 18 14:22:19 UBUNTU-TK1 snapd[478]: AppArmor status: apparmor is enabled but some features are missing: caps, dbus, mount, namespaces, network, ptrace, signal
```
I tried to get something useful directly from /var/log/syslog, and although it is now a little more verbose, I couldn't get anything beyond information at system start-up (it doesn't show more information regardless of how many times I launch the different snaps).
```
Jun 18 14:22:31 UBUNTU-TK1 kernel: type=1400 audit(1529299351.256:110): apparmor="STATUS" operation="profile_replace" name="snap-update-ns.core" pid=1348 comm="apparmor_parser"
Jun 18 14:22:31 UBUNTU-TK1 kernel: type=1400 audit(1529299351.261:112): apparmor="STATUS" operation="profile_replace" name="snap-update-ns.hello-world" pid=1350 comm="apparmor_parser"
Jun 18 14:22:31 UBUNTU-TK1 kernel: type=1400 audit(1529299351.262:113): apparmor="STATUS" operation="profile_replace" name="snap-update-ns.snapcraft" pid=1351 comm="apparmor_parser"
```
I did a quick search about the "features missing" message and it doesn't leave me very optimistic, as most of the solutions involve upgrading the kernel, which I cannot do as I would lose support for some devices.
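In case it helps anyone else debugging this, here is one way I know to see which AppArmor features a running kernel actually advertises; this assumes securityfs is mounted at the usual /sys/kernel/security path, and the fallback message is mine:

```shell
# List the AppArmor feature directories the running kernel advertises.
# snapd compares these against the features it wants (caps, dbus, mount, ...).
ls /sys/kernel/security/apparmor/features 2>/dev/null \
  || echo "AppArmor securityfs interface not available"
```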
Regarding the original issue, I found out that if I launch the standalone version first and then the snap, the CUDA runtime API is correctly detected from the snap. On the other hand, if I launch the snap first and then the standalone version, neither of them detects the runtime API correctly.
I don't know if this gives you any hint; I just wanted to let you know, as I hadn't noticed this behaviour until now.
I wouldn't bother enabling AppArmor if it's not there.
Can you try running the application under strace?

```shell
snap run --strace <snap>.<app>
```
Another thing that might be worth checking is whether any new /dev nodes appear, or any new drivers get loaded, once the standalone instance has run.
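A minimal sketch of that check, run before and after launching the standalone binary; the fallback messages are mine, and on the TK1 the interesting nodes would be /dev/nvhost* and /dev/nvmap rather than /dev/nvidia*:

```shell
# Snapshot nvidia-related device nodes; run before and after the
# standalone binary and diff the two outputs.
ls -l /dev/nvidia* /dev/nvhost* /dev/nvmap 2>/dev/null \
  || echo "no nvidia device nodes present"

# Check for nvidia-looking kernel modules as well.
lsmod 2>/dev/null | grep -i nv || echo "no nvidia-looking modules loaded"
```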