I’m having some issues deploying GPU-accelerated programs for GPGPU and/or computer vision applications via snap containers. There are a few related topics on this forum, but I think my questions are different enough to merit a new thread instead of hijacking an existing one.
I have the following repository: https://github.com/theseankelly/jetson-cuda-snap which started as a fork of @abeato's work to enable X11-based programs here: https://github.com/alfonsosanchezbeato/jetson-nano-x11. Starting from his repository, I have made the following changes:
- dropped X11 support as I don’t need to render anything
- upgraded the L4T packages from 32.2 to 32.3.1 (JetPack 4.3), using the TX2 Debian packages, not the Nano ones
- Added CUDA 10.0 from JetPack 4.3 targeting the TX2
- Added a few sample CUDA programs:
  - query.cu – queries information about the underlying GPU device
  - add.cu – a simple application that adds a couple of vectors together, pulled from the tutorial listed at the top of the source file
  - saxpy.cu – a similar application, except it uses explicit memory allocations and copies instead of managed memory
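For context, the device-query program is essentially the standard `cudaGetDeviceCount`/`cudaGetDeviceProperties` loop. This is a sketch of that pattern (not the exact code from the repo), which produces output in the same shape as the transcript further down:

```cuda
// Minimal device-query sketch (assumes CUDA 10.0 toolkit; compile with nvcc).
// Enumerates CUDA devices and prints a few properties of each.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    printf("cudaGetDeviceCount returned: %d\n", static_cast<int>(err));

    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("Device Number: %d\n", i);
        printf("  Device name: %s\n", prop.name);
        printf("  Memory Clock Rate (KHz): %d\n", prop.memoryClockRate);
        printf("  Memory Bus Width (bits): %d\n", prop.memoryBusWidth);
        // Peak bandwidth = 2 (DDR) * clock (kHz) * bus width (bytes) / 1e6
        printf("  Peak Memory Bandwidth (GB/s): %f\n",
               2.0 * prop.memoryClockRate * (prop.memoryBusWidth / 8) / 1.0e6);
    }
    return 0;
}
```

Note that this only talks to the driver through the runtime API; it never launches a kernel, so it exercises less of the stack than add/saxpy do.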
I’ve managed to compile the CUDA applications as part of my snapcraft recipe. I’m installing the snap on my TX2 (I have tried both devmode and strict confinement) running Linux for Tegra, where `snap --version` reports:

```
uskellse@uskellse-tx2:~$ snap --version
snap    2.45.1
snapd   2.45.1
series  16
ubuntu  18.04
kernel  4.9.140-tegra
```
When running my applications, I get mixed results. It looks like the query-gpu program can actually communicate with the underlying driver/GPU:

```
uskellse@uskellse-tx2:~$ jetson-cuda-snap.query-gpu
cudaGetDeviceCount returned: 0
Device Number: 0
  Device name: NVIDIA Tegra X2
  Memory Clock Rate (KHz): 1300000
  Memory Bus Width (bits): 128
  Peak Memory Bandwidth (GB/s): 41.600000
```
The add and saxpy programs do run (and the CUDA APIs do not return errors), but the underlying data is never actually added and transferred into the results buffer, suggesting the kernels aren’t really executing on the GPU, or the data isn’t crossing memory boundaries, or something along those lines. Not sure. Both programs behave as expected if I build and execute them natively from Linux for Tegra.
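One thing worth noting: a kernel launch that fails (for instance, because confinement blocks access to the Tegra device nodes) can fail silently unless you explicitly check `cudaGetLastError()` after the launch and `cudaDeviceSynchronize()` afterwards. A sketch of how saxpy could be instrumented to surface that (this is illustrative, not the exact repo code):

```cuda
// Sketch: explicit error checking around a saxpy kernel launch.
// If the kernel never runs, the surrounding malloc/memcpy calls may still
// return cudaSuccess, leaving the output buffer full of stale data.
#include <cstdio>
#include <cuda_runtime.h>

#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t e = (call);                                      \
        if (e != cudaSuccess) {                                      \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,       \
                    cudaGetErrorString(e));                          \
            return 1;                                                \
        }                                                            \
    } while (0)

__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x = nullptr, *y = nullptr;
    CHECK(cudaMalloc(&x, n * sizeof(float)));
    CHECK(cudaMalloc(&y, n * sizeof(float)));
    // ... initialize x/y via cudaMemcpy, as saxpy.cu does ...

    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
    CHECK(cudaGetLastError());       // launch errors surface here
    CHECK(cudaDeviceSynchronize());  // execution errors surface here

    CHECK(cudaFree(x));
    CHECK(cudaFree(y));
    printf("kernel completed without reported errors\n");
    return 0;
}
```

If both checks pass inside the snap but the results are still wrong, that would point more toward a memory-transfer or unified-memory issue than a blocked launch.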
What’s the current state of CUDA support on Tegra platforms within snaps? I’m far from a CUDA expert, so it’s very possible I’m doing something fundamentally wrong here.
NOTE: I also tried a similar experiment with my Jetson Nano, which is running Ubuntu Core 18 (per @abeato's tutorials again, and of course using the binaries from the Nano version of L4T 32.3.1 instead of the TX2). I didn’t make it quite as far – the GPU query fails and the add/saxpy programs segfault. I didn’t take it further, since the TX2 running L4T is more interesting to me anyhow…