Hi, I’m trying to get the rocm-validation-suite snap to work when confined. However, I see that there are some errors relating to multi-threading and shared memory.
I have already added the snapcraft-preload part and wrapper to my snapcraft.yaml file, but I suspect the ROCm SMI library creates these /dev/shm files in a way that the preload library does not capture. The file it is looking for is /dev/shm/rocm_smi_<name> (/dev/shm/rocm_smi_card1 on my system).
Any help, tips, suggestions, etc. would really be appreciated…
If I’m understanding this interface correctly, it would allow me to define a name for shared memory, which may work in this case (where I just have the one graphics card), but I think in a multi-GPU setup there may be multiple /dev/shm/rocm_smi_* files.
I wonder if my problem is related to this issue on snapcraft-preload?
Well, my idea was more towards the private option of that interface…
    private (plug): when true, creates a directory that is
    only accessible to the snap. This directory has
    read/write permissions, is mounted over /dev/shm,
    and permits an auto-connection to the
    system:shared-memory slot.
That way your app should be able to create anything it wants in its private /dev/shm without hitting any apparmor blocks.
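One way to check this from inside the confined environment (for example from a shell started with snap run --shell, assuming a Python interpreter is available there) is a small probe like the sketch below. The function name probe_shm and the rocm_smi_* pattern are only illustrative; the pattern is taken from the file name mentioned above. It tests whether a scratch file can be created in /dev/shm and lists any matching files, which also shows whether a multi-GPU system ends up with several of them.

```python
import glob
import os
import tempfile


def probe_shm(shm_dir="/dev/shm", pattern="rocm_smi_*"):
    """Return (writable, matches) for the given shared-memory directory.

    probe_shm is a hypothetical helper for this thread, not part of ROCm or snapd.
    """
    try:
        # Create and immediately remove a scratch file to test write access.
        fd, path = tempfile.mkstemp(prefix="shm_probe_", dir=shm_dir)
        os.close(fd)
        os.unlink(path)
        writable = True
    except OSError:
        # Covers a missing directory as well as permission denials.
        writable = False
    # List files that look like the ones the ROCm SMI library creates.
    matches = sorted(glob.glob(os.path.join(shm_dir, pattern)))
    return writable, matches


if __name__ == "__main__":
    writable, matches = probe_shm()
    print("writable:", writable)
    print("rocm_smi files:", matches)
```

If the private plug is connected, the probe should report the directory as writable; files listed there are only the ones created inside the snap's private mount, not the host's /dev/shm.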
Interesting… That seemed to resolve part of the problem.
By adding
plugs:
  shared-memory:
    private: true
to my snapcraft.yaml, ROCm Validation Suite no longer complains about the /dev/shm file and the program is able to query and display the power utilization.
However, I still see that it fails to create threads…
The only error I see in the output of sudo journalctl --output=short --follow --all | sudo snappy-debug is the following:
= Seccomp =
Time: Feb 13 09:29:03
Log: auid=1000 uid=1000 gid=1000 ses=4 subj=snap.rocm-validation-suite.rocm-validation-suite pid=58134 comm="rvs" exe="/snap/rocm-validation-suite/x1/opt/rocm-6.3.2/bin/rvs" sig=0 arch=c000003e 203(sched_setaffinity) compat=0 ip=0x78c5a547c576 code=0x50000
Syscall: sched_setaffinity
Suggestion:
* ignore the denial if the program otherwise works correctly (unconditional sched_setaffinity is often just noise)
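The code=0x50000 field in that log corresponds to the seccomp "return an error" action, so the blocked sched_setaffinity call fails with an errno instead of killing the process. Whether that failure is what breaks thread creation in rvs is only a guess on my part. A minimal sketch to see how the call behaves under confinement, assuming Python is available in the snap's environment (probe_setaffinity is a made-up helper name):

```python
import errno
import os


def probe_setaffinity():
    """Re-apply the current CPU affinity mask to this process.

    Returns None if the call is permitted, or the errno name
    (for example "EPERM") if it is refused.
    """
    try:
        current = os.sched_getaffinity(0)
        # Setting the mask we already have changes nothing, but it still
        # issues the sched_setaffinity syscall that seccomp filters.
        os.sched_setaffinity(0, current)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None


if __name__ == "__main__":
    result = probe_setaffinity()
    print("sched_setaffinity:", "allowed" if result is None else result)
```

If this reports an error inside the snap but not outside it, a program that treats a failed affinity call as fatal when setting up its threads would be affected by the denial; if the program simply ignores the error, the denial is just noise, as snappy-debug suggests.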
I double-checked, and I do have command: bin/snapcraft-preload ... for the app.
Hm, I think I see why it might be running into problems… When I install rocm-validation-suite in devmode and run it, I see a whole lot of AppArmor ALLOWED messages for Kernel Fusion Driver (KFD) files.
I already opened a PR against snapd for these (I first noticed them when trying to snap rocminfo). It is very likely that the errors I’m seeing come from the same underlying library failing to query some of these files.