Rocm-validation-suite /dev/shm errors

Hi, I’m trying to get the rocm-validation-suite snap to work when confined. However, I see that there are some errors relating to multi-threading and shared memory.

I have already added the snapcraft-preload part and wrapper to my snapcraft.yaml file, but I suspect the ROCm SMI library creates these /dev/shm files in a way that is not captured by the preload library? The file that it is looking for is /dev/shm/rocm_smi_<name> (namely /dev/shm/rocm_smi_card1 in my system).

Any help, tips, suggestions, etc. would really be appreciated…

Application logs:

[RESULT] [ 17578.499597] Module name :iet
shm_open: Permission denied
[RESULT] [ 17580.28322 ] [action_5] [GPU::  7373] Power(W) 0.000000
pthread_create failed 0 (Success)
pthread_create failed 0 (Success)
pthread_create failed 0 (Success)
pthread_create failed 0 (Success)
pthread_create failed 0 (Success)
pthread_create failed 0 (Success)
pthread_create failed 0 (Success)
pthread_create failed 0 (Success)
pthread_create failed 0 (Success)

 hipStreamCreate() failed !!!

Error in rocblas_dgemm() !!!

journalctl and snappy-debug logs:

= Seccomp =
Time: Feb 12 13:57:18
Log: auid=1000 uid=1000 gid=1000 ses=3 subj=snap.rocm-validation-suite.rocm-validation-suite pid=136690 comm="rvs" exe="/snap/rocm-validation-suite/15/opt/rocm-6.3.2/bin/rvs" sig=0 arch=c000003e 203(sched_setaffinity) compat=0 ip=0x799689374576 code=0x50000
Syscall: sched_setaffinity
Suggestion:
* ignore the denial if the program otherwise works correctly (unconditional sched_setaffinity is often just noise)

= AppArmor =
Time: Feb 12 13:57:21
Log: apparmor="DENIED" operation="open" class="file" profile="snap.rocm-validation-suite.rocm-validation-suite" name="/dev/shm/rocm_smi_card1" pid=136690 comm="rvs" requested_mask="wr" denied_mask="wr" fsuid=1000 ouid=1000
File: /dev/shm/rocm_smi_card1 (write)
Suggestions:
* adjust program to create files and directories in /dev/shm/snap.$SNAP_NAME.*
* try the snapcraft preload plugin: https://github.com/sergiusens/snapcraft-preload

= Seccomp =
Time: Feb 12 13:57:21
Log: auid=1000 uid=1000 gid=1000 ses=3 subj=snap.rocm-validation-suite.rocm-validation-suite pid=136690 comm="rvs" exe="/snap/rocm-validation-suite/15/opt/rocm-6.3.2/bin/rvs" sig=0 arch=c000003e 203(sched_setaffinity) compat=0 ip=0x799689374576 code=0x50000
Syscall: sched_setaffinity
Suggestion:
* ignore the denial if the program otherwise works correctly (unconditional sched_setaffinity is often just noise)

I wonder if the review-tools need to be updated to suggest taking a look at:

EDIT: @alexmurray do you think that would make sense?

If I’m understanding this interface correctly, it would allow me to define a name for shared memory, which may work in this case (where I just have the one graphics card), but I think in a multi-GPU setup there may be multiple /dev/shm/rocm_smi_* files.

I wonder if my problem is related to this issue on snapcraft-preload?

Well, my idea was more towards the private option of that interface…

private (plug): when true, creates a directory that is 
only accessible to the snap. This directory has 
read/write permissions, is mounted over /dev/shm, 
and permits an auto-connection to the 
system:shared-memory slot.

That way your app should be able to create anything it wants in its private /dev/shm without hitting any apparmor blocks.

Ohh, I definitely misunderstood it then, that would make sense

Interesting… That seemed to resolve part of the problem.

By adding

plugs:
    shared-memory:
        private: true

to my snapcraft.yaml, ROCm Validation Suite no longer complains about the /dev/shm file and the program is able to query and display the power utilization.
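For anyone following along, here is a minimal sketch of how that plug might sit next to the app definition. The snap and app name and the path to rvs are taken from the log lines above; the command line is only illustrative (it leaves out any wrapper), and listing the plug explicitly under the app is an assumption rather than something confirmed to be required.

```yaml
# Sketch only, not a complete snapcraft.yaml.
plugs:
  shared-memory:
    private: true        # snap gets its own directory mounted over /dev/shm

apps:
  rocm-validation-suite:
    # Path inferred from the exe= field in the logs; adjust to the real layout.
    command: opt/rocm-6.3.2/bin/rvs
    plugs:
      - shared-memory    # listed explicitly here as an assumption
```

With the private mount in place, files such as /dev/shm/rocm_smi_card1 are created inside the snap's own directory, so a multi-GPU setup with several /dev/shm/rocm_smi_* files should not need separately named entries.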

However, I still see that it fails to create threads…

[RESULT] [  2091.552934] [action_5] [GPU:: 33281] Power(W) 7.000000
pthread_create failed 0 (Success)
pthread_create failed 0 (Success)
pthread_create failed 0 (Success)
pthread_create failed 0 (Success)
pthread_create failed 0 (Success)
pthread_create failed 0 (Success)
pthread_create failed 0 (Success)
pthread_create failed 0 (Success)
pthread_create failed 0 (Success)

 hipStreamCreate() failed !!!

Error in rocblas_dgemm() !!!

The only error I see in sudo journalctl --output=short --follow --all | sudo snappy-debug is the following:

= Seccomp =
Time: Feb 13 09:29:03
Log: auid=1000 uid=1000 gid=1000 ses=4 subj=snap.rocm-validation-suite.rocm-validation-suite pid=58134 comm="rvs" exe="/snap/rocm-validation-suite/x1/opt/rocm-6.3.2/bin/rvs" sig=0 arch=c000003e 203(sched_setaffinity) compat=0 ip=0x78c5a547c576 code=0x50000
Syscall: sched_setaffinity
Suggestion:
* ignore the denial if the program otherwise works correctly (unconditional sched_setaffinity is often just noise)

I double-checked, and I do have command: bin/snapcraft-preload ... for the app.

Well, do you actually still need snapcraft-preload at all?

Hm, I suppose not. The library linter actually warns that the libraries are not being used.

Hm, I think I see why it might be running into problems… When I install and run rocm-validation-suite in devmode, I see a large number of AppArmor ALLOWED entries for Kernel Fusion Driver (KFD) files.

I already made a PR in snapd (because I noticed them when trying to snap rocminfo). It is very likely that the errors I’m seeing are from the same underlying library failing to query some of these files.
