djlbench is a tool for benchmarking machine-learning models across various frameworks.
djlbench itself is a Java program; however, each deep-learning framework ships its native code as shared libraries that are packaged inside the .jar file.
At runtime, those shared libraries are unpacked and stored into a cache directory. By default the cache directory lives in the user's home directory, which is blocked by strict confinement.
Even if I store those files in the /tmp directory, I'm still not able to load those shared libraries.
Access to the user's home directory is available when plugging the home interface, so I don't see why classic confinement should be needed in this case.
I have the home plug defined in my snapcraft.yaml file:
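(For reference, a minimal sketch of what that declaration would look like; the command path is just a placeholder:)

```yaml
apps:
  djlbench:
    command: bin/djlbench   # placeholder command path
    plugs:
      - home
```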
I still get an AccessDeniedException when the code tries to create a folder under the home directory. If I switch to /tmp, I'm able to write the file.
Based on the documentation, the home plug allows access to non-hidden files; the library this app depends on is trying to create the ~/.djl.ai folder, which is hidden.
But even when I can write to home, I'm still not able to call System.load(); I believe there is a security manager preventing this operation.
So to access a hidden folder in home you also need to use the personal-files interface: you could define a personal-files plug called dot-djl-ai with write permission to $HOME/.djl.ai, and then, after installing the snap and manually connecting this interface, your app should be able to write to this folder.
You can then request auto-connection of this so that when the snap is installed it is automatically connected and thus you should not require classic confinement.
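A sketch of what that plug declaration could look like in snapcraft.yaml (the plug name follows the suggestion above; the command path is illustrative):

```yaml
plugs:
  dot-djl-ai:
    interface: personal-files
    write:
      - $HOME/.djl.ai

apps:
  djlbench:
    command: bin/djlbench   # placeholder
    plugs:
      - home
      - dot-djl-ai
```

Until auto-connection is granted, users would connect it manually with `snap connect djlbench:dot-djl-ai`.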
How should I resolve the System.load() issue? On a GPU machine, the shared library is linked against the CUDA driver, so it will try to load the CUDA driver as well.
With the personal-files plug, can I have write access to nested folders under ~/.djl.ai? The app only learns the folder structure under that directory at runtime.
Right, so in this case djlbench is the clear owner of the ~/.djl.ai folder, so you will be able to request such access and get it approved as per The personal-files interface. Please update your snap declaration and let us know so we can proceed with the process.
Could you please share the denials you are getting when trying to load such libraries? You can use snappy-debug to help troubleshoot.
CUDA support requires two main bits that the above snapcraft.yaml doesn't have:
1. You need to map the host driver into the snap; the gnome extensions can handle this bit (even if it's a CLI app that doesn't spawn a GUI window at all).
2. You need to ship (at a minimum) cudart.so yourself in your snap, but there might be some extras to consider.
So it's important then to understand the CUDA EULA, which says which bits you can and can't distribute; luckily, cudart.so is on the distributable list!
To satisfy 1), just add extensions: [gnome-3-34] under the app section, e.g.:
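(A sketch with a placeholder command path:)

```yaml
apps:
  djlbench:
    command: bin/djlbench   # placeholder
    extensions: [gnome-3-34]
```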
To satisfy 2), consider adding the CUDA repositories to the snapcraft.yaml and simply add cudart.so as a dependency. This is taken from a core20 example (for core18 you'd need to adapt the URL and the key-id).
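A rough sketch of such a core20 stanza; the key-id is a placeholder for the full fingerprint of NVIDIA's repository signing key, and the stage-package name is hypothetical (it depends on the CUDA version you target):

```yaml
package-repositories:
  - type: apt
    formats: [deb]
    architectures: [amd64]
    key-id: <full-fingerprint-of-NVIDIA-repo-key>   # placeholder, fill in the real fingerprint
    url: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64

parts:
  djlbench:
    # ... existing part definition ...
    stage-packages:
      - cuda-libraries-11-2   # hypothetical; pick the narrowest package that ships cudart.so
```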
Keeping in mind that you likely only need cudart.so, you might be able to find a more specific package than cuda-libraries-major-minor.
Due to the nature of NVidia drivers and CUDA, you'd want to pick the lowest version of libcudart that your application can get away with, since it will indirectly set the minimum NVidia driver version required to run it.
I tried adding the personal-files plug, but I still run into an exception:
java.lang.UnsatisfiedLinkError: /home/ubuntu/.djl.ai/pytorch/1.8.1-cpu-linux-x86_64/0.12.0-cpu-libdjl_torch.so: /home/ubuntu/.djl.ai/pytorch/1.8.1-cpu-linux-x86_64/0.12.0-cpu-libdjl_torch.so: failed to map segment from shared object
at java.lang.ClassLoader$NativeLibrary.load0(Native Method) ~[?:?]
at java.lang.ClassLoader$NativeLibrary.load(ClassLoader.java:2442) ~[?:?]
at java.lang.ClassLoader$NativeLibrary.loadLibrary(ClassLoader.java:2498) ~[?:?]
at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:2694) ~[?:?]
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:2627) ~[?:?]
at java.lang.Runtime.load0(Runtime.java:768) ~[?:?]
at java.lang.System.load(System.java:1837) ~[?:?]
Here is the output from snappy-debug:
= AppArmor =
Time: Jun 29 20:43:24
Log: apparmor="DENIED" operation="open" profile="snap.djlbench.djlbench" name="/proc/11648/mountinfo" pid=11648 comm="java" requested_mask="r" denied_mask="r" fsuid=1000 ouid=1000
File: /proc/11648/mountinfo (read)
Suggestions:
* adjust program to not access '@{PROC}/@{pid}/mountinfo'
* add 'mount-observe' to 'plugs'
= AppArmor =
Time: Jun 29 20:43:24
Log: apparmor="DENIED" operation="open" profile="snap.djlbench.djlbench" name="/proc/11648/coredump_filter" pid=11648 comm="java" requested_mask="wr" denied_mask="wr" fsuid=1000 ouid=1000
File: /proc/11648/coredump_filter (write)
Suggestion:
* adjust program to not access '@{PROC}/@{pid}/coredump_filter'
= AppArmor =
Time: Jun 29 20:43:25
Log: apparmor="DENIED" operation="open" profile="snap.djlbench.djlbench" name="/proc/11648/mountinfo" pid=11648 comm="java" requested_mask="r" denied_mask="r" fsuid=1000 ouid=1000
File: /proc/11648/mountinfo (read)
Suggestions:
* adjust program to not access '@{PROC}/@{pid}/mountinfo'
* add 'mount-observe' to 'plugs'
= AppArmor =
Time: Jun 29 20:43:25
Log: apparmor="DENIED" operation="file_mmap" profile="snap.djlbench.djlbench" name="/home/ubuntu/.cache/JNA/temp/jna15433788716430075344.tmp" pid=11648 comm="java" requested_mask="m" denied_mask="m" fsuid=1000 ouid=1000
File: /home/ubuntu/.cache/JNA/temp/jna15433788716430075344.tmp (mmap)
Suggestion:
* add 'personal-files (see https://forum.snapcraft.io/t/the-personal-files-interface for acceptance criteria)' to 'plugs'
= AppArmor =
Time: Jun 29 20:43:25
Log: apparmor="DENIED" operation="file_mmap" profile="snap.djlbench.djlbench" name="/home/ubuntu/.djl.ai/pytorch/1.8.1-cpu-linux-x86_64/0.12.0-cpu-libdjl_torch.so" pid=11648 comm="java" requested_mask="m" denied_mask="m" fsuid=1000 ouid=1000
File: /home/ubuntu/.djl.ai/pytorch/1.8.1-cpu-linux-x86_64/0.12.0-cpu-libdjl_torch.so (mmap)
Suggestion:
* add 'personal-files (see https://forum.snapcraft.io/t/the-personal-files-interface for acceptance criteria)' to 'plugs'
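Acting on these suggestions would mean adding something like the following on top of the ~/.djl.ai plug discussed earlier (plug name and command path are illustrative). Note that this is only a sketch: the file_mmap denials may persist even then, since strict confinement restricts mapping executable code from $HOME.

```yaml
plugs:
  dot-cache-jna:
    interface: personal-files
    write:
      - $HOME/.cache/JNA

apps:
  djlbench:
    command: bin/djlbench   # placeholder
    plugs:
      - home
      - mount-observe
      - dot-cache-jna
```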
Strict snaps live in a mount namespace; they don't usually see the host system's libraries, and it's generally an antipattern to rely on them, so looking for cudart in the normal locations will certainly fail. Whether it fails silently or not I wouldn't know; that depends on your own code.
You could possibly use the system-backup or system-files interfaces to try to access the host libraries explicitly, but by shipping cudart.so yourself you end up not having to rely on the host libraries at all. Assuming the user already has NVidia drivers installed that support the cudart.so you ship, they wouldn't have to install CUDA system-wide at all.
We won't be able to bundle the CUDA library in our tool:
* The CUDA and cuDNN libraries are huge, while our library jar is only a few MB.
* We support a wide range of CUDA versions; it doesn't make sense to bundle all of them.
* CUDA is optional in our tool; if users only want to benchmark on CPU, they don't need CUDA at all.
* We don't know which CUDA version the user wants to benchmark against, so we assume the installed version is the one to use. This way we don't need to prompt the user to select a CUDA version.
I can agree with the size complaint: even with the compression snap provides, CUDA does bloat downloads. On the other hand, it's Canonical paying the bill ;). There were previous discussions about resolving this through content snaps to help deduplication, but as far as I'm aware that hasn't gone anywhere yet.
What stands out to me most there is cuDNN: as far as I know of NVidia's current licensing (not a lawyer), cuDNN cannot be redistributed, which rules out shipping these libraries inside the snap. Legally, the user needs to acquire it themselves.
I would still suggest looking into the system-files/system-backup interfaces. Since these can expose the host system, you could potentially use the host CUDA libs in a strict snap. You'd need to add them to $LD_LIBRARY_PATH (and also keep the gnome extension in mind, which handles some other environment issues; you could replace it with something smaller if preferred).
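A sketch of that approach, assuming a typical host install location of /usr/local/cuda (the plug name and path are illustrative, and any system-files access would need review approval):

```yaml
plugs:
  host-cuda:
    interface: system-files
    read:
      - /usr/local/cuda/lib64

apps:
  djlbench:
    command: bin/djlbench   # placeholder
    environment:
      LD_LIBRARY_PATH: /usr/local/cuda/lib64:$LD_LIBRARY_PATH
    plugs:
      - host-cuda
```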
So my hopefully helpful opinion to the reviewers here (keep in mind I'm not a reviewer) is that the requirement for cuDNN in particular is a pretty severe limitation in the strict model, and at a minimum, for the snap to be functional for users who want that functionality, system-backup or system-files would be needed.
The more I look into this, the more I believe we have to use classic mode:
* We are a wrapper on top of other deep-learning frameworks (PyTorch, TensorFlow, MXNet, ...). It would be very hard for us to understand how each framework accesses the system, so it is very hard to list all the library paths the tool will need across different Linux distributions.
* We allow users to load custom shared libraries at runtime (custom operators for different hardware accelerators, like the AWS Inferentia chip). We don't really know where those libraries will be installed.
* As a performance benchmark tool, we monitor CPU and memory utilization; it's not clear to me whether that will hit any permission issues.
On further review, I think djlbench more closely fits within the debug tools category for classic confinement, i.e. it is used by developers to benchmark their machine-learning models, which is pretty close to the activity of debugging. Also, it does require access to arbitrary files/libraries already on the system. As such, I think it meets the criteria for classic confinement.
@advocacy can you please perform publisher vetting?
I agree with this analysis. We do want to work, in the future, on ways for snaps to transparently access this kind of library from the host or from other snaps, at which point the situation could be reconsidered.