Request for Classic confinement: djlbench

djlbench is a benchmarking tool for machine learning models across various frameworks.

djlbench itself is a Java program. However, each deep learning framework’s native libraries are packaged as shared libraries inside the .jar file.

At runtime, those shared libraries are unpacked into a cache directory. By default the cache directory is in the user’s home directory, which is blocked by strict confinement.

Even if I store those files in the /tmp directory, I’m still not able to load those shared libraries with:

System.load()

It looks like using classic is the only option.

Access to the user’s home directory is available when plugging the home interface - as such I don’t see why classic confinement should be needed in this case.

I have the home plug defined in my snapcraft.yaml file:
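(Trimmed to the relevant part; the full file has more entries.)

apps:
  djlbench:
    command: benchmark-$SNAPCRAFT_PROJECT_VERSION/bin/benchmark
    plugs:
      - home   # non-hidden files in the user's home directory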

I still get an AccessDeniedException in the code when trying to create a folder under the home directory. If I switch to /tmp, I’m able to write the file.

Based on the documentation, the home plug only allows access to non-hidden files; the library this app depends on is trying to create the ~/.djl.ai folder, which is hidden.

But even if I can write to home, I’m still not able to call System.load(); I believe there is a security manager preventing this operation.

So to access a hidden folder in home you also need to use the personal-files interface - you could define a personal-files plug called dot-djl-ai with write permission to $HOME/.djl.ai, and then, after installing the snap and manually connecting this interface, your app should be able to write to this folder.
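For example, something along these lines in snapcraft.yaml (plug and app names are only illustrative, following the suggestion above):

plugs:
  dot-djl-ai:
    interface: personal-files
    write:
      - $HOME/.djl.ai   # the hidden cache folder the DJL library creates

apps:
  djlbench:
    command: benchmark-$SNAPCRAFT_PROJECT_VERSION/bin/benchmark
    plugs:
      - dot-djl-ai

Until auto-connection is granted (see below), the interface can be connected manually after install with snap connect djlbench:dot-djl-ai.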

You can then request auto-connection of this so that when the snap is installed it is automatically connected and thus you should not require classic confinement.

A few questions:

  1. How should I resolve the System.load() issue? On a GPU machine, the shared library is linked against the CUDA driver, so it will try to load the CUDA driver as well.
  2. With the personal-files plug, can I have write access to nested folders under ~/.djl.ai? The app only knows the folder structure under that folder at runtime.

Hey @deepjavalibrary,

Right, so in this case, djlbench is the clear owner of the ~/.djl.ai folder so you will be able to request such access and get it approved as per The personal-files interface. Please update your snap declaration and let us know so we can proceed with the process.

Could you please share the denials you are getting when trying to load such libraries? You can use snappy-debug to help troubleshoot.

CUDA support requires two main bits the above snapcraft.yaml doesn’t have:

  1. You need to map the host driver into the snap; the gnome extensions can handle this bit (even if it’s a CLI app that doesn’t spawn a GUI window at all).
  2. You need to ship (at a minimum) cudart.so yourself in your snap, but there might be some extras to consider.

So it’s important then to understand the CUDA EULA, which spells out which bits you can and can’t distribute; luckily, cudart.so is on the distributable list!

To satisfy 1), just add extensions: [gnome-3-34] under the app section, e.g.:

apps:
  djlbench:
    command: benchmark-$SNAPCRAFT_PROJECT_VERSION/bin/benchmark
    extensions: [gnome-3-34]
...

To satisfy 2), consider adding the CUDA repositories to the snapcraft.yaml and simply adding cudart.so as a dependency. This is taken from a core20 example (for core18 you’d need to adapt the URL and the key-id):

package-repositories:
  - type: apt
    url: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/
    key-id: AE09FE4BBD223A84B2CCFCE3F60F4B3D7FA2AF80
...
stage-packages:
  - cuda-libraries-11-3

Keeping in mind that you likely only need cudart.so, you might be able to find a more specific package than cuda-libraries-major-minor.
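For instance, something like this might be enough (I haven’t checked the exact package name, so treat it as a guess based on how NVidia splits its repo):

stage-packages:
  - cuda-cudart-11-3   # runtime library only, instead of the broader cuda-libraries-11-3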

Due to the nature of NVidia drivers and CUDA, you’d want to pick the lowest version of libcudart that your application can get away with, since it will indirectly set the minimum required NVidia driver version to run it.

I tried adding the personal-files plug, but I still run into an exception:

java.lang.UnsatisfiedLinkError: /home/ubuntu/.djl.ai/pytorch/1.8.1-cpu-linux-x86_64/0.12.0-cpu-libdjl_torch.so: /home/ubuntu/.djl.ai/pytorch/1.8.1-cpu-linux-x86_64/0.12.0-cpu-libdjl_torch.so: failed to map segment from shared object
	at java.lang.ClassLoader$NativeLibrary.load0(Native Method) ~[?:?]
	at java.lang.ClassLoader$NativeLibrary.load(ClassLoader.java:2442) ~[?:?]
	at java.lang.ClassLoader$NativeLibrary.loadLibrary(ClassLoader.java:2498) ~[?:?]
	at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:2694) ~[?:?]
	at java.lang.ClassLoader.loadLibrary(ClassLoader.java:2627) ~[?:?]
	at java.lang.Runtime.load0(Runtime.java:768) ~[?:?]
	at java.lang.System.load(System.java:1837) ~[?:?]

Here is the output from snappy-debug:

= AppArmor =
Time: Jun 29 20:43:24
Log: apparmor="DENIED" operation="open" profile="snap.djlbench.djlbench" name="/proc/11648/mountinfo" pid=11648 comm="java" requested_mask="r" denied_mask="r" fsuid=1000 ouid=1000
File: /proc/11648/mountinfo (read)
Suggestions:
* adjust program to not access '@{PROC}/@{pid}/mountinfo'
* add 'mount-observe' to 'plugs'

= AppArmor =
Time: Jun 29 20:43:24
Log: apparmor="DENIED" operation="open" profile="snap.djlbench.djlbench" name="/proc/11648/coredump_filter" pid=11648 comm="java" requested_mask="wr" denied_mask="wr" fsuid=1000 ouid=1000
File: /proc/11648/coredump_filter (write)
Suggestion:
* adjust program to not access '@{PROC}/@{pid}/coredump_filter'

= AppArmor =
Time: Jun 29 20:43:25
Log: apparmor="DENIED" operation="open" profile="snap.djlbench.djlbench" name="/proc/11648/mountinfo" pid=11648 comm="java" requested_mask="r" denied_mask="r" fsuid=1000 ouid=1000
File: /proc/11648/mountinfo (read)
Suggestions:
* adjust program to not access '@{PROC}/@{pid}/mountinfo'
* add 'mount-observe' to 'plugs'

= AppArmor =
Time: Jun 29 20:43:25
Log: apparmor="DENIED" operation="file_mmap" profile="snap.djlbench.djlbench" name="/home/ubuntu/.cache/JNA/temp/jna15433788716430075344.tmp" pid=11648 comm="java" requested_mask="m" denied_mask="m" fsuid=1000 ouid=1000
File: /home/ubuntu/.cache/JNA/temp/jna15433788716430075344.tmp (mmap)
Suggestion:
* add 'personal-files (see https://forum.snapcraft.io/t/the-personal-files-interface for acceptance criteria)' to 'plugs'

= AppArmor =
Time: Jun 29 20:43:25
Log: apparmor="DENIED" operation="file_mmap" profile="snap.djlbench.djlbench" name="/home/ubuntu/.djl.ai/pytorch/1.8.1-cpu-linux-x86_64/0.12.0-cpu-libdjl_torch.so" pid=11648 comm="java" requested_mask="m" denied_mask="m" fsuid=1000 ouid=1000
File: /home/ubuntu/.djl.ai/pytorch/1.8.1-cpu-linux-x86_64/0.12.0-cpu-libdjl_torch.so (mmap)
Suggestion:
* add 'personal-files (see https://forum.snapcraft.io/t/the-personal-files-interface for acceptance criteria)' to 'plugs'

We don’t bundle the CUDA library; we rely on the system-installed CUDA.
We detect CUDA at runtime by trying to load the libcudart.so file from the system.

If we are using strict mode, doesn’t that mean the detection will fail silently?

Strict snaps live in a mount namespace and don’t usually see the host’s system libraries, and it’s generally an antipattern to rely on them, so looking for cudart in the normal locations will certainly fail. Whether it’s silent or not I wouldn’t know; that depends on your own code.

You could possibly use the system-backup or system-files interfaces to try and access the host libraries explicitly, but by shipping cudart.so yourself, you end up not having to rely on the host libraries at all. Assuming the user already has NVidia drivers installed that support the cudart.so you ship, they wouldn’t have to install CUDA system-wide at all.

We won’t be able to bundle the CUDA library in our tool.

  1. The CUDA and CuDNN libraries are huge, and our library jar is only a few MB.
  2. We support a wide range of CUDA versions; it doesn’t make sense to bundle all of them.
  3. CUDA is optional in our tool; if the user only wants to benchmark on CPU, they don’t need CUDA at all.
  4. We don’t know which version of CUDA the user wants to run the benchmark with. We assume the user-installed version is the one to use; this way we don’t need to prompt the user to select a CUDA version.

I can agree with the size complaint; even with the compression snaps provide, CUDA does bloat downloads. On the other hand, it’s Canonical paying the bill ;). There were previous discussions about resolving this by using content snaps to help with deduplication, but as far as I’m aware this hasn’t gotten anywhere yet.

What stands out to me most there is CuDNN: as far as I know of NVidia’s current licensing (not a lawyer), CuDNN cannot be redistributed, which rules out shipping these libraries inside the snap. Legally, the user needs to acquire it themselves.

I would still suggest looking into the system-files/system-backup interfaces. Since these can expose the host system, you could potentially use the host CUDA libs in a strict snap. You’d need to add them to $LD_LIBRARY_PATH (and also keep in mind the gnome extension, which will handle some other environment issues, but you could replace the gnome extension with something smaller too if preferred).
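As a rough, untested sketch, assuming the host’s CUDA install lives under /usr/local/cuda (strict snaps can see the host root under /var/lib/snapd/hostfs, and system-backup grants read access there; whether the resulting profile also allows mapping libraries from that location is worth confirming with snappy-debug):

apps:
  djlbench:
    command: benchmark-$SNAPCRAFT_PROJECT_VERSION/bin/benchmark
    extensions: [gnome-3-34]   # maps the host NVidia driver into the snap
    plugs:
      - system-backup          # read-only view of the host via /var/lib/snapd/hostfs
    environment:
      # assumed host install location; adjust to wherever the user's CUDA actually lives
      LD_LIBRARY_PATH: /var/lib/snapd/hostfs/usr/local/cuda/lib64:$LD_LIBRARY_PATH

system-backup isn’t auto-connected by default, so the user would have to connect it manually (or you’d request auto-connection during review).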

So my hopefully helpful opinion to the reviewers here (keep in mind I’m not a reviewer) is that the requirement for CuDNN in particular is a pretty severe limitation in the strict model, and at a minimum to be functional for users who’d want that functionality, system-backup or system-files would be needed.

Thanks for the advice.

The more I look into this, the more I believe we have to use classic mode:

  1. We are a wrapper on top of other deep learning frameworks (PyTorch, TensorFlow, MXNet, …); it would be very hard for us to understand how each framework accesses the system, so it is very hard to list all the library paths the tool will need across different Linux distributions.
  2. We allow users to load custom shared libraries at runtime (custom operators for different hardware accelerators, like the AWS Inferentia chip). We don’t really know where those libraries will be installed.
  3. As a performance benchmark tool, we monitor CPU and memory utilization; it’s not clear to me whether that will hit any permission issues.

@alexmurray

The Process for reviewing classic confinement snaps document lists this as one of the criteria that might require classic:

running arbitrary command (esp if user-configurable such as a developer tool to organize dev environments)

We have many use cases that allow users to load arbitrary .so files:

  1. Load an external libtorch.so/libmxnet.so/libtensorflow.so etc. to benchmark a user’s custom build of a deep learning framework.
  2. Load an external hardware accelerator driver, like the AWS Inferentia chip or the AWS Elastic Inference accelerator.
  3. Load extra shared libraries to support custom operators used by the model. Both PyTorch and MXNet support custom operators.

Would you please approve the classic request based on the above use cases?

@pedronis, can you please analyze this request?

Thanks!

On further review, I think djlbench more closely fits within the debug tools category for classic confinement, i.e. it is used by developers to benchmark their machine learning models, which is pretty close to the activity of debugging. It also requires access to arbitrary files/libraries already on the system. As such, I think it meets the criteria for classic confinement.

@advocacy can you please perform publisher vetting?

I agree with this analysis. In the future we do want to work on ways for snaps to transparently access this kind of library from the host or from other snaps, at which point the situation could be reconsidered.

@deepjavalibrary Is there an official domain/page for the library I could check please?

Here is the information:

Our website: https://djl.ai
Our main GitHub repo: https://github.com/deepjavalibrary/djl
The folder for djl-bench: https://github.com/deepjavalibrary/djl/tree/master/extensions/benchmark

Thanks for the info. Can I also ask you to PM me the official contact email for djl.ai, please?