AppArmor denials for Hiri after updates

I have run the Hiri snap successfully for many months. Suddenly this morning Hiri no longer starts, and I see AppArmor denials in dmesg. This is possibly due to updates, which I ran before opening Hiri.

System details:
Ubuntu 17.04

snap list output:

Name  Version    Rev   Developer  Notes
core  16-2.28.1  3017  canonical  core
hiri  1.2.3.0    15    hiri       -

DMESG output:
[ 2879.277628] audit: type=1400 audit(1507710260.432:1680): apparmor="DENIED" operation="open" profile="/snap/core/3017/usr/lib/snapd/snap-confine" name="/sys/bus/pci/drivers/nvidia/uevent" pid=13789 comm="snap-confine" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[ 2879.282111] audit: type=1400 audit(1507710260.436:1681): apparmor="DENIED" operation="open" profile="/snap/core/3017/usr/lib/snapd/snap-confine" name="/sys/module/nvidia/uevent" pid=13789 comm="snap-confine" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[ 2879.283384] audit: type=1400 audit(1507710260.436:1682): apparmor="DENIED" operation="open" profile="/snap/core/3017/usr/lib/snapd/snap-confine" name="/sys/module/nvidia_drm/uevent" pid=13789 comm="snap-confine" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[ 2879.284868] audit: type=1400 audit(1507710260.440:1683): apparmor="DENIED" operation="open" profile="/snap/core/3017/usr/lib/snapd/snap-confine" name="/sys/module/nvidia_modeset/uevent" pid=13789 comm="snap-confine" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[ 2879.286002] audit: type=1400 audit(1507710260.440:1684): apparmor="DENIED" operation="open" profile="/snap/core/3017/usr/lib/snapd/snap-confine" name="/sys/module/nvidia_uvm/uevent" pid=13789 comm="snap-confine" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[ 2879.657232] audit: type=1400 audit(1507710260.812:1685): apparmor="DENIED" operation="open" profile="snap.hiri.hiri" name="/proc/13876/mounts" pid=13876 comm="hirimain" requested_mask="r" denied_mask="r" fsuid=1000 ouid=1000
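For reference, these audit records are plain key=value text, so the denied paths can be pulled out with a few lines of Python. This is a quick sketch over one of the lines above, not part of any snapd tooling:

```python
import re

# One of the audit lines from dmesg above (quotes normalized to ASCII).
line = ('audit: type=1400 audit(1507710260.432:1680): apparmor="DENIED" '
        'operation="open" profile="/snap/core/3017/usr/lib/snapd/snap-confine" '
        'name="/sys/bus/pci/drivers/nvidia/uevent" pid=13789 comm="snap-confine" '
        'requested_mask="r" denied_mask="r" fsuid=0 ouid=0')

# Collect the key=value and key="value" pairs into a dict.
fields = dict(re.findall(r'(\w+)="?([^"\s]+)"?', line))
print(fields["profile"], fields["name"], fields["denied_mask"])
```

Filtering a full dmesg capture through this makes it easy to see that every denial comes from the snap-confine profile and targets nvidia sysfs paths.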

@eugeneduvenage We are contacting the Hiri developers regarding the nvidia-related denials. As for the denial related to /proc/*/mounts, you can resolve that by manually connecting the following interfaces:

sudo snap connect hiri:mount-observe core:mount-observe
sudo snap connect hiri:removable-media core:removable-media

I will raise a request to have those interfaces auto connected for Hiri.

I’m looking into a fix for snapd to correct those nvidia denials.

To be clear, this is not just AppArmor denials: the application core dumps.

[MainProcess Worker-2 ] INFO 2017-10-11 11:28:41,541 hiri.services.credentials - Connecting with default creds
[MainProcess Reader-1 ] DEBUG 2017-10-11 11:28:41,558 hiri.util.logging - GenericMailboxQueryBuilder All: 36.052 ms
[MainProcess MainThread] QML 2017-10-11 11:28:41,598 qml - Failed to create OpenGL context for format QSurfaceFormat(version 2.0, options QFlags<QSurfaceFormat::FormatOption>(), depthBufferSize 24, redBufferSize -1, greenBufferSize -1, blueBufferSize -1, alphaBufferSize -1, stencilBufferSize 8, samples -1, swapBehavior QSurfaceFormat::SwapBehavior(DoubleBuffer), swapInterval 1, profile QSurfaceFormat::OpenGLContextProfile(NoProfile))
/snap/hiri/15/hiri.sh: line 30: 21989 Aborted (core dumped) "$SNAP/hirimain" $@

I’m a bit puzzled as to why this would be here: we don’t open /sys/module/nvidia/uevent from snap-confine at all.

I have triple checked that those nvidia denials are only seen with dmesg when hiri starts.

I noticed there are some more logs if I run hiri from the command line:
13:31 $ hiri
[MainProcess MainThread] DEBUG 2017-10-11 13:32:13,163 hiri.store - Migrating DB schema
[MainProcess MainThread] INFO 2017-10-11 13:32:13,166 alembic.runtime.migration - Context impl SQLiteImpl.
[MainProcess MainThread] INFO 2017-10-11 13:32:13,166 alembic.runtime.migration - Will assume non-transactional DDL.
libGL error: No matching fbConfigs or visuals found
libGL error: failed to load driver: swrast
[MainProcess MainThread] QML 2017-10-11 13:32:13,288 qml - Unrecognized OpenGL version
[MainProcess MainThread] QML 2017-10-11 13:32:13,289 qml - Unrecognized OpenGL version

The corresponding dmesg output:
[14150.944762] audit: type=1400 audit(1507721532.101:6416): apparmor="DENIED" operation="open" profile="/snap/core/3017/usr/lib/snapd/snap-confine" name="/sys/bus/pci/drivers/nvidia/uevent" pid=11202 comm="snap-confine" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[14150.947475] audit: type=1400 audit(1507721532.101:6417): apparmor="DENIED" operation="open" profile="/snap/core/3017/usr/lib/snapd/snap-confine" name="/sys/module/nvidia/uevent" pid=11202 comm="snap-confine" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[14150.948790] audit: type=1400 audit(1507721532.105:6418): apparmor="DENIED" operation="open" profile="/snap/core/3017/usr/lib/snapd/snap-confine" name="/sys/module/nvidia_drm/uevent" pid=11202 comm="snap-confine" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[14150.949882] audit: type=1400 audit(1507721532.105:6419): apparmor="DENIED" operation="open" profile="/snap/core/3017/usr/lib/snapd/snap-confine" name="/sys/module/nvidia_modeset/uevent" pid=11202 comm="snap-confine" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[14150.951297] audit: type=1400 audit(1507721532.105:6420): apparmor="DENIED" operation="open" profile="/snap/core/3017/usr/lib/snapd/snap-confine" name="/sys/module/nvidia_uvm/uevent" pid=11202 comm="snap-confine" requested_mask="r" denied_mask="r" fsuid=0 ouid=0

I found this article on the Hiri support site that mentions the command-line error: https://support.hiri.com/hc/en-us/articles/115002954969-libGL-error-failed-to-load-driver-swrast-problem-starting-Hiri-on-Linux. The suggested fix does not look like something I can apply when running the snap, and the snap had been running without issue for a long time.

Eugene

I have reverted my core snap back to 2.27 and Hiri works.
We also tested a completely different OpenGL snap on 2.28 and it’s similarly broken (segfaults).
Reverting that machine to snapd 2.27 (by reverting the core snap) makes that application work again.
So it seems limited to nvidia machines running OpenGL-enabled snaps.

I suspect this is related to https://github.com/snapcore/snapd/commit/b17d5ba8d46c269dbaba08c0484a13e1e1c9e9e7


I confirmed that we are not adding the nvidia device nodes at all. On @popey’s machine, devices.list contains the following entries:

c 1:3 rwm
c 1:7 rwm
c 1:5 rwm
c 1:8 rwm
c 1:9 rwm
c 5:0 rwm
c 5:1 rwm
c 5:2 rwm
c 226:0 rwm

The major number of nvidia devices is 195 (as seen in Documentation/admin-guide/devices.txt in the Linux kernel tree). I think the code that should be adding them is faulty; looking at it now.
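To illustrate what the check amounts to, here is a sketch (not snapd’s actual code; wildcard entries such as `a *:* rwm` are not handled) that confirms major 195 is absent from the whitelist above:

```python
# devices.list entries as seen on the affected machine (from the post above).
DEVICES_LIST = """\
c 1:3 rwm
c 1:7 rwm
c 1:5 rwm
c 1:8 rwm
c 1:9 rwm
c 5:0 rwm
c 5:1 rwm
c 5:2 rwm
c 226:0 rwm
"""

NVIDIA_MAJOR = 195  # per Documentation/admin-guide/devices.txt

def allows_major(devices_list: str, major: int) -> bool:
    """Return True if any whitelisted char device has the given major number."""
    for entry in devices_list.splitlines():
        kind, majmin, _perms = entry.split()
        if kind == "c" and int(majmin.split(":")[0]) == major:
            return True
    return False

print(allows_major(DEVICES_LIST, NVIDIA_MAJOR))  # → False: nvidia devices missing
```

Major 226 (DRM) is present, which is why non-nvidia OpenGL setups are unaffected.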


Quite a bit of conversation happened on IRC, so I’ll add a quick note here on behalf of those involved. The bottom line is that two PRs needed to land in 2.28 for a change related to the opengl interface to work correctly, but only the first one did. A fix is undergoing QA now; it will be included in the 2.28.4 core image and pushed to stable when done. We’ll be conducting a post-mortem to improve our processes and avoid this sort of problem in the future.

We apologize for the inconvenience.


This is because the second PR to fix nvidia didn’t make it into 2.28.

A fix for 2.28 was merged and it is currently in the “beta” channel. We expect it to go to candidate today after QA validation.

Hi,
firstly, thanks for the amazing response to this issue.

I have installed 2.28 from the beta channel and confirmed that I no longer get any AppArmor denials for nvidia device nodes when starting Hiri.

Unfortunately it has not stopped Hiri from crashing with the same OpenGL error:

libGL error: No matching fbConfigs or visuals found
libGL error: failed to load driver: swrast
[MainProcess  MainThread] QML      2017-10-12 08:47:36,070   qml - Unrecognized OpenGL version
[MainProcess  MainThread] QML      2017-10-12 08:47:36,071   qml - Unrecognized OpenGL version


Thanks for the quick feedback @eugeneduvenage! What version of the nvidia driver are you using? I.e. what is the output of apt list --installed 'nvidia-*'? This is on Ubuntu 16.04, correct? I will try to reproduce. FWIW, I can run hiri here on my 17.10 machine with nvidia-340 installed. Happy to try more combinations.

Hi,

I am running Ubuntu 17.04 with nvidia 384.90.

Output of apt list --installed 'nvidia-*' is:

nvidia-384/zesty,now 384.90-0ubuntu0~gpu17.04.1 amd64 [installed]
nvidia-opencl-icd-384/zesty,now 384.90-0ubuntu0~gpu17.04.1 amd64 [installed,automatic]
nvidia-prime/zesty,now 0.8.4 amd64 [installed,automatic]
nvidia-settings/zesty,now 384.90-0ubuntu0~gpu17.04.1 amd64 [installed]

Output of sudo snap list is:

Name  Version    Rev   Developer  Notes
core  16-2.28.4  3191  canonical  core
hiri  1.2.3.0    15    hiri       -

I removed all nvidia drivers, all snaps and snapd from my system and reinstalled.

Hiri now seems to start and at least shows the splash screen, then crashes with the following on the command line:

[MainProcess  Dummy-3   ] QML      2017-10-12 13:17:59,889   qml - QOpenGLFramebufferObject: Unsupported framebuffer format.
[MainProcess  Dummy-3   ] QML      2017-10-12 13:17:59,889   qml - QOpenGLFramebufferObject: Framebuffer incomplete, missing attachment.

In dmesg I now see a segfault in libnvidia-glcore.so.384.90:

[ 1034.436102] audit: type=1400 audit(1507807288.513:858): apparmor="DENIED" operation="open" profile="snap.hiri.hiri" name="/etc/" pid=9374 comm="hirimain" requested_mask="r" denied_mask="r" fsuid=1000 ouid=0
[ 1035.112671] QSGRenderThread[9394]: segfault at 0 ip 00007f01eee3b2a8 sp 00007f01d1dbfe90 error 4 in libnvidia-glcore.so.384.90[7f01ee182000+149b000]

I will see if I can get nvidia-340 on my system to match yours and try again.

I wasn’t able to downgrade to nvidia-340, but I did go down to nvidia-375 and get the same issue.
If I install Hiri from the tarball, the app works 100% on both nvidia-384 and nvidia-375, so it must be snap-related in some way.


Can you confirm that it still crashes if you install core from the beta channel? (sudo snap install --beta core). EDIT: ah, you did that :-//

I get a similar bug with nvidia-384: [ 398.778345] QSGRenderThread[6268]: segfault at 0 ip 00007fc55b0602a8 sp 00007fc54a14be90 error 4 in libnvidia-glcore.so.384.90[7fc55a3a7000+149b000]. Downgrading to 2.27.6 does not help, unfortunately; I’m digging into it now.


Can you list what you see in /usr/lib/nvidia?

I can see /var/lib/snapd/lib/gl inside the confinement. It looks like this:

$ ls /var/lib/snapd/lib/gl
alt_ld.so.conf		       libnvidia-cfg.so.1
bin			       libnvidia-cfg.so.384.90
ld.so.conf		       libnvidia-compiler.so
libEGL.so		       libnvidia-compiler.so.1
libEGL.so.1		       libnvidia-compiler.so.384.90
libEGL.so.384.90	       libnvidia-egl-wayland.so.1.0.1
libEGL_nvidia.so.0	       libnvidia-eglcore.so.384.90
libEGL_nvidia.so.384.90        libnvidia-encode.so
libGL.so		       libnvidia-encode.so.1
libGL.so.1		       libnvidia-encode.so.384.90
libGL.so.1.0.0		       libnvidia-fatbinaryloader.so.384.90
libGLESv1_CM.so		       libnvidia-fbc.so
libGLESv1_CM.so.1	       libnvidia-fbc.so.1
libGLESv1_CM_nvidia.so.1       libnvidia-fbc.so.384.90
libGLESv1_CM_nvidia.so.384.90  libnvidia-glcore.so.384.90
libGLESv2.so		       libnvidia-glsi.so.384.90
libGLESv2.so.2		       libnvidia-ifr.so
libGLESv2_nvidia.so.2	       libnvidia-ifr.so.1
libGLESv2_nvidia.so.384.90     libnvidia-ifr.so.384.90
libGLX.so		       libnvidia-ml.so
libGLX.so.0		       libnvidia-ml.so.1
libGLX_indirect.so.0	       libnvidia-ml.so.384.90
libGLX_nvidia.so.0	       libnvidia-ptxjitcompiler.so.1
libGLX_nvidia.so.384.90        libnvidia-ptxjitcompiler.so.384.90
libGLdispatch.so.0	       libnvidia-tls.so.384.90
libOpenGL.so		       libnvidia-wfb.so.1
libOpenGL.so.0		       libnvidia-wfb.so.384.90
libnvcuvid.so		       tls
libnvcuvid.so.1		       vdpau
libnvcuvid.so.384.90	       xorg
libnvidia-cfg.so
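As a quick sanity check on a listing like this, the trailing driver-version suffixes should all agree with the host driver (384.90 here); a mismatch would point at stale libraries being exposed into the snap. A small sketch over a few of the filenames above:

```python
import re

# A few of the versioned filenames from the /var/lib/snapd/lib/gl listing above.
libs = [
    "libEGL.so.384.90",
    "libGLX_nvidia.so.384.90",
    "libnvidia-glcore.so.384.90",
    "libnvidia-tls.so.384.90",
]

# Extract the trailing driver-version suffix from each name.
versions = {re.search(r"\.so\.(\d+\.\d+)$", name).group(1) for name in libs}
assert len(versions) == 1, f"mixed driver versions: {versions}"
version = versions.pop()
print(version)  # → 384.90
```

In this case the versions are consistent, so the segfault is unlikely to be a version mismatch between the exposed libraries and the kernel module.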