I have successfully run the Hiri snap for many months. Suddenly this morning (possibly due to updates; I didn’t open Hiri before I ran them) Hiri no longer starts, and I see apparmor denials in dmesg.
System Details:
Ubuntu 17.04
Snap list output:
core 16-2.28.1 3017 canonical core
hiri 1.2.3.0 15 hiri -
@eugeneduvenage We are contacting the Hiri developers regarding the nvidia related denials. As for the denial related to /proc/*/mounts/ you can resolve that by manually connecting the following interfaces:
I have triple checked that those nvidia denials are only seen with dmesg when hiri starts.
I noticed there are some more logs if I run hiri from the command line:
13:31 $ hiri
[MainProcess MainThread] DEBUG 2017-10-11 13:32:13,163 hiri.store - Migrating DB schema
[MainProcess MainThread] INFO 2017-10-11 13:32:13,166 alembic.runtime.migration - Context impl SQLiteImpl.
[MainProcess MainThread] INFO 2017-10-11 13:32:13,166 alembic.runtime.migration - Will assume non-transactional DDL.
libGL error: No matching fbConfigs or visuals found
libGL error: failed to load driver: swrast
[MainProcess MainThread] QML 2017-10-11 13:32:13,288 qml - Unrecognized OpenGL version
[MainProcess MainThread] QML 2017-10-11 13:32:13,289 qml - Unrecognized OpenGL version
I have reverted my core snap back to 2.27 and hiri works.
We also tested a completely different opengl snap on 2.28 and it’s similarly broken (segfaults).
Reverting that machine to snapd 2.27 (via reverting core snap) makes that application work again.
So this seems limited to nvidia machines running opengl-enabled snaps.
I confirmed that we are not adding the nvidia device nodes at all. On @popey’s machine we have the following entries in devices.list:
c 1:3 rwm
c 1:7 rwm
c 1:5 rwm
c 1:8 rwm
c 1:9 rwm
c 5:0 rwm
c 5:1 rwm
c 5:2 rwm
c 226:0 rwm
The major number of nvidia devices is 195 (as seen in Documentation/admin-guide/devices.txt in the linux kernel tree). I think the code that is adding them is faulty. Looking at it now.
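To make the check above concrete, here is a minimal Python sketch that parses devices.list entries in the `type major:minor access` form quoted above and reports whether any entry allows a given major number. The function names and the sample constant are hypothetical, chosen only for illustration; this is not the snapd code that is being investigated.

```python
# Major number of nvidia character devices, per
# Documentation/admin-guide/devices.txt in the kernel tree.
NVIDIA_MAJOR = 195

# Entries copied from the post above (hypothetical constant name).
SAMPLE_DEVICES_LIST = """\
c 1:3 rwm
c 1:7 rwm
c 1:5 rwm
c 1:8 rwm
c 1:9 rwm
c 5:0 rwm
c 5:1 rwm
c 5:2 rwm
c 226:0 rwm
"""


def parse_devices_list(text):
    """Parse lines like 'c 226:0 rwm' into (type, major, minor, access) tuples."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        dev_type, numbers, access = parts
        major, _, minor = numbers.partition(":")
        entries.append((dev_type, major, minor, access))
    return entries


def allows_major(entries, major):
    """Return True if any char-device (or 'a' = all) entry covers this major."""
    wanted = str(major)
    return any(
        dev_type in ("c", "a") and entry_major in (wanted, "*")
        for dev_type, entry_major, _minor, _access in entries
    )


if __name__ == "__main__":
    entries = parse_devices_list(SAMPLE_DEVICES_LIST)
    print("nvidia (195) allowed:", allows_major(entries, NVIDIA_MAJOR))
    print("major 226 allowed:", allows_major(entries, 226))
```

With the entries listed above, no line carries major 195, which matches the observation that the nvidia device nodes are not being added.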
Quite a bit of conversation happened on IRC, and I’ll add a quick note here on behalf of those involved. The bottom line is that two PRs needed to land in 2.28 for a change related to the opengl interface to work correctly, but only the first one did. A fix is undergoing QA now; it should be included in the 2.28.4 core image and will be pushed to stable when QA is done. We’ll be conducting a post-mortem to improve our processes and avoid this sort of problem in the future.
Thanks for the quick feedback @eugeneduvenage! What version of the nvidia driver are you using? I.e. what is the output of: apt list --installed nvidia-* ? This is on Ubuntu 16.04, correct? I will try to reproduce. Fwiw, I can run hiri here on my 17.10 machine with nvidia-340 installed. Happy to try more combinations.
I wasn’t able to downgrade to nvidia-340. I did go down to nvidia-375, but I get the same issue.
If I install Hiri from the tarball, the app works 100% on both nvidia-384 and nvidia-375, so it must be snap related in some way.
I get a similar bug with nvidia-384: [ 398.778345] QSGRenderThread[6268]: segfault at 0 ip 00007fc55b0602a8 sp 00007fc54a14be90 error 4 in libnvidia-glcore.so.384.90[7fc55a3a7000+149b000]. Downgrading to 2.27.6 does not help, unfortunately. I’m digging into it now.
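For anyone comparing crashes across machines, a small Python sketch like the following can pull the faulting library and the offset inside it out of a kernel segfault line such as the one quoted above. The regular expression and the function name are assumptions based on the format of that single line, not a general dmesg parser.

```python
import re

# Pattern modelled on the one segfault line quoted in this thread (assumption).
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

SAMPLE_LINE = (
    "[ 398.778345] QSGRenderThread[6268]: segfault at 0 ip 00007fc55b0602a8 "
    "sp 00007fc54a14be90 error 4 in "
    "libnvidia-glcore.so.384.90[7fc55a3a7000+149b000]"
)


def parse_segfault(line):
    """Return thread name, pid, library and in-library offset, or None."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        return None
    ip = int(match.group("ip"), 16)
    base = int(match.group("base"), 16)
    return {
        "thread": match.group("comm"),
        "pid": int(match.group("pid")),
        "fault_address": int(match.group("addr"), 16),
        "library": match.group("lib"),
        # Offset of the faulting instruction from the mapping's base address.
        "offset": ip - base,
    }


if __name__ == "__main__":
    info = parse_segfault(SAMPLE_LINE)
    print(info["library"], hex(info["offset"]))
```

If the library name and offset are the same on different machines, that suggests the same crash site is being hit.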