Interface required for Glib thread scheduler?

I’ve configured my wpe-webkit-mir-kiosk snap to restart the WPE web process on failures. However, whenever the renderer process crashes, the browser fails to actually reload the page and shows a default crash message to the user. This is not ideal for use cases without a direct input device to trigger a reload.

I was able to reproduce the crash. When running WPE/cog with G_MESSAGES_DEBUG=all, LIBGL_DEBUG=verbose and WAYLAND_DEBUG=1, I see the following error message around the time of a crash:

Failed to set thread scheduler attributes: Operation not permitted

This seems to originate from GLib: https://gitlab.gnome.org/GNOME/glib/-/blob/master/glib/gthread-posix.c#L1221 However, sudo journalctl -n5000000 | grep "apparmor" shows no relevant AppArmor denials for wpe-webkit-mir-kiosk.

A second error seems unrelated and possibly noise from enabling LIBGL_DEBUG:

libGL: Can't open configuration file /etc/drirc: No such file or directory.
libGL: Can't open configuration file /root/snap/wpe-webkit-mir-kiosk/41/.drirc: No such file or directory.

Does the GLib thread scheduler call require any snapd interfaces, or should I file this with WPE developers?

Any hint appreciated :slightly_smiling_face:

(ping @jdstrand whose interface knowledge helped me in the past :vulcan_salute: )

are there any seccomp denials? What does

journalctl --no-pager | grep "syscall=" 

show?

Here’s the output:

audit[3830]: SECCOMP auid=4294967295 uid=0 gid=0 ses=4294967295 pid=3830 comm="WPEWebProcess" exe="/usr/lib/arm-linux-gnueabihf/wpe-webkit-1.0/WPEWebProcess" sig=0 arch=40000028 syscall=380 compat=0 ip=0xb4b37e62 code=0x50000
 kernel: audit: type=1326 audit(1599826909.514:208928): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=3830 comm="WPEWebProcess" exe="/usr/lib/arm-linux-gnueabihf/wpe-webkit-1.0/WPEWebProcess" sig=0 arch=40000028 syscall=380 compat=0 ip=0xb4b37e62 code=0x50000

WPEWebProcess is indeed the process responsible, but I don’t know what to make of the rest :sweat_smile:
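For longer logs, the interesting fields of these records can be pulled out with standard tools. This is a minimal sketch, assuming the comm="..." field precedes syscall=<n> on each line, as it does in the records above; summarise_seccomp is a made-up helper name, not something shipped with snapd or snappy-debug:

```shell
# summarise_seccomp: hypothetical helper. Reads journal lines on stdin and
# prints "<count> <comm> <syscall number>" for every seccomp audit record.
# Assumes comm="..." precedes syscall=<n> on the line, as in the log above.
summarise_seccomp() {
    sed -n 's/.*comm="\([^"]*\)".*syscall=\([0-9][0-9]*\).*/\1 \2/p' |
        LC_ALL=C sort | uniq -c | sed 's/^ *//'
}

# Example (needs access to the system journal):
#   journalctl --no-pager | summarise_seccomp
```

Note that each denial shows up twice in the journal excerpt above (once from audit[pid], once from the kernel), so the counts would be doubled for input like that.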

ogra@pi4:~$ snappy-debug.scmp-sys-resolver 380
sched_setattr
ogra@pi4:~$

looks like it wants to mangle the scheduler … (just to state the obvious :wink: )
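Syscall numbers are per-architecture, which is why a resolver is needed: arch=40000028 in the audit record is the audit identifier for 32-bit ARM, where sched_setattr is 380, while on x86_64 (arch=c000003e) the same call is 314. The following is only an illustrative sketch; resolve_sched_syscall is a hypothetical helper covering just the two scheduler-attribute syscalls, and snappy-debug.scmp-sys-resolver remains the proper tool:

```shell
# resolve_sched_syscall: hypothetical helper, NOT a replacement for
# snappy-debug.scmp-sys-resolver. Maps (audit arch, syscall number) to a
# name for sched_setattr/sched_getattr only; everything else is "unknown".
resolve_sched_syscall() {
    arch=$1
    nr=$2
    case "$arch:$nr" in
        40000028:380) echo sched_setattr ;;   # 32-bit ARM (EABI)
        40000028:381) echo sched_getattr ;;
        c000003e:314) echo sched_setattr ;;   # x86_64
        c000003e:315) echo sched_getattr ;;
        *) echo "unknown ($arch:$nr)"; return 1 ;;
    esac
}

# Example with the values from the audit record above:
#   resolve_sched_syscall 40000028 380
```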

Hmm, it seems the only interface that allows this is docker-support, but it’s unclear to me that this seccomp denial is the cause of the issue. When you reproduce the issue, do you see that seccomp denial show up, or was that denial from a “long time ago”?

The crash takes some time to reproduce, will post updates here. The last entry was around the time the crash occurred, though.

While the crash still has to reoccur on the initial device, I checked the logs on a secondary one – same entry as well as additional SECCOMP entries with syscall 380 from earlier crashes. The time stamps should be around the time where the WPE renderer process crashed.

Sep 09 12:42:21 glancr audit[460]: SECCOMP auid=4294967295 uid=0 gid=0 ses=4294967295 pid=460 comm="cog" exe="/snap/wpe-webkit-mir-kiosk/41/bin/cog" sig=0 arch=40000028 syscall=380 compat=0 ip=0xb4932e62 code=0x50000
Sep 09 12:42:21 glancr kernel: audit: type=1326 audit(1599648141.047:175): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=460 comm="cog" exe="/snap/wpe-webkit-mir-kiosk/41/bin/cog" sig=0 arch=40000028 syscall=380 compat=0 ip=0xb4932e62 code=0x50000
Sep 09 12:42:25 glancr audit[476]: SECCOMP auid=4294967295 uid=0 gid=0 ses=4294967295 pid=476 comm="WPENetworkProce" exe="/usr/lib/arm-linux-gnueabihf/wpe-webkit-1.0/WPENetworkProcess" sig=0 arch=40000028 syscall=380 compat=0 ip=0xb4bebe62 code=0x50000
Sep 09 12:42:25 glancr kernel: audit: type=1326 audit(1599648145.239:176): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=476 comm="WPENetworkProce" exe="/usr/lib/arm-linux-gnueabihf/wpe-webkit-1.0/WPENetworkProcess" sig=0 arch=40000028 syscall=380 compat=0 ip=0xb4bebe62 code=0x50000
Sep 09 12:42:26 glancr audit[475]: SECCOMP auid=4294967295 uid=0 gid=0 ses=4294967295 pid=475 comm="WPEWebProcess" exe="/usr/lib/arm-linux-gnueabihf/wpe-webkit-1.0/WPEWebProcess" sig=0 arch=40000028 syscall=380 compat=0 ip=0xb4b79e62 code=0x50000
Sep 09 12:42:26 glancr kernel: audit: type=1326 audit(1599648146.335:177): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=475 comm="WPEWebProcess" exe="/usr/lib/arm-linux-gnueabihf/wpe-webkit-1.0/WPEWebProcess" sig=0 arch=40000028 syscall=380 compat=0 ip=0xb4b79e62 code=0x50000
Sep 09 12:42:26 glancr audit[502]: SECCOMP auid=4294967295 uid=0 gid=0 ses=4294967295 pid=502 comm="gst-plugin-scan" exe="/snap/wpe-webkit-mir-kiosk/41/usr/lib/arm-linux-gnueabihf/gstreamer1.0/gstreamer-1.0/gst-plugin-scanner" sig=0 arch=40000028 syscall=380 compat=0 ip=0xb6d37e62 code=0x50000
Sep 09 12:42:26 glancr kernel: audit: type=1326 audit(1599648146.459:178): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=502 comm="gst-plugin-scan" exe="/snap/wpe-webkit-mir-kiosk/41/usr/lib/arm-linux-gnueabihf/gstreamer1.0/gstreamer-1.0/gst-plugin-scanner" sig=0 arch=40000028 syscall=380 compat=0 ip=0xb6d37e62 code=0x50000
Sep 09 20:29:35 glancr audit[26781]: SECCOMP auid=4294967295 uid=0 gid=0 ses=4294967295 pid=26781 comm="WPEWebProcess" exe="/usr/lib/arm-linux-gnueabihf/wpe-webkit-1.0/WPEWebProcess" sig=0 arch=40000028 syscall=380 compat=0 ip=0xb4bbee62 code=0x50000
Sep 09 20:29:35 glancr kernel: audit: type=1326 audit(1599676175.226:181): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=26781 comm="WPEWebProcess" exe="/usr/lib/arm-linux-gnueabihf/wpe-webkit-1.0/WPEWebProcess" sig=0 arch=40000028 syscall=380 compat=0 ip=0xb4bbee62 code=0x50000

As a temporary test, can you try adding the docker-support plug to your application and connecting it manually (along with the other interfaces you may have connected through store auto-connects), and see if you can still reproduce? If so, we will need to look into how to give your application access to the sched_setattr syscall, because presumably your application is not docker :smile:

IME, sched_setattr should be added to the process-control interface.

I can propose something like this independently of whether this ends up fixing the bug for @tobias unless you already have this on your queue of updates

Sorry for the delay. While the crash was reproducible just this morning, it has yet to reoccur with the try-mode snap where I added docker-support without connecting it yet. So basically the same snap as before, but with an unconnected interface and in try mode ¯_(ツ)_/¯ Re: @jdstrand’s post: process-control is exactly the interface which I suspected to allow this syscall. :+1:

The crash occurred again late yesterday evening, again with SECCOMP audits for syscall sched_setattr (380) in the log. I’ve connected the docker-support interface and restarted the service, now tailing the log, we’ll see if this fixes the issue.

UPDATE

The crash finally occurred again, this time with docker-support connected:

glancr@glancr:~$ snap connections wpe-webkit-mir-kiosk
Interface         Plug                                   Slot                              Notes
avahi-observe     wpe-webkit-mir-kiosk:avahi-observe     -                                 -
dbus              mirros-one:dbus-cogctl                 wpe-webkit-mir-kiosk:dbus-cogctl  -
docker-support    wpe-webkit-mir-kiosk:docker-support    :docker-support                   manual
hostname-control  wpe-webkit-mir-kiosk:hostname-control  :hostname-control                 gadget
network           wpe-webkit-mir-kiosk:network           :network                          -
network-bind      wpe-webkit-mir-kiosk:network-bind      :network-bind                     -
network-manager   wpe-webkit-mir-kiosk:network-manager   network-manager:service           gadget
opengl            wpe-webkit-mir-kiosk:opengl            :opengl                           -
process-control   wpe-webkit-mir-kiosk:process-control   :process-control                  gadget
upower-observe    wpe-webkit-mir-kiosk:upower-observe    -                                 -
wayland           wpe-webkit-mir-kiosk:wayland           mir-kiosk:wayland                 -
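To check a listing like this for plugs that are declared but not connected, the Slot column can be filtered. This is a sketch that assumes the four-column layout shown above, where an unconnected plug has "-" in the Slot column; unconnected_plugs is a hypothetical helper name:

```shell
# unconnected_plugs: hypothetical helper. Reads `snap connections <snap>`
# output on stdin, skips the header row, and prints every plug whose
# Slot column (third field) is "-", i.e. not connected.
unconnected_plugs() {
    awk 'NR > 1 && $3 == "-" { print $2 }'
}

# Example (requires snapd):
#   snap connections wpe-webkit-mir-kiosk | unconnected_plugs
```

For the listing above this would report only avahi-observe and upower-observe, so docker-support itself is connected.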

Sadly, WPE still refuses to reload the page and shows the generic “the renderer process crashed” page.

sudo journalctl --output=short --all | sudo snappy-debug gives:

= Seccomp =
Time: Sep 15 13:59:14
Log: auid=4294967295 uid=0 gid=0 ses=4294967295 pid=11708 comm="WPEWebProcess" exe="/usr/lib/arm-linux-gnueabihf/wpe-webkit-1.0/WPEWebProcess" sig=0 arch=40000028 380(sched_setattr) compat=0 ip=0xb4bbfe62 code=0x50000
Syscall: sched_setattr

I’m not sure if this is just a notice that the WPEWebProcess made a syscall, which should now be permitted by the docker-support interface, or if this means that WPE still can’t call sched_setattr.

Any guidance @ijohnson?

No, that message means it was really denied. It’s odd that it was denied while you have docker-support connected, though. Did you add the plug to all the apps that are in your snap?
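One way to read this off the record itself is the code= field. The sketch below assumes that field carries the seccomp return action and maps it to the constant names from the Linux seccomp UAPI header; seccomp_action_name is a hypothetical helper, not part of snappy-debug:

```shell
# seccomp_action_name: hypothetical helper. Takes the code= value of a
# seccomp audit record (e.g. 0x50000) and prints the matching
# SECCOMP_RET_* action name. The low 16 bits are action data and are masked off.
seccomp_action_name() {
    action=$(printf '0x%08x' $(( $1 & 0xffff0000 )))
    case "$action" in
        0x80000000) echo SECCOMP_RET_KILL_PROCESS ;;
        0x00000000) echo SECCOMP_RET_KILL_THREAD ;;
        0x00030000) echo SECCOMP_RET_TRAP ;;
        0x00050000) echo SECCOMP_RET_ERRNO ;;
        0x7fc00000) echo SECCOMP_RET_USER_NOTIF ;;
        0x7ff00000) echo SECCOMP_RET_TRACE ;;
        0x7ffc0000) echo SECCOMP_RET_LOG ;;
        0x7fff0000) echo SECCOMP_RET_ALLOW ;;
        *) echo "unknown ($action)"; return 1 ;;
    esac
}

# Example with the value from the records in this thread:
#   seccomp_action_name 0x50000
```

Under that assumption, code=0x50000 is SECCOMP_RET_ERRNO: the call was refused and an error was returned to the caller, which is consistent with the “Operation not permitted” message from GLib.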

Are you sure that everything gets cleaned up on restart of the snap? Perhaps there is some leftover process from the former run?

I did add it to a list, but it is not a list I’m going to be able to work on anytime soon. If you submit a PR, I can review it.

Here you go! :slight_smile: https://github.com/snapcore/snapd/pull/9357

In my initial test, I just unsquashed a copy of the candidate snap on-device (Raspberry Pi), added docker-support to the browser service’s plugs stanza in meta/snap.yaml and loaded it with snap try. Launchpad remote-build seemed stuck yesterday, and trying @ogra’s fabrica image is still on the todo list … :wink:
Per the snapcraft.yaml of the current candidate revision, my snap only has this service and a second service that restarts the browser when the Wayland socket is deleted.

This morning, I added the plug modifications shown below, ran a fresh remote build and loaded the resulting snap onto a Pi. AFAIK, the global plugs definition shouldn’t be required for services to work?

Plug modifications to snapcraft.yaml:
plugs:
  docker-support:
    privileged-containers: true

apps:
  browser:
    command: bin/desktop-launch $SNAP/bin/launch-wpe
    daemon: simple
    restart-condition: always
    slots: [dbus-cogctl]
    plugs:
      # Auto-connected
      - wayland
      - opengl # required for libEGL to work
      - network
      - network-bind # Remote inspector
      - upower-observe
      # Manually connected, show up as AppArmor denials but
      # basic browsing seems to work fine without
      - avahi-observe # zeroconf name resolution
      # snappy-debug suggestions
      - network-manager
      - hostname-control
      - process-control
      # - browser-support # TODO: Use this if/when we can get rid of preload/desktop-launch
      - docker-support
  restart-watcher:
    command: bin/watcher.sh
    daemon: simple
    plugs:
      - docker-support

Maybe the quick&dirty snap try approach doesn’t apply the change properly?

P.S.: I’m testing in parallel on two different Raspberry Pi Core installations and an amd64 PC with Core running. On the PC, the current WPE snap has yet to crash at all (showing the exact same page), so I focus testing on the Pi devices. That’s why I can’t build on my amd64 host but have to rely on either LP builders or another local armhf builder :smile:

If your existing installation of the snap was already a snap try, then just modifying the plugs in the snap.yaml file after you ran snap try will not be sufficient. If you run snap try after the change to snap.yaml, it is certainly picked up. You can confirm definitively whether it is in the profile for your daemon by running

grep sched_setattr /var/lib/snapd/seccomp/bpf/snap.$SNAP_NAME.$APP_NAME.src
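Since the snap in question has more than one app, the same check can be looped over every profile of the snap. This is a sketch under the assumption that the profile sources follow the snap.<snap>.<app>.src naming from the command above; apps_allowing_syscall is a hypothetical helper, and the optional third argument exists only so the function can be tried against a scratch directory:

```shell
# apps_allowing_syscall: hypothetical helper.
#   $1 = snap name, $2 = syscall name,
#   $3 = profile directory (defaults to the path used in the command above).
# Prints "<app>: allowed" or "<app>: missing" for each profile source found.
apps_allowing_syscall() {
    snap_name=$1
    syscall=$2
    dir=${3:-/var/lib/snapd/seccomp/bpf}
    for src in "$dir"/snap."$snap_name".*.src; do
        [ -e "$src" ] || continue          # no profiles matched the glob
        app=${src##*/}                     # snap.<snap>.<app>.src
        app=${app#snap."$snap_name".}      # <app>.src
        app=${app%.src}                    # <app>
        if grep -qw -e "$syscall" "$src"; then
            echo "$app: allowed"
        else
            echo "$app: missing"
        fi
    done
}

# Example (reading the real profiles may require root):
#   apps_allowing_syscall wpe-webkit-mir-kiosk sched_setattr
```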

Also we did just land the sched_setattr change to process-control so in a few hours you can switch to just using process-control by refreshing to the edge channel of snapd.

I think it was a locally installed snap (snap install --dangerous locally_built_file.snap), and I definitely ran snap try after the changes to snap.yaml. In any case, the rebuilt test snap has sched_setattr in its seccomp profile after installation. Now I wait for the crash to reoccur :slight_smile:

Yeah, saw the PR – thank you :+1:

FYI @ijohnson, @ogra : I finally had some time to come back to this issue; turns out that having sched_setattr helps but the remaining issue comes from a bug in cog (the web view container). It restarts the WPEWebProcess after a crash, but doesn’t properly reload the page. See https://github.com/Igalia/cog/issues/230
