Weird udev_enumerate error

I only rarely reboot my desktop (usually only every few months).
After I rebooted today with the beta core snap in use, I suddenly got the error messages below. There was nothing in journalctl, syslog or dmesg when this message was printed, and none of the snaps I tried were able to run.

ogra@anubis:~$ snapcraft-forum 
udev_enumerate_scan failed
ogra@anubis:~$ rocketchat-desktop 
udev_enumerate_scan failed

I tried to switch core to the stable channel but the issue persisted. After another reboot it was gone, though, and I cannot reproduce it anymore.

I am writing this down so that if someone else hits it we can perhaps get more debug info somehow, it smells like there is some kind of race here with udev tagging …

@zyga-snapd @jdstrand Any ideas here?

Hmm, this is in udev-support.c

    if (udev_enumerate_scan_devices(udev_s->devices) != 0)
        die("udev_enumerate_scan failed");

I’ll look at how that can fail in udev itself.

So I see this:

/**
 * udev_enumerate_scan_devices:
 * @udev_enumerate: udev enumeration context
 *
 * Scan /sys for all devices which match the given filters. No matches
 * will return all currently available devices.
 *
 * Returns: 0 on success, otherwise a negative error value.
 **/
_public_ int udev_enumerate_scan_devices(struct udev_enumerate *udev_enumerate) {
        assert_return(udev_enumerate, -EINVAL);

        return device_enumerator_scan_devices(udev_enumerate->enumerator);
}

Then

int device_enumerator_scan_devices(sd_device_enumerator *enumerator) {
        sd_device *device;
        int r = 0, k;

        assert(enumerator);

        if (enumerator->scan_uptodate &&
            enumerator->type == DEVICE_ENUMERATION_TYPE_DEVICES)
                return 0;

        while ((device = prioq_pop(enumerator->devices)))
                sd_device_unref(device);

        if (!set_isempty(enumerator->match_tag)) {
                k = enumerator_scan_devices_tags(enumerator);
                if (k < 0)
                        r = k;
        } else if (enumerator->match_parent) {
                k = enumerator_scan_devices_children(enumerator);
                if (k < 0)
                        r = k;
        } else {
                k = enumerator_scan_devices_all(enumerator);
                if (k < 0)
                        r = k;
        }

        enumerator->scan_uptodate = true;

        return r;
}

Any of the three enumerator_scan_devices_... functions may fail, and a negative return value from any of them is passed back to the caller.

Ogra do you have any logs from systemd from that time?

udev_enumerate_scan_devices() is from libudev. snap-confine isn’t doing anything weird with it and I cannot reproduce. I suspect either the system was in a sad state where /sys/{bus,class,subsystem} weren’t available or there is a bug in systemd (it provides udev). You didn’t provide enough details, but I do see https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1713536 as a fix for xenial-proposed and current artful that affects udev. I suggest filing a bug in Ubuntu against systemd with (more) steps to reproduce.
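To rule out the first possibility on an affected machine, one could check whether those sysfs directories exist. This is a minimal sketch: the function name check_sysfs_dirs and its optional root argument are hypothetical, added only so the check can be pointed at a test directory.

```shell
# Report which of the sysfs directories mentioned above are present.
# check_sysfs_dirs is a hypothetical helper; $1 is an optional sysfs
# root and defaults to /sys.
check_sysfs_dirs() {
    root="${1:-/sys}"
    for d in bus class subsystem; do
        if [ -d "$root/$d" ]; then
            echo "$d: present"
        else
            echo "$d: missing"
        fi
    done
}
```

Running check_sysfs_dirs with no argument inspects /sys. As far as I know, a missing subsystem directory on its own is normal on current kernels, so only a missing bus or class directory would point to a broken system state.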

@zyga-snapd I ran tail -f /var/log/syslog, journalctl -f and dmesg -w in three different terminals while the issue was showing up, and there was no movement in any of them when trying to run any snap, so sadly there is no log info at all.

It could well be a udev internal issue, a kernel one, or actually the one that @jdstrand pointed to. Given that it is not reproducible anymore I can't really tell. I only left this post here in case anyone else hits the same error, so we have a chance to check for more info while the system is actually in that state.
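If anyone does hit this again, a small wrapper could record the evidence while the system is still in that state. This is a sketch only: run_and_check is a hypothetical helper name, and the only string it relies on is the error message quoted at the top of this thread.

```shell
# Run a command, keep its combined stdout/stderr in a log file, and
# report whether the error from this thread appeared.
# Usage: run_and_check LOGFILE COMMAND [ARGS...]
run_and_check() {
    log="$1"
    shift
    status=0
    "$@" >"$log" 2>&1 || status=$?
    if grep -q -F 'udev_enumerate_scan failed' "$log"; then
        echo "udev scan failure seen (exit status $status), output saved in $log"
        return 1
    fi
    echo "no udev scan failure (exit status $status)"
    return 0
}
```

For example, run_and_check /tmp/snap-run.log rocketchat-desktop. Note that if the snap starts normally, the call only returns once the application exits.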

I got this today with irccloud-desktop, after rebooting my laptop for the first time in a week or so.

alan@hal:~$ irccloud-desktop 
udev_enumerate_scan failed

I can use irccloud in a browser tab so will do that, and keep my laptop in its current state so if someone needs me to debug anything I can.

We are now investigating this on an Ubuntu 16.04 machine kindly provided by @popey. I’ll update the thread once we have more news.

Looks like @papibe reported the same error in the “call for testing GIMP” thread.

I have now reproduced the issue.

I can confirm that it affects 16.04, but only when the nvidia proprietary driver is installed (in this case version 340.102).
I’ll spend a moment with gdb and strace to find the problem.

I think we understand the issue now.

The snapd 2.28.4 release should fix this particular problem.

You can try it out by refreshing to the beta channel of the core snap:

sudo snap refresh core --beta

Please report any issues you find while running beta on the forum!

I’m having the same issue on more than 20 systems (part of our CI system for frr). It started approximately 3 days ago.
I tried upgrading to 2.28.4, but it does not make a difference.

The interesting part is that I only see it after I connect the interfaces needed by my application (the frr snap is the one I’m supporting).

root@ci-comp2-dut:~# snap install --beta frr
frr (beta) 3.0-rc2 from 'osr' installed
root@ci-comp2-dut:~# frr.vtysh

Hello, this is FRRouting (version 3.0-rc2).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

ci-comp2-dut# exit
root@ci-comp2-dut:~# snap connect frr:network-control core:network-control 
root@ci-comp2-dut:~# frr.vtysh
udev_enumerate_scan failed
root@ci-comp2-dut:~# 

This is Ubuntu 16.04 with the hwe kernel, updated to the latest packages. (It started to fail before the update; I upgraded everything while trying to solve the issue, without any success.)
All of my Ubuntu 16.04 systems show the error consistently, every time, and rebooting doesn’t help.

(I can provide access to systems with this issue if there is any interest)

  • Martin

@mwinter - When looking at the nvidia issue I noticed network-control was also affected and this is fixed in https://github.com/snapcore/snapd/pull/4031 which will be in 2.28.5 (hopefully available very soon in the beta channel).

Great. I hope this is released soon.
And I think this might be the right fix. Looking at the pull request, I noticed the discussion on tun/tap interfaces, and my systems have a tap0 interface. As a test, I deleted tap0 and this fixes the issue.

This is an annoying side effect of the forced upgrade policy for snaps. (See the “Disabling automatic refresh for snap from store” thread. I still wish a manual lock to a version were possible, so that I could lock the core as well.)

  • Martin

@mwinter - until the fix is out, you can either revert to 2.27.6 of the snap or you can edit /etc/udev/rules.d/70-snap.frr.rules and remove the following lines:

KERNEL=="tap[0-9]*", TAG+="snap_frr_vtysh"
KERNEL=="tun[0-9]*", TAG+="snap_frr_vtysh"

Then run:

$ sudo udevadm trigger

We apologize for the inconvenience.

I might also mention that we’ll be adding regression tests for this so it doesn’t happen again going forward.

I think you may also need udevadm control --reload-rules before the trigger command.
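Putting the workaround together, the steps might look like the sketch below. strip_tun_tap_rules is a hypothetical helper; it assumes the two lines appear in the rules file exactly as quoted above, and it writes the filtered rules to standard output rather than editing the file in place.

```shell
# Print a udev rules file without the tap/tun tagging lines quoted above.
# Matching is done on fixed strings, so other rules are left untouched.
strip_tun_tap_rules() {
    grep -v -F \
        -e 'KERNEL=="tap[0-9]*", TAG+="snap_frr_vtysh"' \
        -e 'KERNEL=="tun[0-9]*", TAG+="snap_frr_vtysh"' \
        "$1"
}

# Example (needs root; review the filtered file before installing it):
#   rules=/etc/udev/rules.d/70-snap.frr.rules
#   strip_tun_tap_rules "$rules" > /tmp/70-snap.frr.rules
#   sudo cp /tmp/70-snap.frr.rules "$rules"
#   sudo udevadm control --reload-rules
#   sudo udevadm trigger
```

Writing to a temporary file first keeps the original rules intact until the result has been checked.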

We also found that:

  1. snapd doesn’t refresh the udev backend on startup (this is now changed in the release/2.28 branch)
  2. udev doesn’t remove tags from /run/udev/snap_*/*nvidia* (at least on xenial, it doesn’t seem to affect artful)

1 is already fixed in master and in the release branch, 2 is under review.
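To see whether stale nvidia entries are still present on an affected system, something like the following could be used. This is a sketch only: find_stale_nvidia_tags is a hypothetical name, the default directory and the snap_*/*nvidia* pattern are taken from item 2 above, and the exact layout under /run/udev may differ between releases.

```shell
# List entries matching snap_*/*nvidia* below a udev runtime directory.
# $1 is optional and defaults to /run/udev (assumed from item 2 above).
find_stale_nvidia_tags() {
    dir="${1:-/run/udev}"
    find "$dir" -path '*/snap_*/*nvidia*' 2>/dev/null | sort
}
```

An empty result would suggest that no such entries are left behind; any paths printed are candidates for the stale tags described in item 2.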

We pushed a new core to the beta channel that should fix this error. We tracked it down to an issue with the nvidia module and udev/sysfs. Please try snap refresh --beta core if you see this issue. After the refresh it should go away.

I have been testing 2.28.5 (via core in beta channel) and it’s resolved the issues for me. Thanks for fixing these tricky issues.
