I only rarely reboot my desktop (usually only every few months).
After I rebooted today with the beta core snap in use, I suddenly got the error messages below … there was nothing in journalctl, syslog or dmesg when this message got printed, and none of the snaps I tried were able to run …
I tried to switch core to the stable channel but the issue persisted; after another reboot it was gone, though, and I cannot reproduce it anymore …
I am writing this down so that if someone else hits it we can perhaps get more debug info somehow. It smells like there is some kind of race here with udev tagging …
/**
 * udev_enumerate_scan_devices:
 * @udev_enumerate: udev enumeration context
 *
 * Scan /sys for all devices which match the given filters. No matches
 * will return all currently available devices.
 *
 * Returns: 0 on success, otherwise a negative error value.
 **/
_public_ int udev_enumerate_scan_devices(struct udev_enumerate *udev_enumerate) {
        assert_return(udev_enumerate, -EINVAL);
        return device_enumerator_scan_devices(udev_enumerate->enumerator);
}
Then
int device_enumerator_scan_devices(sd_device_enumerator *enumerator) {
        sd_device *device;
        int r = 0, k;
        assert(enumerator);
        if (enumerator->scan_uptodate &&
            enumerator->type == DEVICE_ENUMERATION_TYPE_DEVICES)
                return 0;
        while ((device = prioq_pop(enumerator->devices)))
                sd_device_unref(device);
        if (!set_isempty(enumerator->match_tag)) {
                k = enumerator_scan_devices_tags(enumerator);
                if (k < 0)
                        r = k;
        } else if (enumerator->match_parent) {
                k = enumerator_scan_devices_children(enumerator);
                if (k < 0)
                        r = k;
        } else {
                k = enumerator_scan_devices_all(enumerator);
                if (k < 0)
                        r = k;
        }
        enumerator->scan_uptodate = true;
        return r;
}
Any of the three enumerator_scan_devices_...() functions may fail, and whichever branch runs passes its negative error code back to the caller.
Ogra, do you have any logs from systemd from that time?
udev_enumerate_scan_devices() is from libudev. snap-confine isn’t doing anything weird with it and I cannot reproduce. I suspect either the system was in a sad state where /sys/{bus,class,subsystem} weren’t available or there is a bug in systemd (it provides udev). You didn’t provide enough details, but I do see https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1713536 as a fix for xenial-proposed and current artful that affects udev. I suggest filing a bug in Ubuntu against systemd with (more) steps to reproduce.
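One way to test the "sad /sys" theory while a system is still failing is to look at those three directories directly. The following is only a rough sketch: check_sysfs_dirs and the SYSFS_ROOT override are names made up here for illustration and are not part of snapd or udev. As far as I know, /sys/subsystem is absent on most kernels and libudev then falls back to /sys/bus and /sys/class, so a missing "subsystem" alone is probably not the problem.

```shell
#!/bin/sh
# Sketch: report which sysfs entry points used for device enumeration exist.
# check_sysfs_dirs and SYSFS_ROOT are hypothetical names for this example.
check_sysfs_dirs() {
    root="${1:-/sys}"
    for d in bus class subsystem; do
        if [ -d "$root/$d" ]; then
            echo "present: $root/$d"
        else
            echo "missing: $root/$d"
        fi
    done
    return 0
}

# Check the real sysfs unless SYSFS_ROOT points at a test tree.
check_sysfs_dirs "${SYSFS_ROOT:-/sys}"
```

If bus and class both show up as missing while snaps fail to start, that would point at the system state rather than at snap-confine.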
@zyga-snapd I ran tail -f /var/log/syslog, journalctl -f and dmesg -w in three different terminals while the issue was showing up, and there was no movement in any of them when trying to run any snap … so sadly there is no log info at all …
It could well be a udev internal issue, or a kernel one, or actually the one that @jdstrand pointed to. Given that it is not reproducible anymore I can’t really tell. I only left this post here in case anyone else hits the same error, so we could have a chance to check for more info while the system is actually in that state …
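For anyone who does catch a system in that state, a snapshot of the udev and snapd side before rebooting might help later. This is a minimal sketch under assumptions: collect_udev_state and the output file names are invented for this example, and the choice of what to save is a guess at what could be useful, not a list anyone in this thread asked for. It only uses snap version, udevadm info --export-db and uname -a, and skips whatever is not installed.

```shell
#!/bin/sh
# Sketch: snapshot udev/snapd state while the error is still showing.
# collect_udev_state and the output file names are hypothetical.
collect_udev_state() {
    out="$1"
    mkdir -p "$out" || return 1
    # Run a command if it is installed; save stdout and stderr as <name>.txt.
    save() {
        name="$1"
        shift
        if command -v "$1" >/dev/null 2>&1; then
            "$@" >"$out/$name.txt" 2>&1 || true
        else
            echo "$1: not installed" >"$out/$name.txt"
        fi
    }
    save snap-version snap version
    save udev-db udevadm info --export-db
    save kernel uname -a
    # Per-snap udev rules live here (e.g. 70-snap.frr.rules in this thread).
    ls -l /etc/udev/rules.d/ >"$out/udev-rules.txt" 2>&1 || true
    return 0
}

# Default output directory is an arbitrary choice for this sketch.
collect_udev_state "${1:-${TMPDIR:-/tmp}/udev-state}"
```

The resulting directory can then be attached to a bug report once the system has been rebooted and the error is gone.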
I can confirm that it affects 16.04, but only after installing the nvidia proprietary driver (in this case version 340.102).
I’ll spend a moment with gdb and strace to find the problem.
I’m having the same issue on > 20 systems (part of our CI system for frr). It started approx 3 days ago.
I tried upgrading to 2.28.4, but it makes no difference.
The interesting part is that I only see it after I connect the needed interfaces for my application
(the frr snap is the one I’m supporting).
root@ci-comp2-dut:~# snap install --beta frr
frr (beta) 3.0-rc2 from 'osr' installed
root@ci-comp2-dut:~# frr.vtysh
Hello, this is FRRouting (version 3.0-rc2).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
ci-comp2-dut# exit
root@ci-comp2-dut:~# snap connect frr:network-control core:network-control
root@ci-comp2-dut:~# frr.vtysh
udev_enumerate_scan failed
root@ci-comp2-dut:~#
This is Ubuntu 16.04 with the HWE kernel, updated to the latest packages (it started to fail before the update; I upgraded everything trying to solve the issue, without any success).
All of my Ubuntu 16.04 systems show the error consistently, every time, and rebooting etc. doesn’t help.
(I can provide access to systems with this issue if there is any interest)
@mwinter - When looking at the nvidia issue I noticed network-control was also affected and this is fixed in https://github.com/snapcore/snapd/pull/4031 which will be in 2.28.5 (hopefully available very soon in the beta channel).
Great. I hope this is released soon.
And I think this might be the right fix. Looking at the pull request, I noticed the discussion on tun/tap interfaces, and my system has a tap0 interface. As a test, I deleted tap0 and this fixes the issue.
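To see whether another affected machine also has tun/tap interfaces, something like the sketch below could be used. It rests on an assumption: tun/tap devices expose a tun_flags attribute under /sys/class/net/<interface>/, which is how they are told apart here. list_tuntap is a made-up helper name, not an existing tool.

```shell
#!/bin/sh
# Sketch: list network interfaces that look like tun/tap devices.
# Assumption: tun/tap devices have a tun_flags attribute in sysfs.
# list_tuntap is a hypothetical helper name.
list_tuntap() {
    netdir="${1:-/sys/class/net}"
    for iface in "$netdir"/*; do
        if [ -e "$iface/tun_flags" ]; then
            basename "$iface"
        fi
    done
    return 0
}

# Print the tun/tap interfaces of the running system, one per line.
list_tuntap
```

An empty result on a system that still shows the error would suggest a different trigger than the tun/tap one discussed in the pull request.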
@mwinter - until the fix is out, you can either revert to 2.27.6 of the snap or you can edit /etc/udev/rules.d/70-snap.frr.rules and remove the following lines:
We pushed a new core to the beta channel that should fix this error. We tracked it down to an issue with the nvidia module and udev/sysfs. Please try snap refresh --beta core if you see this issue. After the refresh it should go away.