Race condition between snapd and NM when providing permissions


#1

Cellular connection with a modem (using NM/MM snaps) fails on boot. There is a strange error from pppd about not being able to connect to the modem port. But there are no apparmor denials. It looks as some sort of race condition:

$ nmcli c add type gsm ifname ttyACM0 con-name pepe apn gprs.pepephone.com
$ sudo journalctl -u snap.network-manager.networkmanager.service --no-pager
...
Mar 12 08:10:54 localhost.localdomain pppd[3508]: pppd 2.4.7 started by root, uid 0
Mar 12 08:10:54 localhost.localdomain pppd[3508]: Failed to open /dev/ttyACM0: Operation not permitted
...

strace revels it is an EPERM error:

[pid 3515] open("/dev/ttyACM0", O_RDWR|O_NONBLOCK) = -1 EPERM (Operation not permitted)
[pid 3515] gettimeofday({1520845102, 49471}, NULL) = 0
[pid 3515] send(3<socket:[25315]>, "<147>Mar 12 08:58:22 pppd[3515]: Failed to open /dev/ttyACM0: Operation not permitted", 85, MSG_NOSIGNAL) = 85
[pid 3515] write(1<socket:[19920]>, "Failed to open /dev/ttyACM0: Operation not permitted", 52) = 52

And, surprisingly, doing something like:

$ sudo snap run --shell network-manager.networkmanager
root@localhost:/home/test# exit

Makes the error vanish, and in next connection re-try pppd is able to open the modem port. It looks like snap-confine loads something in the kernel that was missing when the NM daemon started.

The error also vanishes when:

  1. Restarting the NM daemon
  2. After introducing a delay of a few seconds before the NM daemon starts on boot

#2

Something interesting to keep in mind is that /dev/ttyACM0 takes a few seconds until it appears, as it is created by a separate daemon that sets some GPIO. And NM is probably starting before the device is created.


#3

It turns out that the reason for the EPERM error is device cgroup not being dynamic enough.

$ ls -l /dev/ttyACM*
crw-rw---- 1 root dialout 166, 0 Mar 14 10:43 /dev/ttyACM0
crw-rw---- 1 root dialout 166, 1 Mar 14 10:49 /dev/ttyACM1
crw-rw---- 1 root dialout 166, 2 Mar 14 10:42 /dev/ttyACM2
crw-rw---- 1 root dialout 166, 3 Mar 14 10:43 /dev/ttyACM3
crw-rw---- 1 root dialout 166, 4 Mar 14 10:43 /dev/ttyACM4

Content of NM original device cgroup, that was created before the /dev/ttyACM* devices were created:

$ cat /sys/fs/cgroup/devices/snap.network-manager.networkmanager/devices.list 
c 1:3 rwm
c 1:7 rwm
c 1:5 rwm
c 1:8 rwm
c 1:9 rwm
c 5:0 rwm
c 5:1 rwm
c 5:2 rwm
c 136:* rwm
c 137:* rwm
c 138:* rwm
c 139:* rwm
c 140:* rwm
c 141:* rwm
c 142:* rwm
c 143:* rwm
c 10:63 rwm
c 108:0 rwm
c 0:0 rwm

After re-loading policies:

$ sudo snap run --shell network-manager.networkmanager
root@localhost:/home/test# exit
$ cat /sys/fs/cgroup/devices/snap.network-manager.networkmanager/devices.list 
c 1:3 rwm
c 1:7 rwm
c 1:5 rwm
c 1:8 rwm
c 1:9 rwm
c 5:0 rwm
c 5:1 rwm
c 5:2 rwm
c 136:* rwm
c 137:* rwm
c 138:* rwm
c 139:* rwm
c 140:* rwm
c 141:* rwm
c 142:* rwm
c 143:* rwm
c 166:0 rwm
c 166:1 rwm
c 166:2 rwm
c 166:3 rwm
c 166:4 rwm
c 10:63 rwm
c 108:0 rwm
c 0:0 rwm

#4

I was using a non-syncronized with ubuntu core snapd version, which in the end made the udev rule created to re-load the device group call the old name for a binary
(/usr/lib/snapd/snappy-app-dev instead of /lib/udev/snappy-app-dev). So case closed here, thanks to the Core team for helping.