Race condition between snapd and NM when providing permissions

Cellular connection with a modem (using NM/MM snaps) fails on boot. There is a strange error from pppd about not being able to connect to the modem port. But there are no apparmor denials. It looks as some sort of race condition:

$ nmcli c add type gsm ifname ttyACM0 con-name pepe apn gprs.pepephone.com
$ sudo journalctl -u snap.network-manager.networkmanager.service --no-pager
...
Mar 12 08:10:54 localhost.localdomain pppd[3508]: pppd 2.4.7 started by root, uid 0
Mar 12 08:10:54 localhost.localdomain pppd[3508]: Failed to open /dev/ttyACM0: Operation not permitted
...

strace revels it is an EPERM error:

[pid 3515] open("/dev/ttyACM0", O_RDWR|O_NONBLOCK) = -1 EPERM (Operation not permitted)
[pid 3515] gettimeofday({1520845102, 49471}, NULL) = 0
[pid 3515] send(3<socket:[25315]>, "<147>Mar 12 08:58:22 pppd[3515]: Failed to open /dev/ttyACM0: Operation not permitted", 85, MSG_NOSIGNAL) = 85
[pid 3515] write(1<socket:[19920]>, "Failed to open /dev/ttyACM0: Operation not permitted", 52) = 52

And, surprisingly, doing something like:

$ sudo snap run --shell network-manager.networkmanager
root@localhost:/home/test# exit

Makes the error vanish, and in next connection re-try pppd is able to open the modem port. It looks like snap-confine loads something in the kernel that was missing when the NM daemon started.

The error also vanishes when:

  1. Restarting the NM daemon
  2. After introducing a delay of a few seconds before the NM daemon starts on boot

Something interesting to keep in mind is that /dev/ttyACM0 takes a few seconds until it appears, as it is created by a separate daemon that sets some GPIO. And NM is probably starting before the device is created.

It turns out that the reason for the EPERM error is device cgroup not being dynamic enough.

$ ls -l /dev/ttyACM*
crw-rw---- 1 root dialout 166, 0 Mar 14 10:43 /dev/ttyACM0
crw-rw---- 1 root dialout 166, 1 Mar 14 10:49 /dev/ttyACM1
crw-rw---- 1 root dialout 166, 2 Mar 14 10:42 /dev/ttyACM2
crw-rw---- 1 root dialout 166, 3 Mar 14 10:43 /dev/ttyACM3
crw-rw---- 1 root dialout 166, 4 Mar 14 10:43 /dev/ttyACM4

Content of NM original device cgroup, that was created before the /dev/ttyACM* devices were created:

$ cat /sys/fs/cgroup/devices/snap.network-manager.networkmanager/devices.list 
c 1:3 rwm
c 1:7 rwm
c 1:5 rwm
c 1:8 rwm
c 1:9 rwm
c 5:0 rwm
c 5:1 rwm
c 5:2 rwm
c 136:* rwm
c 137:* rwm
c 138:* rwm
c 139:* rwm
c 140:* rwm
c 141:* rwm
c 142:* rwm
c 143:* rwm
c 10:63 rwm
c 108:0 rwm
c 0:0 rwm

After re-loading policies:

$ sudo snap run --shell network-manager.networkmanager
root@localhost:/home/test# exit
$ cat /sys/fs/cgroup/devices/snap.network-manager.networkmanager/devices.list 
c 1:3 rwm
c 1:7 rwm
c 1:5 rwm
c 1:8 rwm
c 1:9 rwm
c 5:0 rwm
c 5:1 rwm
c 5:2 rwm
c 136:* rwm
c 137:* rwm
c 138:* rwm
c 139:* rwm
c 140:* rwm
c 141:* rwm
c 142:* rwm
c 143:* rwm
c 166:0 rwm
c 166:1 rwm
c 166:2 rwm
c 166:3 rwm
c 166:4 rwm
c 10:63 rwm
c 108:0 rwm
c 0:0 rwm

I was using a non-syncronized with ubuntu core snapd version, which in the end made the udev rule created to re-load the device group call the old name for a binary
(/usr/lib/snapd/snappy-app-dev instead of /lib/udev/snappy-app-dev). So case closed here, thanks to the Core team for helping.