Snapcraft adt failures with the new core release


#1

With the recent autopkgtest machine shortage, I’ve been running Snapcraft’s autopkgtest suite locally using adt’s lxd runner. However, today the tests that actually install and run snaps started erroring with this:

test_plainbox.PlainboxTestCase.test_install_and_execution ... cannot create freezer cgroup hierarchy for snap plainbox-simple: Permission denied
ERROR

I know that string is coming from snap-confine, and I also know that a new core snap was released to stable today, where I know this ran successfully yesterday. I haven’t determined a root cause yet, but here are the host’s logs at the time of one failure:

Nov 15 12:32:02 Pandora kernel: [15454.900544] audit: type=1702 audit(1510777922.074:785): op=linkat ppid=24124 pid=24131 auid=4294967295 uid=166536 gid=166536 euid=166536 suid=166536 fsuid=166536 egid=166536 sgid=166536 fsgid=166536 tty=pts21 ses=4294967295 comm="python3" exe="/usr/bin/python3.5" res=0
Nov 15 12:32:02 Pandora kernel: [15454.900552] audit: type=1302 audit(1510777922.074:786): item=0 name="/usr/bin/dpkg" inode=12642577 dev=08:01 mode=0100755 ouid=165536 ogid=165536 rdev=00:00 nametype=NORMAL
Nov 15 12:32:10 Pandora kernel: [15462.936627] audit: type=1702 audit(1510777930.110:787): op=linkat ppid=24124 pid=24131 auid=4294967295 uid=166536 gid=166536 euid=166536 suid=166536 fsuid=166536 egid=166536 sgid=166536 fsgid=166536 tty=pts21 ses=4294967295 comm="python3" exe="/usr/bin/python3.5" res=0
Nov 15 12:32:10 Pandora kernel: [15462.936635] audit: type=1302 audit(1510777930.110:788): item=0 name="/usr/lib/x86_64-linux-gnu/libdb-5.3.so" inode=12651717 dev=08:01 mode=0100644 ouid=165536 ogid=165536 rdev=00:00 nametype=NORMAL
Nov 15 12:32:38 Pandora kernel: [15490.916142] audit: type=1400 audit(1510777958.090:789): apparmor="STATUS" operation="profile_replace" label="lxd-brave-muskrat_</var/lib/lxd>//&:lxd-brave-muskrat_<var-lib-lxd>://unconfined" name="snap.plainbox-simple.plainbox" pid=28401 comm="apparmor_parser"
Nov 15 12:32:38 Pandora kernel: [15491.386617] audit: type=1400 audit(1510777958.562:790): apparmor="DENIED" operation="capable" namespace="root//lxd-brave-muskrat_<var-lib-lxd>" profile="/snap/core/3440/usr/lib/snapd/snap-confine" pid=28464 comm="snap-confine" capability=2  capname="dac_read_search"

And here is the container log from the time of the failure:

Nov 15 20:32:37 brave-muskrat systemd[1]: Reloading.
Nov 15 20:32:37 brave-muskrat systemd[1]: Reloading.
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/snap-plainbox\x2dsimple-x1.mount: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Mounting Mount unit for plainbox-simple...
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /user.slice: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /init.scope: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/var-lib-lxcfs.mount: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/resolvconf.service: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/dev-mqueue.mount: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/dev-zero.mount: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/systemd-journald.service: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/proc-meminfo.mount: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/dev-tty.mount: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/dev-fuse.mount: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/dev-urandom.mount: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/dbus.service: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/dev-net-tun.mount: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/dev-null.mount: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/atd.service: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/cloud-config.service: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/proc-sys-fs-binfmt_misc.mount: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/system-getty.slice: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/snapd.service: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/dev-lxd.mount: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/systemd-udev-trigger.service: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/cloud-init-local.service: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/rc-local.service: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/sys-fs-fuse-connections.mount: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/systemd-random-seed.service: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/systemd-journal-flush.service: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/dev-ptmx.mount: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/systemd-logind.service: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/snap-core-3440.mount: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/proc-diskstats.mount: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/irqbalance.service: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/run-snapd-ns.mount: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/dev-random.mount: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/proc-uptime.mount: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/polkitd.service: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/cron.service: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/unattended-upgrades.service: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/dev-.lxd\x2dmounts.mount: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/mdadm.service: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/networking.service: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/ssh.service: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/ufw.service: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/run-snapd-ns-core.mnt.mount: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/lvm2-monitor.service: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/snap.mount: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/rsyslog.service: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/sys-kernel-debug.mount: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/run-user-1000.mount: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/proc-swaps.mount: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/apport.service: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/systemd-update-utmp.service: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/lxd-containers.service: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/cloud-final.service: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/dev-full.mount: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/systemd-user-sessions.service: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/-.mount: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/systemd-tmpfiles-setup.service: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/systemd-udevd.service: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/proc-stat.mount: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/cloud-init.service: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/console-getty.service: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/ondemand.service: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/accounts-daemon.service: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/proc-cpuinfo.mount: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Failed to reset devices.list on /system.slice/apparmor.service: Operation not permitted
Nov 15 20:32:37 brave-muskrat systemd[1]: Mounted Mount unit for plainbox-simple.
Nov 15 20:32:38 brave-muskrat snapd[10947]: 2017/11/15 20:32:38.587881 kernel_os.go:192: cannot get boot vars: open /boot/grub/grubenv: no such file or directory
Nov 15 20:32:38 brave-muskrat systemd[1]: Unmounted Mount unit for plainbox-simple.
Nov 15 20:32:39 brave-muskrat systemd[1]: Reloading.
Nov 15 20:32:39 brave-muskrat systemd[1]: Reloading.

I don’t see obvious issues. Any ideas?


2.29 release cycle started
#2

You can duplicate this without having to run the whole suite. Just create a new Xenial LXD container, and run the following commands in it:

$ sudo apt update && sudo apt upgrade -y && sudo apt install squashfuse -y
$ sudo apt build-dep snapcraft
$ sudo apt install python3-pexpect
$ sudo snap install core
$ git clone https://github.com/snapcore/snapcraft
$ cd snapcraft/
$ ./runtests.sh snapd test_plainbox.py

#3

Thanks for this problem report, we will look into this in the (european) morning.


#4

Sounds good, thanks @mvo. I’ll hopefully have more helpful details here by then!


#5

@kyrofa Thank you! @jdstrand might also have an idea about the cause of the denial you are seeing (and he is in a better timezone rgiht now :).


#6

Note that, oddly, I can actually install snaps fine. I installed the core snap with no issues, and if I build the problematic snap above and install it by hand, it actually works. But it fails in the test every single time.

I’m trying to get a simpler reproducible case.


#7

Oh duh. It doesn’t have anything to do with installing the snap, but running it. The test hid that from me a little bit, so sorry.

$ plainbox-simple.plainbox 
cannot create freezer cgroup hierarchy for snap plainbox-simple: Permission denied

#9

Thankfully snaps don’t really update in LXC, so this hasn’t bitten my production deployment yet. Silver lining!


#11

I’ll have a look first thing tomorrow.


#12

Alright here we go, final reproducer. Still no idea what the issue is, but I’ve narrowed it down a lot.

  1. Create a new container:

    $ lxc launch ubuntu:xenial snap-test -e
    
  2. Shell into that container (the rest of the steps are run within that shell):

    $ lxc exec snap-test su - ubuntu
    
  3. Update it and install prereqs:

    $ sudo apt update && sudo apt upgrade -y && sudo apt install squashfuse -y
    
  4. Install the hello-world snap:

    $ sudo snap install hello-world
    
  5. Run it as the unprivileged user:

    $ hello-world
    cannot create freezer cgroup hierarchy for snap hello-world: Permission denied
    
  6. Run as sudo, and realize that doing so unlocks the unprivileged user as well:

    $ sudo hello-world 
    Hello World!
    $ hello-world
    Hello World!
    

So: still no idea what’s happening, but it seems that something privileged needs to happen before starting a snap application. It apparently only needs to happen once. Note that I can now uninstall/reinstall hello-world and it works as expected. However, other snaps need the same privileged operation:

$ sudo snap install hello
hello 2.10 from 'canonical' installed
$ hello.universe 
cannot create freezer cgroup hierarchy for snap hello: Permission denied

#13

Thanks @kyrofa for the excellent instructions to reproduce the issue. It turns out that our lxd test did not run the test snaps as a regular user, just as root. This is why we did not catch this issue. In https://github.com/snapcore/snapd/pull/4230 the test is extended now to ensure we do not hit this ever again.


#14

FWIW, trying the same on artful host.

The log from running hello:

ubuntu@snap-test:~$ SNAPD_DEBUG=1 SNAP_CONFINE_DEBUG=1 hello
2017/11/16 07:59:06.603740 cmd.go:203: DEBUG: restarting into "/snap/core/current/usr/bin/snap"
DEBUG: security tag: snap.hello.hello
DEBUG: executable:   /usr/lib/snapd/snap-exec
DEBUG: confinement:  non-classic
DEBUG: base snap:    core
DEBUG: apparmor label on snap-confine is: /snap/core/3440/usr/lib/snapd/snap-confine
DEBUG: apparmor mode is: enforce
DEBUG: checking if the current process shares mount namespace with the init process
DEBUG: re-associating is not required
DEBUG: creating lock directory /run/snapd/lock (if missing)
DEBUG: opening lock directory /run/snapd/lock
DEBUG: opening lock file: /run/snapd/lock/.lock
DEBUG: sanity timeout initialized and set for three seconds
DEBUG: acquiring exclusive lock (scope (global))
DEBUG: sanity timeout reset and disabled
DEBUG: ensuring that snap mount directory is shared
DEBUG: unsharing snap namespace directory
DEBUG: creating namespace group directory /run/snapd/ns
DEBUG: namespace group directory does not require intialization
DEBUG: releasing lock (scope: (global))
DEBUG: creating lock directory /run/snapd/lock (if missing)
DEBUG: opening lock directory /run/snapd/lock
DEBUG: opening lock file: /run/snapd/lock/hello.lock
DEBUG: sanity timeout initialized and set for three seconds
DEBUG: acquiring exclusive lock (scope hello)
DEBUG: sanity timeout reset and disabled
DEBUG: initializing mount namespace: hello
DEBUG: opening namespace group directory /run/snapd/ns
DEBUG: attempting to re-associate the mount namespace with the namespace group hello
DEBUG: successfully re-associated the mount namespace with the namespace group hello
DEBUG: releasing resources associated with namespace group hello
cannot create freezer cgroup hierarchy for snap hello: Permission denied

The step that fails is creating snap.hello freezer cgroup. sudo works only until a reboot.


#15

Some updates from IRC:

10:20 < zyga> mvo: I know what the problem is, I think
10:20 < zyga> stgraber: around?
10:23 < zyga> mvo: http://pastebin.ubuntu.com/25973198/
10:24 < zyga> mvo: so two ideas for quick “solution”:
10:24 < zyga> mvo: 1) make that step optional and let it fail, this means mount changes are not atomic in lxd
10:24 < zyga> mvo: 2) talk to stgraber and figure out why lxd sets up containers this way and what we can do about it
10:26 < zyga> mvo, mborzecki: opinions?
10:26 < mborzecki> hmm that’s why g+s works i guess
10:27 < zyga> mborzecki: correct
10:29 < zyga> mvo: it’s actually a deeper problem:
10:29 < zyga> root@my-ubuntu:~# ls -ld /sys/fs/cgroup/devices/
10:29 < zyga> drwxrwxr-x 5 nobody root 0 Nov 16 09:13 /sys/fs/cgroup/devices/
10:29 < zyga> mvo: unless we somehow disabled all udev tagging inside containers
10:30 < zyga> mvo: it will break on when creating the device cgroup
10:30 < zyga> s/on//
10:31 < mborzecki> that’s perhaps silly, but could we g+s snap-confine too? we’re dropping both anyway right after setting up
10:32 < zyga> mborzecki: that’s another option, indeed
10:32 < zyga> though I’d like to understand the motivation behind lxd choices


#16

Maybe want to take a look at this related LP: #1730376 issue discussed on Problems with build-snaps on build.snapcraft.io which hasn’t had a comment from anyone in the snapd team yet.


#17

Hey @sergiusens - I had a quick look but I don’t see how this is related. Can you clarify please.


#18

Only picking on keywords, you mentioned udev; apparently launchpad builders which run on lxd cannot install the core snap due to issues with no udev being installed as a dependency to snapd.


#19

The way LXD sets up the cgroups is sufficient to do cgroup delegation. But cgroup delegation is - as it is on your host system - restricted to the root user of the container. If an unprivileged snap user needs to write to a cgroup then snappy needs to take care to do the cgroup delegation for them. That is, it needs to place the user into a writable cgroup. This is not LXD’s job though. When LXD sets up the container it doesn’t even know what users will exist on your system apart from the root user. This is something that snappy will need to take care of either during install by creating some sort of snappy group or by using our pam module libpam-cgfs which can be used to create writable cgroups on login. The latter would require you to run lxc exec <container-name> -- su -l - ubuntu and then snappy would need to take care to create the cgroup for itself in the current writable cgroup that the unprivileged user is placed in. The latter option however is to some extent distro dependent.


#20

Question about the delegation and being a privileged user or not. Snappy uses a setuid-root executable that writes to /sys/fs/cgroup and then drops back to the user.

What we are being blocked on now, I think, is the ownership of /sys/fs/cgroup (nobody) that differs from the regular permission outside of LXD (root).

Lastly is the root user inside the container an unprivileged user? Should it be able to bypass the nobody ownership?

Snappy manages many cgroups, one per snap, dynamically as the corresponding applications are started. I’m not sure if I understand you correctly and if we can still do that.


#21

Question about the delegation and being a privileged user or not. Snappy uses a setuid-root executable that writes to /sys/fs/cgroup and then drops back to the user.

Well I would expect that to work but I’m not familiar with the internals.

What we are being blocked on now, I think, is the ownership of /sys/fs/cgroup (nobody) that differs from the regular permission outside of LXD (root).

What confuses me about this is why that should matter if you’re using a setuid binary. Giving the container’s root user’s group write access to the cgroup hierarchy is sufficient for it to create it’s own cgroups underneath. Otherwise sudo - setuid too - wouldn’t work as well.

Lastly is the root user inside the container an unprivileged user? Should it be able to bypass the nobody ownership?
The root user inside the container is the root user and therefore privileged wrt to the container. Wrt to the host it is an unprivileged user. But that doesn’t matter since LXD/liblxc will take care to delegate a cgroup to the user on the host.


#22

Thank you for the answer. I will investigate this and see if there’s something we are missing.

BTW: can you tell me (or point me to some docs) about cgroup delegation feature of lxd?