brauner
November 16, 2017, 2:59pm
23
Thank you for the answer. I will investigate this and see if there’s something we are missing.
No problem, I’m going to take a look at the cgroup portion of your code.
BTW: can you tell me (or point me to some docs) about cgroup delegation feature of lxd?
Cgroup delegation in liblxc (that’s what does it for LXD) for cgroup v1 hierarchies is:
give write access to the relevant cgroup hierarchy created by liblxc for the container by:
chown()ing the cgroup directory’s gid to the container’s root user’s id
chown()ing the cgroup.procs file’s gid to the container’s root user’s id
chown()ing the cgroup.tasks file’s gid to the container’s root user’s id
The model is different for cgroup v2 but that is out of scope now anyway.
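For illustration, the chown steps listed above could be sketched roughly as follows. This is not liblxc's actual code: the function name `delegate_cgroup_gid` and its parameters are made up for this example, and the caller passes in the control file names. Only the group is changed; the owner is left alone.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not a liblxc function): hand group ownership of a
 * cgroup v1 directory and the named control files inside it to `gid`.
 * The owner is left untouched by passing (uid_t)-1 to chown().
 * Returns 0 on success, -1 on the first failure with errno set. */
int delegate_cgroup_gid(const char *cgroup_dir, gid_t gid,
                        const char *const *files, size_t nfiles)
{
    char path[4096];

    /* The directory itself: lets the group create child cgroups. */
    if (chown(cgroup_dir, (uid_t)-1, gid) < 0)
        return -1;

    /* The control files: lets the group move processes in. */
    for (size_t i = 0; i < nfiles; i++) {
        int n = snprintf(path, sizeof(path), "%s/%s", cgroup_dir, files[i]);
        if (n < 0 || (size_t)n >= sizeof(path)) {
            errno = ENAMETOOLONG;
            return -1;
        }
        if (chown(path, (uid_t)-1, gid) < 0)
            return -1;
    }
    return 0;
}
```

A caller would pass the container's cgroup directory, the host gid that maps to the container's root, and the file names from the list above.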
Thank you!
Note that the cgroup code is in two places:
the device cgroup code is in cmd/snap-confine/udev-support.c in the snapd tree
the freezer cgroup code is in cmd/snap-update-ns/freezer.go
brauner
November 16, 2017, 3:04pm
25
I see what you’re doing is:
// Open the hierarchy directory for the given snap.
int hierarchy_fd SC_CLEANUP(sc_cleanup_close) = -1;
hierarchy_fd = openat(cgroup_fd, buf, O_PATH | O_DIRECTORY | O_NOFOLLOW | O_CLOEXEC);
if (hierarchy_fd < 0) {
    die("cannot open freezer cgroup hierarchy for snap %s", snap_name);
}
// Since we may be running from a setuid but not setgid executable, ensure
// that the group and owner of the hierarchy directory is root.root.
if (fchownat(hierarchy_fd, "", 0, 0, AT_EMPTY_PATH) < 0) {
    die("cannot change owner of freezer cgroup hierarchy for snap %s to root.root", snap_name);
}
// Open the tasks file.
int tasks_fd SC_CLEANUP(sc_cleanup_close) = -1;
tasks_fd = openat(hierarchy_fd, "tasks", O_WRONLY | O_NOFOLLOW | O_CLOEXEC);
if (tasks_fd < 0) {
    die("cannot open tasks file for freezer cgroup hierarchy for snap %s", snap_name);
}
// Write the process (task) number to the tasks file. Linux task IDs are
if (fchownat(hierarchy_fd, "", 0, 0, AT_EMPTY_PATH) < 0) {
    die("cannot change owner of freezer cgroup hierarchy for snap %s to root.root", snap_name);
}
I don’t understand why you want to chown the /sys/fs/cgroup/freezer cgroup itself. I think this is where you fail. That shouldn’t be needed for you to create writable cgroups. It’s sufficient if you can chown it to the relevant gid.
That code is AFAIK not used, as freezer control moved to snap-update-ns. To answer your question though: we chown directories that we created to ensure that we don’t leak the group of the user that initially ran the command that triggered us to create the cgroup. Otherwise those would be root:zyga, for example. We never chown things we didn’t create, so we should not change /sys/fs/cgroup/freezer itself.
EDIT: I’m sorry for the rushed response: we do use the freezer cgroup from both sides. The C side just moves the process there. The Go side handles the actual freezing.
brauner
November 16, 2017, 3:11pm
27
Np, then this is the crucial step I think. What you want to do is only check whether you can write to the cgroup. You shouldn’t need to chown the freezer cgroup itself, I’d say. Just check if you can write to it: if you can, use it; if not, don’t use it (if that’s possible for you).
brauner
November 16, 2017, 3:13pm
28
I’m obviously blatantly ignorant about some of your requirements, so my advice needs to be checked against them.
I’m not sure I understand. Are you referring to chowning of /sys/fs/cgroup/freezer/snap.example or /sys/fs/cgroup/freezer?
brauner
November 16, 2017, 3:16pm
30
Yes it does. Though it also runs with an apparmor profile (look for snap-confine.apparmor.in)
brauner
November 16, 2017, 3:28pm
32
Right, this is the portion that runs under non-classic confinement in:
// 777 permissions for /var/lib and we need to fixup
// for systems that had their NS created with an
// old version
sc_maybe_fixup_permissions();
sc_maybe_fixup_udev();
// Associate each snap process with a dedicated snap freezer
// control group. This simplifies testing if any processes
// belonging to a given snap are still alive.
// See the documentation of the function for details.
sc_cgroup_freezer_join(snap_name, getpid());
sc_unlock(snap_name, snap_lock_fd);
// Reset path as we cannot rely on the path from the host OS to
// make sense. The classic distribution may use any PATH that makes
// sense but we cannot assume it makes sense for the core snap
// layout. Note that the /usr/local directories are explicitly
// left out as they are not part of the core snap.
debug("resetting PATH to values in sync with core snap");
setenv("PATH",
I assume. That’s likely your AppArmor profile getting in the way. Otherwise if that really runs as setuid() root this must work.
I just created a xenial container and updated snapd inside. With squashfuse installed and the core snap installed I tried running a simple busybox snap:
ubuntu@xenial:/root$ /snap/bin/snapd-hacker-toolbelt.busybox
cannot create freezer cgroup hierarchy for snap snapd-hacker-toolbelt: Permission denied
This error is from the mkdirat call, not from fchown.
As for apparmor:
ubuntu@xenial:/root$ dmesg | grep DENIED
[ 9775.971367] audit: type=1400 audit(1510844464.690:88): apparmor="DENIED" operation="file_inherit" namespace="root//lxd-trusty_<var-lib-lxd>" profile="/sbin/dhclient" name="/dev/pts/4" pid=11638 comm="dhclient" requested_mask="wr" denied_mask="wr" fsuid=165536 ouid=165536
[ 9775.971544] audit: type=1400 audit(1510844464.690:89): apparmor="DENIED" operation="file_inherit" namespace="root//lxd-trusty_<var-lib-lxd>" profile="/sbin/dhclient" name="/dev/pts/4" pid=11638 comm="dhclient" requested_mask="wr" denied_mask="wr" fsuid=165536 ouid=165536
[ 9871.850641] audit: type=1400 audit(1510844560.570:94): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxd-artful_</var/lib/lxd>" name="/sys/fs/cgroup/unified/" pid=13442 comm="systemd" fstype="cgroup2" srcname="cgroup" flags="rw, nosuid, nodev, noexec"
[ 9873.026539] audit: type=1400 audit(1510844561.745:95): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxd-artful_</var/lib/lxd>" name="/var/lib/lxcfs/" pid=13721 comm="(networkd)" flags="ro, nosuid, nodev, remount, bind"
[ 9878.090184] audit: type=1400 audit(1510844566.809:96): apparmor="DENIED" operation="file_lock" profile="lxd-artful_</var/lib/lxd>" pid=13962 comm="(ostnamed)" family="unix" sock_type="dgram" protocol=0 addr=none
[ 9878.090193] audit: type=1400 audit(1510844566.809:97): apparmor="DENIED" operation="file_lock" profile="lxd-artful_</var/lib/lxd>" pid=13962 comm="(ostnamed)" family="unix" sock_type="dgram" protocol=0 addr=none
[ 9878.090196] audit: type=1400 audit(1510844566.809:98): apparmor="DENIED" operation="file_lock" profile="lxd-artful_</var/lib/lxd>" pid=13962 comm="(ostnamed)" family="unix" sock_type="dgram" protocol=0 addr=none
[ 9878.090198] audit: type=1400 audit(1510844566.809:99): apparmor="DENIED" operation="file_lock" profile="lxd-artful_</var/lib/lxd>" pid=13962 comm="(ostnamed)" family="unix" sock_type="dgram" protocol=0 addr=none
[11221.481436] audit: type=1400 audit(1510845910.202:117): apparmor="DENIED" operation="file_inherit" namespace="root//lxd-xenial_<var-lib-lxd>" profile="/snap/core/3440/usr/lib/snapd/snap-confine//snap_update_ns" name="/dev/null" pid=17805 comm="5" requested_mask="r" denied_mask="r" fsuid=165536 ouid=0
I see a number of failures here: The file_inherit of /dev/null is one interesting aspect. Should we adjust the profile for LXD / snapd somehow?
EDIT: Actually they are all interesting. What do you think?
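When going through denials like the ones above, it can help to pull out just the operation, profile and name fields. A small sketch, assuming the key="value" layout seen in these log lines; `audit_field` is a made-up helper, and it deliberately handles only double-quoted values (so pid=11638 is not matched).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the value of key="value" from an audit line
 * into `out`. The key must start the line or follow a space, so that
 * "name" does not match inside "namespace". Only quoted values are
 * handled. Returns 0 on success, -1 if the key is missing, unquoted,
 * unterminated, or the value does not fit into `out`. */
int audit_field(const char *line, const char *key, char *out, size_t outlen)
{
    size_t klen = strlen(key);
    const char *p = line;

    while ((p = strstr(p, key)) != NULL) {
        int at_start = (p == line) || (p[-1] == ' ');
        if (at_start && p[klen] == '=' && p[klen + 1] == '"') {
            const char *val = p + klen + 2;
            const char *end = strchr(val, '"');
            if (end == NULL)
                return -1;
            size_t n = (size_t)(end - val);
            if (n >= outlen)
                return -1;
            memcpy(out, val, n);
            out[n] = '\0';
            return 0;
        }
        p += klen; /* keep searching after this occurrence */
    }
    return -1;
}
```

Grouping the lines by the extracted profile value separates the denials that come from the LXD container profiles from the one that comes from the snap-confine profile.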
brauner
November 16, 2017, 3:39pm
34
Yeah, I pointed this out above.
Before you try to mkdirat() you seem to be changing into an AppArmor profile “mount-namespace-capture-helper”. It might be interesting to see what this does.
brauner
November 16, 2017, 3:44pm
35
I see a number of failures here: The file_inherit of /dev/null is one interesting aspect. Should we adjust the profile for LXD / snapd somehow?
I’m not sure why you’d want to inherit an fd for /dev/null or /dev/pts/<idx>, which must be the fd lxd currently uses from the host for its exec session. The AppArmor denials seem reasonable to me. (Apart from the cgroup2 denial, but that’s probably apparmor not knowing about cgroup2.)
On second thought the helper that is doing the mount namespace capture (bind mount) is probably a red herring. I’ll test a tweak to see what happens.
What about the various mount denials?
I turned off kernel rate limiting by running sysctl kernel.printk_ratelimit=0 and patched the profile to allow access to /dev/null but I don’t see any denials and the error is exactly the same as before.
brauner
November 16, 2017, 3:55pm
38
I turned off kernel rate limiting by running sysctl kernel.printk_ratelimit=0 and patched the profile to allow access to /dev/null but I don’t see any denials and the error is exactly the same as before.
That is orthogonal to what we are discussing here I think.
snap-confine itself seems to be running under an AppArmor profile as indicated by:
if (!sc_is_hook_security_tag(security_tag)) {
    struct sc_error *err SC_CLEANUP(sc_cleanup_error) = NULL;
    snap_context = sc_cookie_get_from_snapd(snap_name, &err);
    if (err != NULL) {
        error("%s\n", sc_error_msg(err));
    }
}
struct sc_apparmor apparmor;
sc_init_apparmor_support(&apparmor);
if (!apparmor.is_confined && apparmor.mode != SC_AA_NOT_APPLICABLE
    && getuid() != 0 && geteuid() == 0) {
    // Refuse to run when this process is running unconfined on a system
    // that supports AppArmor when the effective uid is root and the real
    // id is non-root. This protects against, for example, unprivileged
    // users trying to leverage the snap-confine in the core snap to
    // escalate privileges.
    die("snap-confine has elevated permissions and is not confined"
        " but should be. Refusing to continue to avoid"
        " permission escalation attacks");
}
Where is this profile?
brauner
November 16, 2017, 3:56pm
39
Oh ok, I see, it seems to just check whether it runs confined.
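As a rough illustration of what a "runs confined" check can boil down to (this is an assumption for the sake of the example, not how sc_init_apparmor_support() is implemented): where AppArmor is the active LSM, a process can read its own label from /proc/self/attr/current, which is either "unconfined" or a profile name followed by a mode such as "(enforce)". The helper name `label_is_confined` is invented here.

```c
#include <string.h>

/* Hypothetical helper: given the text read from /proc/self/attr/current,
 * return 1 if it names a profile and 0 if it is exactly "unconfined"
 * or empty. A trailing newline is ignored. */
int label_is_confined(const char *label)
{
    size_t n = strcspn(label, "\n"); /* length without trailing newline */

    if (n == 0)
        return 0; /* no label at all: treat as not confined */
    if (n == strlen("unconfined") && strncmp(label, "unconfined", n) == 0)
        return 0;
    return 1;
}
```

Combined with the getuid() != 0 && geteuid() == 0 test from the excerpt above, this gives the "elevated but not confined" condition that makes snap-confine refuse to continue.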
The profile is in snap-confine.apparmor.in in the source tree or in /etc/apparmor.d/*snap-confine.real
NOTE: There will be two files because snapd re-executes itself so one will be for the packaged version and one will be for the snapd-from-core-snap.
brauner
November 16, 2017, 4:10pm
41
zyga-snapd:
snap-confine.apparmor.in
Your AppArmor profile seems to allow reading the freezer cgroup but not writing to it, am I right?
capability sys_admin,
capability dac_override,
/sys/fs/cgroup/devices/snap{,py}.*/ w,
/sys/fs/cgroup/devices/snap{,py}.*/tasks w,
/sys/fs/cgroup/devices/snap{,py}.*/devices.{allow,deny} w,
# cgroup: freezer
# Allow creating per-snap cgroup freezers and adding snap command (task)
# invocations to the freezer. This allows for reliably enumerating all
# running tasks for the snap.
/sys/fs/cgroup/freezer/ r,
/sys/fs/cgroup/freezer/snap.*/ w,
/sys/fs/cgroup/freezer/snap.*/tasks w,
# querying udev
/etc/udev/udev.conf r,
/sys/**/uevent r,
/lib/udev/snappy-app-dev ixr, # drop
/run/udev/** rw,
/{,usr/}bin/tr ixr,
/usr/lib/locale/** r,
Hmm, maybe I read my apparmor wrong, but this should say we can open the freezer directory, and create and write to the snap.* sub-directory there. If this was failing it would fail outside of LXD as well.