Slurm Snap Classic Confinement

jamesbeedy · May 2, 2020, 11:13pm

Hello,

We are trying to release our slurm snap to the snap store as a classically confined snap. Both slurm and its supporting components use the setsid command in one way or another, which, as far as we can tell, is not supported under strict confinement. We want to move forward with serving our snap from the snap store, and this is currently a blocker for us. Can we have a conversation about this and/or figure out what we need to do to get this snap approved for release?

Thanks!

alexmurray · May 4, 2020, 1:35am

Ah, at first I was confused, since I thought you were referring to access to the setsid system call, which is already allowed by default for all snaps in the default seccomp policy: https://github.com/snapcore/snapd/blob/2.44.5/interfaces/seccomp/template.go#L407
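For reference, calling the system call from within a program needs nothing from the host. The minimal C sketch below (the helper name `child_becomes_session_leader` is hypothetical, not from Slurm or snapd) forks a child, has it call setsid(2) directly, and checks that the child now leads a new session. This is roughly what the setsid(1) utility does internally; whether Slurm could use the call directly instead of running an external command is an assumption you would need to check against its code.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* expose getsid() even under strict -std= modes */
#endif
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that starts a new session with
 * setsid(2). Returns 0 if the child became the session leader, 1 if
 * setsid() failed or the check did not hold, and -1 if fork/wait failed. */
int child_becomes_session_leader(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* A freshly forked child is never a process-group leader, so
         * setsid() should not fail with EPERM here. */
        if (setsid() == (pid_t)-1)
            _exit(1);
        /* The new session's ID equals the caller's PID. */
        _exit(getsid(0) == getpid() ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child uses `_exit()` rather than `exit()` so it does not flush stdio buffers it inherited from the parent.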

However, strictly confined snaps cannot execute binaries from the host machine, so if you require an external command like /usr/bin/setsid, you can simply bundle it inside your snap using the stage-packages directive (in this case listing the util-linux package, since it provides /usr/bin/setsid).

jamesbeedy · May 6, 2020, 1:04am

This is exactly what I needed to do. Thanks @alexmurray

jamesbeedy · May 6, 2020, 5:44pm

@alexmurray we have followed up on this and it got us a bit further, but now the code we are snapping calls setegid, which triggers a chain of errors.

When I try to execute the srun command in our snap, I get the following:

$ slurm.srun -pdebug -n1 -l hostname  -vvvvvv
srun: error: task 0 launch failed: Slurmd could not set UID or GID

Looking in the slurm logs I see:

[2020-05-06T03:55:39.809] [2.0] error: setegid: Operation not permitted

snappy-debug shows:

= Seccomp =
Time: May  6 03:33:30
Log: auid=4294967295 uid=0 gid=0 ses=4294967295 pid=4452 comm="slurmstepd" exe="/snap/slurm/x1/sbin/slurmstepd" sig=0 arch=c000003e 119(setresgid) compat=0 ip=0x7fd2d6d5bd9d code=0x50000
Syscall: setresgid
Suggestion:
* adjust program to not use 'setresgid' until per-snap user/groups are supported (https://launchpad.net/bugs/1446748)
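To read such a record by hand, the top 16 bits of the code= field are the seccomp filter action. In the record above, 0x50000 is SECCOMP_RET_ERRNO: the filter made setresgid fail with an error rather than killing the process, which is consistent with the "Operation not permitted" in the Slurm log. Likewise, arch=c000003e is AUDIT_ARCH_X86_64, on which syscall 119 is setresgid. A minimal C sketch of the decoding; the enum and function names are hypothetical, and the numeric values are copied from the SECCOMP_RET_* constants in <linux/seccomp.h> so it builds even against older kernel headers.

```c
#include <stdint.h>

/* Values mirror SECCOMP_RET_* from <linux/seccomp.h>. */
#define RET_ACTION_FULL  0xffff0000U /* mask selecting the action bits */
#define RET_KILL_PROCESS 0x80000000U
#define RET_KILL_THREAD  0x00000000U
#define RET_TRAP         0x00030000U
#define RET_ERRNO        0x00050000U
#define RET_USER_NOTIF   0x7fc00000U
#define RET_TRACE        0x7ff00000U
#define RET_LOG          0x7ffc0000U
#define RET_ALLOW        0x7fff0000U

enum seccomp_action {
    ACT_KILL_PROCESS, ACT_KILL_THREAD, ACT_TRAP, ACT_ERRNO,
    ACT_USER_NOTIF, ACT_TRACE, ACT_LOG, ACT_ALLOW, ACT_UNKNOWN
};

/* Map the code= field of an audit SECCOMP record to the filter action.
 * Any low 16 data bits (e.g. an errno value) are masked off first. */
enum seccomp_action seccomp_action_of(uint32_t code)
{
    switch (code & RET_ACTION_FULL) {
    case RET_KILL_PROCESS: return ACT_KILL_PROCESS;
    case RET_KILL_THREAD:  return ACT_KILL_THREAD;
    case RET_TRAP:         return ACT_TRAP;
    case RET_ERRNO:        return ACT_ERRNO;
    case RET_USER_NOTIF:   return ACT_USER_NOTIF;
    case RET_TRACE:        return ACT_TRACE;
    case RET_LOG:          return ACT_LOG;
    case RET_ALLOW:        return ACT_ALLOW;
    default:               return ACT_UNKNOWN;
    }
}
```

For example, `seccomp_action_of(0x50000)` yields `ACT_ERRNO`, matching the denial logged for slurmstepd above.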

Looking further into mgr.c, I see code which leads me to believe that executing as the snap_daemon user might be a way around this. Am I on the right track in thinking that running the process as the snap_daemon user could be a path forward?

Do you have any advice on how we might be able to get past this?

Thank you!

jdstrand · May 6, 2020, 7:23pm

FYI, you can either use system-usernames to drop to the snap_daemon user, patch the code so it does not drop privileges, or use LD_PRELOAD to make the call a no-op.
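As a sketch of the LD_PRELOAD route: on Linux, glibc implements setegid() with the setresgid syscall, which is why the seccomp log names setresgid even though Slurm reports a setegid failure. A preloaded library that defines its own setegid() is found before libc's, so the denied syscall is never made. The file and function body below are hypothetical, not from Slurm or snapd.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical LD_PRELOAD shim: this definition takes precedence over
 * libc's setegid(), so calls succeed without changing any IDs and the
 * seccomp-denied setresgid syscall is never issued. The process keeps
 * whatever effective group it already had (root, for a snap daemon). */
int setegid(gid_t egid)
{
    (void)egid; /* requested group is deliberately ignored */
    return 0;
}
```

One possible way to wire it in (names are hypothetical): build it with `gcc -shared -fPIC -o libnosetegid.so nosetegid.c`, ship the library in the snap, and point `LD_PRELOAD` at its path under `$SNAP` in the app's environment. The trade-off is that whatever Slurm intended to do under the new group ID will instead happen with the original one.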

jamesbeedy · May 6, 2020, 10:34pm

@jdstrand Thanks for the feedback. If you don’t mind, how do I exec my commands and daemons as the snap_daemon user? Will simply setting

system-usernames:
  snap_daemon: shared

force my processes to execute as snap_daemon?

egeeirl · May 7, 2020, 12:51am

As a follow-up, this is how we are currently implementing it - https://github.com/omnivector-solutions/snap-slurm/blob/strict_testing/snap/snapcraft.yaml#L14

However, this does not appear to give us the desired outcome; our daemons are still running as root.

alexmurray · May 7, 2020, 3:52am

By specifying

system-usernames:
  snap_daemon: shared

this allows the snap to drop privileges to the snap_daemon user/group, but it does not force that. Your application will still be run as a root daemon; it is simply now allowed to transition to the snap_daemon user. The app (or perhaps some wrapper script) still needs to call setgroups()/setgid()/setuid() etc. to drop privileges from root to snap_daemon. See https://snapcraft.io/docs/system-usernames for more info and some discussion about securely dropping privileges.
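A minimal C sketch of that drop sequence, which an app or wrapper could run early, before doing its real work. The function name `drop_privileges_to` is hypothetical; the sketch assumes the snap declares `system-usernames: snap_daemon: shared` and that snapd's seccomp policy then permits these calls for that user. Whether slurmstepd can operate this way is untested here.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* setgroups() is not part of POSIX */
#endif
#include <grp.h>
#include <pwd.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: permanently drop root privileges to the named user
 * (e.g. "snap_daemon"). Groups must be changed while still root, because
 * setuid() removes the right to change them afterwards.
 * Returns 0 on success, -1 on any failure (the caller should then exit). */
int drop_privileges_to(const char *username)
{
    struct passwd *pw = getpwnam(username);
    if (pw == NULL)
        return -1;

    /* Copy the IDs: getpwnam() returns a pointer to static storage. */
    uid_t uid = pw->pw_uid;
    gid_t gid = pw->pw_gid;

    /* 1. Clear root's supplementary groups. */
    if (setgroups(0, NULL) != 0)
        return -1;
    /* 2. As root, setgid() sets the real, effective and saved GID. */
    if (setgid(gid) != 0)
        return -1;
    /* 3. As root, setuid() sets the real, effective and saved UID; last. */
    if (setuid(uid) != 0)
        return -1;

    /* 4. Verify the drop is irreversible: regaining root must now fail. */
    if (uid != 0 && setuid(0) != -1)
        return -1;
    return 0;
}
```

A daemon would call `drop_privileges_to("snap_daemon")` once at startup and exit immediately if it returns -1, rather than continuing as root.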

As a simple example, @sergiusens created https://github.com/sergiusens/user-daemon, which might be useful to look at (ignore the comment about requiring snapd from edge, though: snap_daemon has been supported since snapd 2.41, which is in stable).

alexmurray · May 12, 2020, 7:26am

As per the "SLURM auto-connect for network-control [Was: SLURM Snap (transfer ownership)]" thread, this appears to have changed to a request for auto-connect of network-control.

jamesbeedy · June 19, 2020, 10:01pm

Request for classic confinement continued here: Request for Classic confinement: Slurm