This request is a follow-up to an issue/limitation I encountered that requires us to support a use case with classic confinement.
There are common scenarios in which an operator wishes to run a Slurm command as another user, often for accounting purposes. A simple command that triggers the issue:
slurm.srun --uid 1000 -N1 -l uname -r
The command above tries to run as UID 1000 but cannot because of the snap's confinement mode, which leads me to this request.
Strict confinement is still appropriate, especially for testing and development of Slurm clusters, so this request shouldn't replace our existing Snap tracks.
What are representative use cases where the user might specify --uid (eg, especially but not limited to "accounting purposes")? If it is just to run something as non-root, we have system-usernames that would allow someone to run slurm.srun --uid 584788 -N1 -l uname -r.
we have system-usernames that would allow someone to run slurm.srun --uid 584788 -N1 -l uname -r
I tried this and ran into permission issues because of how Slurm uses system calls to switch user contexts. Let me test it once more and report back with any errors I encounter.
Ran a pair of test cases today and immediately hit a wall. I have the daemons running as snap_daemon but user switching, even to the snap_daemon user, is not permitted. Here are the tests and logs:
root@slurm-test:/tmp# srun --uid 1000 -l uname -a
srun: error: initgroups: Operation not permitted
srun: fatal: Unable to assume uid=1000
For reference, the srun documentation provides the following explanation of how --uid works with users:
--uid=<user>
Attempt to submit and/or run a job as user instead of the invoking user id. The invoking user's credentials will be used to check access permissions for the target partition. User root may use this option to run jobs as a normal user in a RootOnly partition for example. If run as root, srun will drop its permissions to the uid specified after node allocation is successful. user may be the user name or numerical user ID. This option applies to job and step allocations.
Hey @jdstrand, the main reason we need to execute jobs under the uid/gid of the user is that the slurmd process that executes the job needs to run as the effective uid/gid of the Active Directory user in order to access filesystem resources owned by that user. Also, as @egeeirl mentioned, slurmdbd accounts for the cluster resources used by each job run by each user. If we can't execute the job under the uid/gid of the user, then we have no way to pair the resource accounting with the user that ran the job.
@egeeirl - I'm not saying that you should continue to press on making snap_daemon work since, based on other comments, it won't fit the needs for --uid, but for posterity: initgroups() uses setgroups() under the hood in a non-sandbox-compliant manner (so you'd need to either patch or use the LD_PRELOAD technique). I've updated the system-usernames documentation to mention this.
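For reference, one hypothetical shape the LD_PRELOAD technique could take is a tiny shim that stubs out setgroups(2) so initgroups() no longer trips the denial. This is illustrative only (not snapd- or Slurm-provided code), and it leaves supplementary groups unchanged, which is part of why patching may be preferable:

/* setgroups_shim.c - illustrative LD_PRELOAD shim, not production code */
#define _GNU_SOURCE
#include <sys/types.h>
#include <grp.h>

/* Pretend setgroups(2) succeeded without actually calling it, so that
   initgroups() does not hit the sandbox's EPERM. Supplementary groups are
   simply left unchanged, which is the trade-off of this approach. */
int setgroups(size_t size, const gid_t *list) {
    (void)size;
    (void)list;
    return 0;
}

Compiling with something like gcc -shared -fPIC -o libsetgroups-shim.so setgroups_shim.c and pointing LD_PRELOAD at the result would exercise it, but treat this as a sketch of the general technique rather than a recommendation.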
IME, slurm is an orchestration tool for HPC. AIUI, slurm can and does utilize the snap_daemon user for certain actions, but in certain environments the slurmd process itself needs to run (arbitrary) commands as the effective uid/gid of the user in order to access resources for that user (@jamesbeedy referred to this as the "Active Directory user", which seems to be tied to slurm's concept of "partitions" (see the overview url)). Other use cases are to run commands as other users for accounting purposes (AIUI, the current design of slurm is such that it performs resource accounting by tracking the uid that the process ran as, as opposed to something like having users log in with an account, executing all commands as the same user, and performing tracking via use of the account).
The slurm website states "Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters". So, like flowbot, this is similar to juju. The closest use case in our processes for classic is "management snaps", which we've identified as an unsupported use case.
In my limited understanding of slurm, we could perhaps require that slurm be modified to run anything that should be non-root as snap_daemon, with slurm then being further modified to perform accounting with this in mind, but IME that would change how users expect to use slurm (ie, they'd have to "log in" in some manner). Even if that change were made, slurm's functionality wrt partitions would be limited.
While orchestration tools like slurm (and juju, and potentially the recent flowbot request) do "manage" systems, these systems are not for managing fleets of devices/laptops/servers/etc which have login users and are instead about scale-out computing on systems primarily without login users. In that light, I think we should consider the "orchestration snaps" use case as something different from "management snaps".
@jdstrand @pedronis slurm is a resource scheduler for HPC applications. It is closer in comparison to Apache Spark than to juju. E.g., we use juju to facilitate the slurm lifecycle; we can scale our slurm clusters using juju to meet the resource needs of individual HPC jobs that users may run on slurm.
The HPC space is largely an enterprise and academic space, in which heightened security and extensive red tape are the norm. Active Directory realms, access tracking, and resource accounting on a per-user basis are required with no exceptions.
An example:
A user will have files in a location on an Active Directory-controlled filesystem that the slurmd (compute daemon) process needs to access, on all nodes in the cluster.
When the user executes a slurm job from a central location using srun/sbatch, slurmd executes under the effective uid of the user that kicked off the job.
slurmd needs to execute as the effective uid of the Active Directory user that kicked off the process so that it can access files in that user's space on each of the compute nodes.
We have tried many alternatives to support these conventions outside of producing a classic snap.
In conclusion, we have determined that we need to use a classically confined snap to support these use cases.
Slurm isn't managing or even really orchestrating anything. Depending on the use case, it is effectively middleware. Take this use case, for example:
An organization wants to use StarCCM to simulate some heavy-duty fluid dynamics. In this case, Slurm is simply a tool that StarCCM leverages to complete the task.
Since this is a massively intensive computational workload, it would be extremely useful to delegate it out to multiple compute nodes. That's where Slurm comes in.
StarCCM, MPI, and Slurm (and other components) work together to slice the workload into computational chunks that can be resolved across N compute nodes via Slurm. As such, Slurm isn't necessarily managing a host in the traditional sense. Slurm isn't really orchestrating or provisioning anything, either.
In the current use case, Juju is used to deploy and provision Slurm across several clusters.
That is a fair distinction, and perhaps we'll want to define this as a different use case from orchestration. The main point, though, is that slurm isn't about managing the compute nodes' OS, user, etc configuration (ie, like puppet or chef might do); it is about putting workloads on them (which in my mind is orchestrating the computation). Semantics aside, "management snaps" is not a supported use case for classic, and I'm putting forth that slurm is not a "management snap" but rather something else, which IME is an important distinction when considering slurm for classic confinement.
Semantics aside, "management snaps" is not a supported use case for classic, and I'm putting forth that slurm is not a "management snap" but rather something else, which IME is an important distinction when considering slurm for classic confinement.
Sure, that seems fine. So is this something that you Snapcrafters will discuss on your side and get back to me about, or is there anything else I need to do?
@egeeirl - classic snaps run unrestricted on the systems they are installed on, and for this reason we treat reviews of classic snaps differently than other sorts of reviews. For classic-snap use cases that aren't listed in our current processes, reviewers ask for snapd architect involvement, which I did when I asked @pedronis to comment. That discussion will happen in this topic, and for the moment there is nothing else you need to do, though he may have follow-up questions for you/other reviewers.