KLONE uses the SLURM job scheduler. When you first ssh into KLONE (e.g.,
klone.hyak.uw.edu) you land on one of its two login nodes (e.g.,
klone2). Login nodes are shared among all users and are meant for transferring data, navigating the file system, and requesting slices of compute resources; they are not for heavy-duty computing itself. You should not run heavy computations on the login nodes, and automated mechanisms exist to monitor for and act on violations. The tool used is "arbiter2", and you will receive an email for each offending process (Gardner, Migacz, and Haymore 2019).
If you run the hyakalloc command you can further see not only which accounts you are able to submit jobs to but also their current utilization. Resource limits are directly proportional to what each group contributed to the cluster.
If you run sinfo you can see all the available partitions. Each partition represents a class of node, from the standard compute partition to partitions with high-memory nodes or different types of GPUs.
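For example, a summarized view of all partitions (a standard sinfo option):

```bash
# One line per partition, with node counts, states, and time limits
sinfo -s
```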
There are a few popular types of jobs you could submit:
- interactive jobs, where you can test out your workflows live,
- batch jobs, which run unattended (you get an email when they complete), and
- recurring or "CRON-like" jobs that run on a regular basis.
The following are the common arguments recommended, at a minimum, to submit a job of any form.
If you are using an interactive node to run a parallel application such as Python multiprocessing, MPI, OpenMP, etc., then the number given for the --ntasks-per-node option must match the number of processes used by your application (see the sketch after the table below).
|Argument|Flag|Description|
|---|---|---|
|Account|`-A`, `--account`|What lab are you part of? If you run the hyakalloc command you can see which accounts you may submit jobs under.|
|Partition|`-p`, `--partition`|What resource partition are you interested in using? This could be anything you see when you run sinfo.|
|Nodes|`-N`, `--nodes`|How many nodes are these resources spread across? In the overwhelming majority of cases this is 1 (a single node), but more sophisticated multi-node jobs can be run if your code supports it.|
|Cores|`-c`, `--cpus-per-task`|How many compute cores do you need? Not all codes can make use of multiple cores, and when they do, performance does not always scale linearly with the resources requested. If in doubt, consider contacting the research computing team to assist in this optimization.|
|Memory|`--mem`|How much memory do you need for this job? This is in the format size[unit], where the unit may be K, M, G, or T (e.g., --mem=10G).|
|Time|`-t`, `--time`|What's the maximum runtime for this job? Common acceptable time formats include minutes, hours:minutes:seconds, and days-hours.|
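As a minimal sketch of matching --ntasks-per-node to an application's process count (the account, partition, and program names here are illustrative):

```bash
# Request one node with 4 tasks...
salloc -A mylab -p compute --nodes=1 --ntasks-per-node=4 --mem=8G --time=1:00:00

# ...then launch exactly 4 processes to match the request
srun -n 4 ./my_parallel_program
```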
Resources for interactive jobs are obtained using salloc. To get resources on a compute node interactively, consider the example below.
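A minimal sketch matching the description that follows (the group name mylab is illustrative):

```bash
salloc -A mylab -p compute --nodes=1 --cpus-per-task=4 --mem=10G --time=2:30:00
```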
In this case you are requesting a slice of the standard compute node class that your group
mylab contributed to the cluster. You are asking for 4 compute cores with 10GB of memory for 2 hours and 30 minutes spread across 1 node (single machine). The
salloc command will automatically create an interactive shell session on an allocated node.
Building upon the previous section, if --nodes is >1 when running salloc, you are automatically placed into a shell on one of the allocated nodes. This shell is NOT part of a SLURM task. To view the names of the remaining allocated nodes, use scontrol show hostnames. The srun command can be used to execute a command on all of the allocated nodes, as shown in the example session below.
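A sketch of such a session (the two-node request and account/partition names are illustrative):

```bash
# Request two nodes; salloc drops you into a shell on one of them
salloc -A mylab -p compute --nodes=2 --ntasks-per-node=1 --mem=10G --time=1:00:00

# List the hostnames of every node in the allocation
scontrol show hostnames

# Run a command across the allocation (one task per node here)
srun hostname
```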
If your group has an interactive node, use the option -p <partition_name>-int as shown below. If you are unsure whether your group has an interactive node, run hyakalloc; the interactive node will appear in its output if you have one.
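For instance (assuming your group's partition is named mylab, so its interactive partition is mylab-int):

```bash
salloc -A mylab -p mylab-int --nodes=1 --cpus-per-task=2 --mem=8G --time=2:00:00
```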
- If you are not allocated a session with the specified --mem value, try smaller memory values.
For more details, read the
salloc man page.
When a job scheduled by Slurm begins, it needs to know how it was scheduled: what its working directory is, who submitted the job, the number of nodes and cores allocated to it, and so on. Slurm passes this information to the job via environment variables, which are also used as default values by programs like mpirun. To view a node's Slurm environment variables, use export | grep SLURM.
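For example, from inside an allocation (the variable names below are standard Slurm ones; the values vary per job):

```bash
export | grep SLURM
# Among others, expect variables such as:
#   SLURM_JOB_ID        # numeric ID of the job
#   SLURM_JOB_NODELIST  # nodes allocated to the job
#   SLURM_NTASKS        # total number of tasks
#   SLURM_SUBMIT_DIR    # directory the job was submitted from
```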
A comprehensive list of the environment variables Slurm sets for each job can be found at the end of the
sbatch man page.
Below is a Slurm script template. Submit a batch job from the mox login node by calling sbatch <script_name>.
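A minimal sketch of such a template (every value below is a placeholder to adapt; the directives themselves are standard sbatch options):

```bash
#!/bin/bash
#SBATCH --job-name=<job_name>
#SBATCH --account=<group_name>
#SBATCH --partition=<group_name>
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=28     # 28 for older mox nodes, 40 for newer klone nodes
#SBATCH --time=1:00:00
#SBATCH --mem=10G
#SBATCH --mail-type=END          # email when the job completes
#SBATCH --mail-user=<net_id>@uw.edu

# Commands to run go here, e.g.:
./my_program
```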
If your batch job uses multiple nodes, your program should also know how to use all of the nodes (e.g., an MPI program).
The value given for --nodes should be less than or equal to the total number of nodes owned by your group unless you are running in the checkpoint (ckpt) partition.
The value given for
--ntasks-per-node should be either
28 for older
mox nodes or
40 for newer
klone nodes if you wish to maximize use of an entire node.
Some common errors and their meanings:

- slurmstepd: error: Exceeded job memory limit: your program used more memory than you allotted when requesting the node and has run out of memory. Get a node with more memory and try again.
- (ReqNodeNotAvail, UnavailableNodes:n[<node numbers list>]): your allocation will not expire (and might still be running one of your jobs) before the next scheduled maintenance day. Either request a shorter --time duration or wait until after the maintenance has been completed.
- Unable to allocate resources: Invalid account or account/partition combination specified: you used -p <group_name> -A <group_name> and you do not belong to that group.
In the commands below, read <net_id> as your UW NetID, <group_name> as your Hyak group partition name, and <job_id> as an individual job ID:
- sinfo is used to view information about mox nodes and partitions. Use sinfo -p <group_name> to view information about your group's partition or allocation.
- squeue is used to view information about jobs in the scheduling queue. Use squeue -p <group_name> to view information about jobs on your group's nodes. Use squeue -u <net_id> to view your own jobs.
- scancel is used to cancel jobs. Use scancel <job_id> to cancel a job with the given job ID, or use scancel -u <net_id> to cancel all of your jobs.
- sstat displays status information for a running job, covering CPU, Task, Node, Resident Set Size (RSS), and Virtual Memory (VM) statistics. Read the man page for a comprehensive list of format options.
- sacct displays information about completed jobs. Read the man page for a comprehensive list of format options.
- sreport generates reports about job usage and cluster utilization from Slurm accounting (sacct) data. For example, to get the historical usage of the group <group_name> in March 2020, use sreport cluster UserUtilizationByAccount Start=2020-03-01 End=2020-03-31 Accounts=<group_name>.
All of these man pages can also be viewed on mox by running man <command_name> (e.g., man squeue).