Syllabus
Goals & Rationale
The main objective of this tutorial is to demystify job submission and help researchers use Hyak's computing resources efficiently.
Much of the Hyak documentation is organized into bite-sized instructional guides for particular software tools or concepts, but these may be too advanced for users who are brand new to High Performance Computing (HPC) and haven't used a job scheduler before. Here we have prepared a walk-through tutorial of Slurm commands so that you can feel comfortable working independently on Hyak and tailoring tools and scripts to the needs of your research project. The advanced section of this tutorial offers an additional worked example, using publicly available data, of submitting interactive, single, and array jobs with Slurm (array jobs submit multiple jobs to be performed in parallel).
On August 14, 2024, we delivered this tutorial live on Zoom. Follow this link to watch the live demonstration.
Our ultimate goal is to prepare you to be an independent user of Hyak.
A job scheduler is a component or software system responsible for managing and optimizing the allocation of computing resources and tasks within a distributed computing environment. It orchestrates the execution of jobs, tasks, or processes across available resources such as CPUs, memory, and storage.
Slurm: The job scheduler used on Hyak. Slurm stands for Simple Linux Utility (for) Resource Management. See Slurm documentation for detailed help using the job scheduler.
Learning Objectives
- Understand the benefits of parallel computing and scheduling jobs.
- Understand how accounts and partitions determine research computing access, and the purpose and usage of the `hyakalloc` command.
- Understand the concept of community idle resources and the checkpoint partitions (`ckpt`, `ckpt-g2`, `ckpt-all`).
- Become familiar with job types and job submission, including requesting GPU jobs.
- Master monitoring the job queue.
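To make these objectives concrete, here is a minimal sketch of the kind of batch script the tutorial works toward. It is an illustration, not a script taken from the tutorial: the account name `mylab`, the job name, and the resource values are placeholders chosen for this example, and the `describe_job` helper is hypothetical. The `#SBATCH` lines are standard Slurm directives; because they are ordinary comments to the shell, the script also runs under plain `bash` outside of Slurm.

```shell
#!/bin/bash
# Minimal Slurm batch script (a sketch; all values below are placeholders).

# Hypothetical account name; replace it with one that `hyakalloc` reports for you.
#SBATCH --account=mylab
# One of the checkpoint partitions listed in the objectives above.
#SBATCH --partition=ckpt
#SBATCH --job-name=hello-hyak
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
#SBATCH --time=00:05:00
# %j in the output file name is replaced by the job ID.
#SBATCH --output=hello-%j.out

# Slurm sets SLURM_JOB_ID inside a running job; report "none" when the
# script is run directly, outside the scheduler.
describe_job() {
    echo "job=${SLURM_JOB_ID:-none} host=${HOSTNAME:-unknown}"
}

describe_job
```

Assuming the script is saved as `hello.slurm` (a name chosen for this example), it could be submitted with `sbatch hello.slurm`, watched in the queue with `squeue -u $USER`, and cancelled with `scancel` followed by the job ID.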
Course Content
Tutorial: Slurm
Extra Practice
We have curated a list of Additional Resources, many of which are relevant to this tutorial.