Syllabus

Goals & Rationale

The main objective of this tutorial is to demystify job submission and help researchers use Hyak's computing resources efficiently.

Much of the Hyak documentation is organized into bite-sized instructional guides for particular software tools or concepts, but these may be too advanced for users who are brand new to High Performance Computing (HPC) and haven't used a job scheduler before. Here we have prepared a walk-through tutorial of Slurm commands so that you can feel comfortable working independently on Hyak and tailoring tools and scripts to the needs of your research project. The advanced section of this tutorial offers an additional worked example, using publicly available data, of submitting interactive, single, and array jobs with Slurm (an array job submits multiple jobs to be performed in parallel).

Our ultimate goal is to prepare you as an independent user of Hyak.

Hyak's Job Scheduler - Slurm

A job scheduler is a component or software system responsible for managing and optimizing the allocation of computing resources and tasks within a distributed computing environment. It orchestrates the execution of jobs, tasks, or processes across available resources such as CPUs, memory, and storage.
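To make the idea of resource allocation concrete, here is a minimal Python sketch of a toy first-come, first-served scheduler. It is only an illustration of the concept and not how Slurm itself works: real schedulers also weigh priorities, memory, time limits, and other factors. The function name `schedule`, the job tuples, and the CPU counts are all hypothetical.

```python
from collections import deque


def schedule(jobs, total_cpus):
    """Toy first-come, first-served scheduler (illustration only, not Slurm).

    jobs: list of (name, cpus, runtime) tuples in submission order.
    total_cpus: number of CPUs in the hypothetical cluster.
    Returns a dict mapping each job name to its start time.
    """
    for name, cpus, _ in jobs:
        if cpus > total_cpus:
            raise ValueError(f"job {name!r} requests more CPUs than exist")

    queue = deque(jobs)   # jobs waiting to start, in submission order
    running = []          # (end_time, cpus) for jobs currently running
    free = total_cpus     # CPUs not allocated to any job
    now = 0               # simulated clock
    starts = {}

    while queue:
        name, cpus, runtime = queue[0]
        if cpus <= free:
            # Enough CPUs are free: start the job at the head of the queue.
            queue.popleft()
            free -= cpus
            running.append((now + runtime, cpus))
            starts[name] = now
        else:
            # Not enough CPUs: advance the clock to the next job completion
            # and release that job's CPUs.
            running.sort()
            end_time, released = running.pop(0)
            now = end_time
            free += released
    return starts


# Three hypothetical jobs competing for a 4-CPU machine.
example_jobs = [("a", 2, 10), ("b", 2, 5), ("c", 3, 4)]
print(schedule(example_jobs, total_cpus=4))
```

In this example, jobs "a" and "b" start immediately because together they fit in 4 CPUs, while job "c" waits in the queue until enough CPUs have been released.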

Slurm: The job scheduler used on Hyak. Slurm stands for Simple Linux Utility (for) Resource Management. See the Slurm documentation for detailed help with using the job scheduler.

Learning Objectives

  • Understand the benefits of parallel computing and scheduling jobs.
  • Understand how accounts and partitions determine research computing access, and the purpose and usage of the hyakalloc command.
  • Understand the concept of community idle resources and the checkpoint partitions (ckpt, ckpt-g2, ckpt-all).
  • Become familiar with job types and job submission, including requesting GPU jobs.
  • Master monitoring the job queue.

Video tutorial available

On August 14, 2024, we delivered this tutorial live on Zoom. Follow this link to watch the live demonstration.

Course Content

Tutorial: Slurm

Extra Practice

We have curated a list of Additional Resources, many of which are relevant to this tutorial.

Acknowledgements

In the advanced sections of this tutorial we use publicly available data and software.

Locator Neural Network is copyright 2019 C. J. Battey and is released under the Non-Profit Open Software License 3.0 (NPOSL-3.0).

Our adaptation of the Populus trichocarpa genotype data and locations is licensed under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication.