6 posts tagged with "gpu"


December 2024 Maintenance Details

Kristen Finch


HPC Staff Scientist

Hello Hyak Community,

Our December maintenance is complete. Thank you for your patience. The next maintenance will be Tuesday January 14, 2025.

Notable Updates#

  • Updated node images, ensuring the security and behavior you expect from Hyak klone.
  • We migrated LMOD modules and other tools and software stored in /mmfs1/sw/ to Solid State Drives (SSDs). Users may notice faster module loading.

Office Hours over Winter Break#

I will continue to hold Zoom office hours on Wednesdays at 2pm throughout December. Attendees need only register once and can attend any of the occurrences with the Zoom link that will arrive via email. Click here to Register for Zoom Office Hours.

December 12 will be the last in-person office hours for Fall term to be held at 2pm at the eScience Institute (address: WRF Data Science Studio, UW Physics/Astronomy Tower, 6th Floor, 3910 15th Ave NE, Seattle, WA 98195). In-person office hours will resume on Thursdays at 2pm starting January 9, 2025.

If you would like to request 1 on 1 help, please send an email to help@uw.edu with "Hyak Office Hour" in the subject line to coordinate a meeting.

Research Computing Club Officer Nominations#

The Research Computing Club (RCC) at UW is looking for nominations for Officer positions. The RCC provides essential computational resources to support students working on a wide range of research projects using high-performance computing through UW’s Hyak system and cloud platforms like AWS, Microsoft Azure, and Google Cloud. The RCC relies on student officers to continue providing these resources to the UW community and to organize community-driven events such as Hackathons and trainings. Officer positions are:

  • President
  • Director of AI/ML
  • Director of Outreach
  • Director of Hyak
  • Director of Cloud Computing

Please consider nominating yourself if you are interested, or nominating someone you know who is interested, by filling out the form linked HERE.

External Opportunities#

DOE Computational Science Graduate Fellowship - Applications are being accepted through January 16, 2025 for the Department of Energy Computational Science Graduate Fellowship (DOE CSGF). Candidates must be U.S. citizens or lawful permanent residents who plan full-time, uninterrupted study toward a Ph.D. at an accredited U.S. university.

The DOE CSGF is composed of two computational science tracks. Eligible fellowship candidates should carefully review the criteria for both tracks prior to initiating the application process:

  • The Science & Engineering Track accepts doctoral students engaged in computational science research with a science or engineering focus.
  • The Mathematics/Computer Science Track accepts students pursuing research in broadly applicable methods and technology for high-performance computing (HPC) systems.

Students applying to the Mathematics/Computer Science Track must be pursuing a doctoral degree in applied mathematics, statistics, computer science, computer engineering or computational science — in one of these departments or their academic equivalent. A departmental exception is made for students whose research is focused on algorithms or software for quantum information systems and who are enrolled in a science or engineering field. In all cases, research must contribute to more effective use of emerging HPC systems.

DOE NNSA Stewardship Science Graduate Fellowship - Applications are being accepted through January 14, 2025 for the Department of Energy National Nuclear Security Administration Stewardship Science Graduate Fellowship (DOE NNSA SSGF). Candidates must be U.S. citizens who plan full-time, uninterrupted study toward a Ph.D. at an accredited U.S. university.

The DOE NNSA SSGF provides doctoral students with in-depth training in areas relevant to stewardship of the nation's nuclear stockpile: high energy density physics, nuclear science, or materials under extreme conditions. Senior undergraduates and first- or second-year doctoral students are eligible to apply.

DOE NNSA Laboratory Residency Graduate Fellowship - As part of its science and national security missions, the U.S. Department of Energy National Nuclear Security Administration (DOE NNSA) supports a spectrum of basic and applied research in science and engineering at the agency's national laboratories, at universities and in industry.

Because of its continuing needs, the NNSA seeks candidates who demonstrate the skills and potential to form the next generation of leaders in the following fields via the DOE NNSA LRGF program:

  • Engineering and Applied Sciences
  • Physics
  • Materials
  • Mathematics and Computational Science

To meet its primary objective of encouraging the training of scientists, the DOE NNSA LRGF program provides financial support to talented students who accompany their study and research with practical work experience at one or more of the following DOE NNSA facilities: Lawrence Livermore National Laboratory, Los Alamos National Laboratory, Sandia National Laboratories or the Nevada National Security Site.

NSF ACCESS Student Training and Engagement Program (STEP) - Applications due February 15.

  • STEP-1: 2-week introductory experience - Join a dynamic two-week experience in Miami, FL in May! Travel, accommodations, and stipend included. This program is designed to help you:

    • Become more aware of the many areas of interest within the field: web programming, high performance computing, cybersecurity, networking, real use of AI.
    • Determine where your interests are.
    • Discover what areas of study you should pursue in the future.
    • Experience high levels of direct interaction with diverse people with diverse skill sets.
  • STEP-2: Full-time for the summer (in-person + virtual), follows STEP-1. Travel and accommodations are covered for in-person events, and participants receive a stipend plus travel to a national conference. This program is designed to help you:

    • Develop an intermediate understanding of one of the areas of interest within the field: web programming, high performance computing, cybersecurity, networking, real use of AI.
    • Provide opportunities for interactions with professionals in the field.
  • STEP-3: Part-time during the school year, follows STEP-2. Continue interacting (virtually) with your team on projects on a part-time basis during the academic year. Participants will receive a stipend and travel to the annual Supercomputing conference (SC25 in St. Louis, MO, USA). This program is designed to help you:

    • Develop a deep understanding of one of the areas of interest within the field: web programming, high performance computing, cybersecurity, networking, real use of AI.
    • Provide further opportunities for interactions with professionals in the field.

Understand how a career in cyberinfrastructure could fit with your future plans!

External Trainings#

Looking for FREE training on cutting-edge digital technologies for industry? Find out more about the Training from the Hartree Centre HERE!

Whether you’re looking to get to grips with the basics or searching for new tools and techniques to apply, Hartree Training supports both self-directed online learning and face-to-face practical sessions, with badge certification available for you to share your new skills with your network. Courses currently cover a range of advanced digital technologies, including:

  • Data Science
  • Artificial Intelligence and Modelling
  • High Performance and Exascale Computing
  • Software Engineering
  • Emerging Technologies

To enroll, create a FREE account or log in. Once you have created an account, you can log in and sign up for as many of our courses as you like.

NVIDIA Professional Workshops and Certificates - The blog post HERE discusses NVIDIA Professional Workshops ($3000 each) and Certificate Exams ($400 each) in AI Infrastructure and AI Operations. It includes links to workshops and certificate programs in Generative AI LLMs and Generative AI Multimodal.

Happy Computing,

Hyak Team

November 2024 Maintenance Details

Kristen Finch


HPC Staff Scientist

Hello Hyak Community,

Our November maintenance is complete. Thank you for your patience while we make package updates to node images, ensuring the security and behavior you expect from Hyak klone.

The next maintenance will be Tuesday December 10, 2024.

Notable Updates#

  • We updated the GPU drivers to version 565.57.01 to patch security vulnerabilities CVE-2024-0117 through CVE-2024-0121 and CVE-2024-0126 (see the snippet after this list to confirm the driver version from a GPU node).
  • We migrated user Home directories to Solid State Drives (SSDs). Users may notice faster performance for software stored in their Home directories.
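
If you want to confirm the new driver version from within a job on a GPU node, a minimal check (nvidia-smi is available on GPU nodes):

nvidia-smi --query-gpu=driver_version --format=csv,noheader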

Upcoming Training#

Hyak: Scheduling jobs with Slurm workshop is on Thursday, November 14th, from 10 a.m. to 12 p.m. in the WRF Data Science Studio (UW Physics/Astronomy Tower, 6th Floor; 3910 15th Ave NE, Seattle, WA 98195), and will cover Hyak’s job scheduler Slurm, interactive jobs, batch jobs, and array jobs. REGISTER HERE!

Hyak: Introduction to Deep Learning workshop is on Tuesday, December 3rd, from 10 to 11:50 a.m. in CSE2 (Gates Center) Room 371. Participants will start with a computer vision example, training a model on a sample dataset, and then learn how to execute the training process on Hyak. REGISTER HERE!

Office Hours#

Zoom office hours will be held on Wednesdays at 2pm. Attendees need only register once and can attend any of the occurrences with the Zoom link that will arrive via email. Click here to Register for Zoom Office Hours

In-person office hours will be held on Thursdays at 2pm at the eScience Institute (address: WRF Data Science Studio, UW Physics/Astronomy Tower, 6th Floor, 3910 15th Ave NE, Seattle, WA 98195).

The Research Computing Club will be holding office hours fall term. In-person office hours will be held at the eScience Institute, WRF Data Science Studio.

Officer         Date     Time
Sam Shin        19 Nov   2pm
Teerth Mehta    3 Dec    2pm

If you would like to request 1 on 1 help, please send an email to help@uw.edu with "Hyak Office Hour" in the subject line to coordinate a meeting.

April 2024 Maintenance Details

Kristen Finch


HPC Staff Scientist

Hello Hyak Community,

Thank you for your patience this month while there was more scheduled downtime than usual to allow for electrical reconfiguration work in the UW Tower data center. We recognize how disruptive this work has been in recent weeks. Please keep in mind that this work by the data center team has been critical in allowing the facility to increase available power to the cluster and provide future growth capacity, which had been limiting deployment of new equipment in recent months.

The Hyak team was able to use the interruption to implement the following changes:

  • Increase in checkpoint (--partition=ckpt) runtime for GPU jobs from 4-5 hours to 8-9 hours (pre-emption for requeuing will still occur subject to cluster utilization). Please see the updated documentation page for information about using idle resources; a sample GPU request on ckpt is sketched after this list.
  • The NVIDIA driver has been updated for all GPUs.
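
For reference, a minimal sketch of an interactive GPU request on the checkpoint partition (the account name and resource sizes below are placeholders; substitute your own group's account and requirements):

salloc -A mylab -p ckpt --gpus=1 -c 4 --mem=20G --time=8:00:00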

Our next scheduled maintenance will be Tuesday May 14, 2024.

Training Opportunities#

Follow NSF ACCESS Training and Events posting HERE to find online webinars about containers, parallel computing, using GPUs, and more from HPC providers around the USA.

Questions? If you have any questions for us, please reach out to the team by emailing help@uw.edu with Hyak in the subject line.

Fairshare improvements on klone

Nam Pho


Director for Research Computing
note

We have adjusted legacy fairshare-related settings to account for GPUs and large memory contributions and usage in order to help more fairly allocate checkpoint resources.

History#

In fall 2019 (almost two years ago to the day) the Hyak team received our first Turing generation GPU node. Hyak has had a modest GPU footprint going back a decade, to the first generation cluster (called "IKT") and its pre-Pascal generation cards. In 2015 we acquired a smaller test bed of Pascal generation GPUs for the second generation cluster (called "MOX"). There were never more than a dozen GPUs in either the IKT or MOX clusters, but the introduction of Turing GPUs marked a resurgence of interest in these accelerators among the UW research community. In the last two years, we've substantially expanded our GPU capacity to over 300 cards.

Background#

Hyak clusters work on a "condo" model: labs are able to utilize their contributed hardware on-demand as well as take advantage of idle capacity from other groups' hardware via the checkpoint (ckpt) partition. Your checkpoint priority — or "fairshare" in Slurm scheduler parlance — is weighted such that your fairshare is directly proportional to your lab’s contribution to the cluster. In the MOX days, GPU users tended to stay within their contributed hardware partitions and rarely made use of checkpoint. We attributed this to a mental shift: students were used to using a single resource, like a desktop computer, rather than a shared cluster of computing resources. However, with the migration to the third generation Hyak cluster (called "klone") and its new QoS scheduling system and the increasing comfort of students using a shared platform, GPU utilization in the checkpoint partition has increased as well. This is a good thing: we want groups to benefit from their Hyak membership in the cluster and take advantage of idle cluster resources beyond their initial hardware contributions. This is a primary tenet of our social contract with the Hyak community: as a node contributor to the cluster, you have access to idle resources of the whole cluster.

Problem#

Fairshare was simpler to calculate in the pre-GPU days because our infrastructure was homogeneous: one node contributed to the cluster equaled one fairshare unit. During the last two years of exponential GPU adoption on Hyak, the fairshare calculation did not evolve: 1 HPC node was still treated the same as 1 GPU node at 1 fairshare unit. This didn't hold because a GPU node can cost 4 to 8 times (or more) as much as a traditional HPC node. The result was that labs with GPU or other specialty (e.g., high-memory) nodes tended to have smaller fairshares than groups with the same dollar investment in traditional CPU nodes only. In practice, this meant GPU users often competed directly for resources with non-GPU jobs in the checkpoint partition on a non-level playing field.

Solution#

Taking into consideration all of this information, as well as the fact that you can request as little as 1 GPU or 1 CPU from the scheduler, we have adjusted the fairshare calculations as follows:

  • Financially: 1 GPU card is roughly equivalent to 40 CPU cores (on a dollar basis), therefore the cost normalization is 40:1 in favor of GPUs.
  • Scarcity: 1 server typically holds 8 GPU cards or 40 CPU cores, therefore the scarcity normalization is 5:1 in favor of GPUs.
  • Combining the financial and scarcity considerations above, the final weighting is 200:1 in favor of GPUs. In other words, 1 GPU card is worth 200 times more than a single CPU core in the eyes of the scheduler and is factored into your checkpoint fairshare accordingly. Please note that this weighting applies only to the higher GPU memory cards (i.e., gpu-rtx6k); less expensive GPUs carry commensurately less weight. A small arithmetic sketch follows this list.
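
As a purely hypothetical illustration of how this weighting enters the usage accounting (the numbers below are illustrative only, not the actual Slurm configuration):

# Under a 200:1 weighting, 1 hour on 1 high-memory GPU counts like 200 CPU-core hours.
gpu_hours=10        # GPU-card hours a job used
cpu_core_hours=160  # CPU-core hours the same job used (e.g., 16 cores for 10 hours)
echo $(( cpu_core_hours + 200 * gpu_hours ))   # 2160 weighted units toward fairshare usage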

Summary#

With the October monthly maintenance today we have introduced a new fairshare weighting system on the klone cluster's checkpoint (ckpt) partition that commensurately acknowledges GPU labs for their contributions to the Hyak community. This has no impact on jobs submitted to non-ckpt partitions.

Pytorch and CUDA11

Nam Pho


Director for Research Computing
info

During the January 12, 2021 mox maintenance period long-overdue package updates will be applied. The most user-impactful upgrade is the GPU driver, from 418.40.04 to 460.27.04, which will allow for CUDA11 support (up from CUDA10).

The single biggest research use for GPUs on Hyak is machine learning and artificial intelligence, and the community has been clamoring for CUDA11 support for some time. Unfortunately, it's not easy to separate the GPU driver from the node images, so the upgrade had to wait for the next maintenance window and for testing of non-ML GPU workflows on Hyak, such as gromacs for our molecular dynamics community.

tl;dr: your existing Pytorch code should continue to work, and if you want to use newer Pytorch features that require CUDA11, you can upgrade Pytorch and it will work.

Installing Pytorch with CUDA11#

Since this is now the latest and greatest on Hyak, I've taken the opportunity to update the Python documentation on how to install Pytorch with CUDA11 support within a miniconda3 environment; check out the step-by-step here.
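
For reference, a minimal sketch of such an install (the environment path is a placeholder and the version numbers are examples from the time of writing; check the Pytorch getting started matrix for the current command):

conda create -p /gscratch/scrubbed/USERNAME/pytorch-cuda11 python=3.8 -y
conda activate /gscratch/scrubbed/USERNAME/pytorch-cuda11
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html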

Reverse compatibility with CUDA10#

Before the January 12, 2021 cluster maintenance, every GPU on Hyak had a driver supporting CUDA10, and all of your code was previously compiled against it. To test that the GPU driver update to CUDA11 wouldn't impact the most popular machine learning libraries, we installed a Pytorch build compiled against our pre-maintenance CUDA10 and tested it on a GPU running the newer CUDA11 driver.

conda create -p /gscratch/scrubbed/npho/pytorch-cuda10 python=3.8 -y

Activate your new pytorch-cuda10 environment:

conda activate pytorch-cuda10

The Pytorch website [www] has a nice getting started matrix that generates the requisite install commands against CUDA10.

The install command it generates (for CUDA 10.1) is copied below:

pip install torch==1.7.1+cu101 torchvision==0.8.2+cu101 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html

Now we can load the Python interpreter and confirm that Pytorch is installed and that the CUDA10-compiled library recognizes this GPU running CUDA11 [www].

(pytorch-cuda10) $ python3
Python 3.8.5 (default, Sep 4 2020, 07:30:14)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.7.1+cu101'
>>> torch.cuda.is_available()
True
>>>

Success!

Libraries compiled against CUDA10 before the January 12, 2021 maintenance should still work on the GPUs now running CUDA11. However, if you want to use the full features of libraries that take advantage of newer capabilities in CUDA11, then you should definitely upgrade your libraries.

gromacs on GPUs

Nam Pho


Director for Research Computing
info

During the January 12, 2021 mox maintenance period long-overdue package updates will be applied. The most user-impactful upgrade is the GPU driver, from 418.40.04 to 460.27.04, which will allow for CUDA 11 support (up from CUDA 10).

The second most widely used GPU-enabled workflow on HYAK (besides machine learning) is molecular dynamics (MD), so we wanted to test one of the most popular MD codes, gromacs [source], and ensure this driver upgrade wouldn't negatively impact our researchers. I couldn't find a gromacs build with GPU support in our module collection, so I took the opportunity to create one for you all; read on!

warning

This is an exercise to demonstrate support for molecular dynamics on GPUs as a proof-of-concept. Scientific verification of the software compile options (e.g., single-precision) and its results is the responsibility of the researcher.

Using gromacs#

I'll start with the end result for those of you who just want to use it; after that, I'll dive into the nuts and bolts of how we created the module so you can perform additional optimizations.

This is a GPU-enabled version of gromacs, so we need a GPU first (you can verify with nvidia-smi).

salloc -A uwit -p ckpt --time=4:00:00 -n 4 --mem=20G --gpus=1

gromacs-2020.4 module#

Once we have a GPU we use modules to load gromacs-2020.4 and all its required dependencies (e.g., CUDA11).

module load gromacs/2020.4-cuda11.1

All gromacs tools are sub-commands of the gmx binary, so you can verify the module:

$ gmx -version
:-) GROMACS - gmx, 2020.4 (-:
GROMACS version: 2020.4
Verified release checksum is 79c2857291b034542c26e90512b92fd4b184a1c9d6fa59c55f2e24ccf14e7281
Precision: single
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support: CUDA
SIMD instructions: AVX_512
FFT library: fftw-3.3.3-sse2
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: hwloc-1.11.8
Tracing support: disabled
C compiler: /sw/gcc/10.1.0/bin/gcc GNU 10.1.0
C compiler flags: -mavx512f -mfma -fexcess-precision=fast -funroll-all-loops -O3 -DNDEBUG
C++ compiler: /sw/gcc/10.1.0/bin/g++ GNU 10.1.0
C++ compiler flags: -mavx512f -mfma -fexcess-precision=fast -funroll-all-loops -fopenmp -O3 -DNDEBUG
CUDA compiler: /sw/cuda/11.1.1-1/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2020 NVIDIA Corporation;Built on Mon_Oct_12_20:09:46_PDT_2020;Cuda compilation tools, release 11.1, V11.1.105;Build cuda_11.1.TC455_06.29190527_0
CUDA compiler flags:-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-Wno-deprecated-gpu-targets;-gencode;arch=compute_35,code=compute_35;-gencode;arch=compute_50,code=compute_50;-gencode;arch=compute_52,code=compute_52;-gencode;arch=compute_60,code=compute_60;-gencode;arch=compute_61,code=compute_61;-gencode;arch=compute_70,code=compute_70;-gencode;arch=compute_75,code=compute_75;-gencode;arch=compute_80,code=compute_80;-use_fast_math;;-mavx512f -mfma -fexcess-precision=fast -funroll-all-loops -fopenmp -O3 -DNDEBUG
CUDA driver: 11.20
CUDA runtime: 11.10

Test simulation of Lysozyme#

I used a tutorial from the gromacs website here to show it runs processes on GPU(s). The tutorial runs an MD simulation on a lysozyme but that's the extent of my study there. The commands below are a summary of the tutorial with a note that the genbox subcommand is now replaced by solvate.

gmx pdb2gmx -f 1LYD.pdb -water tip3p
gmx editconf -f conf.gro -bt dodecahedron -d 0.5 -o box.gro
gmx solvate -cp box.gro -cs spc216.gro -p topol.top -o solvated.gro
gmx trjconv -s solvated.gro -f solvated.gro -o solvated.pdb
gmx grompp -f em.mdp -p topol.top -c solvated.gro -o em.tpr -maxwarn 3

The final gromacs command below starts the fun. The documentation suggests it will automatically identify the available GPUs and send work to them; however, there are more explicit GPU arguments we encourage you to explore (a sketch with explicit flags follows the command).

gmx mdrun -v -deffnm em
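
For example, a hedged sketch that forces the non-bonded work onto a specific GPU (flags taken from the gmx mdrun help; adjust the device ID to your allocation):

gmx mdrun -v -deffnm em -nb gpu -gpu_id 0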

You can ssh into the node you're using in a separate window and run nvidia-smi alongside the job to monitor the load on the GPU(s).
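
For example (NODE_NAME is a placeholder; find the node running your job with squeue -u $USER):

ssh NODE_NAME
watch -n 2 nvidia-smi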

+-------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|===================================================================|
| 0 N/A N/A 143353 C gmx 165MiB |
| 1 N/A N/A 143353 C gmx 165MiB |
| 2 N/A N/A 143353 C gmx 167MiB |
| 3 N/A N/A 143353 C gmx 167MiB |
| 4 N/A N/A 143353 C gmx 167MiB |
| 5 N/A N/A 143353 C gmx 167MiB |
| 6 N/A N/A 143353 C gmx 167MiB |
| 7 N/A N/A 143353 C gmx 165MiB |
+-------------------------------------------------------------------+

We can see a process occupying each GPU, so it works! At least, gromacs uses the GPUs; they weren't stressed heavily, and loading them fully requires increasing the number of rank processes and matching them to the available GPUs. You can do this by adding arguments to the gmx mdrun command (a sketch follows), but by default it ran 2 ranks per detected GPU, which is not a lot.
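
A hedged sketch of spreading more ranks across all 8 GPUs (the rank count here is illustrative and should be matched to the cores and GPUs actually in your allocation):

gmx mdrun -v -deffnm em -nb gpu -ntmpi 8 -gpu_id 01234567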

(Optional) Compile Notes#

You need CUDA11, the GNU compiler, and the OpenBLAS library for the version I put together, but I was focused on a proof-of-concept rather than squeezing out every last drop of performance. There's a lot of further optimization to be done, and that's left as an exercise for the reader:

  1. Try the Intel compiler and see if it provides further optimization for non-GPU parts of the workflow.
  2. Try other math libraries (e.g., MKL) and see if it speeds things up.
  3. Add in MPI support if you want to use multiple GPUs across multiple nodes.
  4. Add in modules (e.g., PLUMED).
  5. Other tweaks I haven't thought of; see the compile flags [here].

Download Source#

From the login node I staged a folder in the modules directory.

cd /sw/gromacs/2020.4-cuda11.1

Grab regression tests.

wget http://gerrit.gromacs.org/download/regressiontests-2020.4.tar.gz

Download gromacs-2020.4 [source].

wget ftp://ftp.gromacs.org/pub/gromacs/gromacs-2020.4.tar.gz

Get a GPU and Code#

I used the shared build-gpu node for an interactive session, but if you are affiliated with a group that has its own GPU node you can use that instead.

salloc -A uwit -p ckpt --time=4:00:00 -n 4 --mem=20G --gpus=1

Once you have a session with a GPU (run nvidia-smi to confirm you see one), extract the regression tests.

tar xvzf regressiontests-2020.4.tar.gz

Do the same for the gromacs code and enter the directory.

tar xzvf gromacs-2020.4.tar.gz
cd gromacs-2020.4

Pre-requisite Modules#

The modules are loaded individually here for readability, but you could load them all in one command. Get a refresher on modules here.

module load cmake/3.11.2
module load gcc/10.1.0
module load cuda/11.1.1-1
module load contrib/openblas/0.2.20

Compile#

I created a subdirectory within the source to compile.

mkdir cuda11
cd cuda11

Use cmake to create the Makefile. Note: if you copy-and-paste the cmake command below you will have to modify the paths referenced for your environment.

cmake .. -DGMX_BUILD_OWN_FFTW=OFF -DREGRESSIONTEST_DOWNLOAD=OFF -DGMX_GPU=ON -DGMX_MPI=OFF -DCMAKE_INSTALL_PREFIX=/sw/gromacs/2020.4-cuda11.1 -DREGRESSIONTEST_PATH=/sw/gromacs/2020.4-cuda11.1/regressiontests-2020.4 -DCUDA_TOOLKIT_ROOT_DIR=/sw/cuda/11.1.1-1

With the Makefile ready, run make -j 4 (replace 4 with however many cores you have in your session), then make install. I created the module file separately so you can load it with module load gromacs/2020.4-cuda11.1 and run the single gmx binary.
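
Putting those final steps together (make check is optional; it runs the regression tests staged earlier, assuming REGRESSIONTEST_PATH was set in the cmake step):

make -j 4        # match to the number of cores in your session
make check       # optional: run the gromacs regression tests
make install     # installs into the -DCMAKE_INSTALL_PREFIX path from the cmake step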