Condo Model
The Hyak clusters operate on a condo model. This means that the cluster itself consists of contributed resource slices from various groups across campus. The Hyak team, funded through the Office of Research and sponsoring entities, provides the core infrastructure (e.g., networking, storage, support staff). This is why faculty who are affiliated with sponsoring entities do not have any annual, ongoing costs associated with their slices beyond the initial cost of the hardware; the leadership of their sponsoring entities covers this. Faculty who are not affiliated with sponsoring entities have to shoulder the annual, ongoing cost associated with any slices they wish to contribute.
You get on-demand access to resources equivalent to the slices your account contributes to the cluster. A cluster account also provides you access to all the other contributed slices from other entities, subject to their availability (i.e., if the contributors aren't actively using them). This is referred to as the "checkpoint" partition due to the lack of job run-time guarantees. Once a checkpoint job starts it can be re-queued at any moment, but it is not uncommon for a job to run for 4 to 5 hours before requeue. Longer checkpoint jobs will continue to run and be re-queued until they complete, which is why it is important that your job be able to checkpoint or save state and resume gracefully. Checkpoint access can provide substantial resources beyond what you contribute and is the main benefit of joining a shared cluster like Hyak compared to buying the same hardware and operating your own server.
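As an illustration of that checkpoint-and-resume pattern, here is a minimal sketch in Python. It assumes the job script simply re-runs the program after each requeue; the checkpoint filename and the work inside the loop are hypothetical placeholders.

```python
# Minimal checkpoint/resume sketch: save progress periodically so a requeued
# job can pick up where it left off instead of starting over.
import json
import os

CHECKPOINT = "checkpoint.json"  # hypothetical path in persistent storage

# Resume from the last saved state if a previous (requeued) run left one behind.
if os.path.exists(CHECKPOINT):
    with open(CHECKPOINT) as f:
        state = json.load(f)
else:
    state = {"step": 0, "total": 0.0}

for step in range(state["step"], 1_000_000):
    state["total"] += step          # stand-in for the real unit of work
    state["step"] = step + 1

    # Save state periodically so little work is lost if the job is requeued.
    if step % 10_000 == 0:
        with open(CHECKPOINT + ".tmp", "w") as f:
            json.dump(state, f)
        os.replace(CHECKPOINT + ".tmp", CHECKPOINT)  # atomic swap

print("done:", state["total"])
```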
The total cost of compute nodes in Hyak can be broken down into the sum of the following two components.
Slice Annual Costs
Self-Sponsored Slices (Annual)
$1,750 / 1 slice / 1 year
- Cluster membership evaluated annually.
- Access to the checkpoint partition for additional resources and compute time beyond what you contribute to the cluster.
- Grant application support.
- Scientific consultation for workflows and researcher onboarding.
- Access to workshops and other training as provided.
- Next business day support for questions.
- 24 / 7 / 365 monitoring of the cluster as a whole.
- Regular (cyber)security patching and updates.
- Historical uptime better than 99% for the cluster, not including previously scheduled maintenance days.
NOTE: Slices purchased separately (below).
Sponsored Slices (Annual)
$0 / year
- Everything that comes with self-sponsored slices.
- Slice lifetime guaranteed for a minimum of 4 years.
- No annual costs beyond the up-front cost of the slices.
NOTE: Slices purchased separately (below).
If your lab has a faculty affiliation with a sponsoring entity (listed below), then you are only responsible for a one-time, up-front cost for the slices. You get 4 years of guaranteed and fully supported utilization per slice, and continued use beyond that subject to capacity and other conditions. You can skip down to the section below for specific slice configurations.
If your lab does not have a faculty affiliation with a sponsoring entity (listed below), then there is an annual cost of $1,750 per slice per year (Self-Sponsored Slices above).
Sponsors:
- UW Seattle
  - College of Arts & Sciences
  - College of Engineering
  - College of the Environment
  - Institute for Protein Design
  - School of Medicine
- UW Bothell
- UW Tacoma
Slice Hardware Configurations
| | standard (HPC) | bigmem (HPC) | custom (HPC) | L40 (GPU) | H100 (GPU) |
|---|---|---|---|---|---|
| Slice Count | 1 x HPC slice | 1 x HPC slice | 1 x HPC slice | 1 x GPU slice | 1 x GPU slice |
| Compute Cores | 32 cores | 32 cores | 32 cores | 32 cores | 32 cores |
| Memory (System) | 256GB | 512GB | >512GB | 384GB | 384GB |
| GPU Type | N/A | N/A | N/A | 2 x L40 | 2 x H100 |
| Memory (GPU) | N/A | N/A | N/A | 48GB per GPU | 80GB per GPU |
| Pricing ($) | Email Us | Email Us | Email Us | Email Us | Email Us |
General FAQ:
- All hardware is procured at cost (market value with substantial university-negotiated bulk discounts), with no sales tax or university overhead applied.
- We reserve the 2nd Tuesday of every month for cluster maintenance.
- Slice Service Life:
  - Sponsored Slices: All sponsored slices are supported for a guaranteed minimum lifetime of 4 years. Beyond 4 years, slices continue to be made available subject to hardware viability (i.e., it hasn't broken) and the sponsoring entity still having capacity. Historically, this has been 6 years on average; however, past performance is not a guarantee of future experiences.
  - Self-Sponsored Slices: Since self-sponsored slices have an ongoing annual cost, slice life is reviewed on a yearly basis subject to the lab's willingness to continue, hardware viability, and overall cluster capacity.
- Storage:
  - Local: Each full node has 1.5TB or more of local NVMe SSD storage. This is non-persistent storage and is cleared after a job ends. Data must be copied to and from the local SSD before and after each job to make use of it (see the staging sketch after this list).
  - Group: Each slice purchase includes 1TB of shared group storage (i.e., gscratch) with a 1-million-file limit, accessible from every node. Additional quota can be purchased for $10 per month per 1TB of additional space and 1 million additional files. Additional "scrubbed" shared storage is available for short-term use, but will be automatically deleted if not accessed for several weeks.
- All HPC slices are standardized on AMD EPYC 9654 CPUs ("Genoa").
- A physical HPC server (or node) has 192 cores and >1.5TB of memory packaged in a single box. This is in turn sub-divided into 6 equal "slices", the units of compute that are sold to researchers.
- HPC slices are identically configured except for your choice of memory (RAM).
- Any jobs requiring multiple nodes should either be independent computations (i.e., "embarrassingly parallel") or make use of message passing libraries (e.g., OpenMPI) to scale across multiple nodes simultaneously (see the MPI sketch after this list).
- All GPU slices are standardized on AMD EPYC 9534 CPUs ("Genoa"). We are on the NVIDIA "Ada" (L40) and "Hopper" (H100) generations of GPUs.
- 4 x GPU slices constitute a single physical server (or node). It is a single box with 128 cores, 1.5TB of memory, and 8 x GPUs of the same type. They are sold as resource slices to make the cost more tractable for labs with more modest GPU needs.
- Any job requiring more than 8 x GPUs of the same type should be prepared to use a distributed training framework (e.g., PyTorch Lightning) to scale across multiple servers (see the multi-node training sketch after this list). Any job up to the equivalent of 4 x GPU slices (i.e., 8 x GPU cards) can be run on the same physical machine and therefore scales easily without much further modification to the codebase.
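Here is a minimal sketch of the local-scratch staging pattern referenced above, in Python. The group-storage paths, the TMPDIR environment variable, and the analyze.py script are hypothetical placeholders; adapt them to your own job.

```python
# Stage data through fast node-local NVMe scratch for the duration of a job,
# then copy results back to persistent group storage before the job ends.
import os
import shutil
import subprocess

group_dir = "/gscratch/mylab/dataset"          # hypothetical group storage path
local_dir = os.environ.get("TMPDIR", "/tmp")   # hypothetical node-local scratch
work_dir = os.path.join(local_dir, "dataset")

# 1. Copy inputs from group storage to the local SSD before computing.
shutil.copytree(group_dir, work_dir, dirs_exist_ok=True)

# 2. Run the computation against the local copy (analyze.py is a placeholder).
subprocess.run(["python", "analyze.py", "--input", work_dir], check=True)

# 3. Copy results back to persistent storage; local scratch is cleared when
#    the job finishes.
shutil.copytree(os.path.join(work_dir, "results"),
                "/gscratch/mylab/results", dirs_exist_ok=True)
```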
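For multi-node HPC jobs, a minimal embarrassingly parallel sketch with mpi4py is shown below. It assumes mpi4py and an MPI runtime (e.g., OpenMPI) are available in your environment and is launched with one process per rank (e.g., via mpirun or the scheduler); the squaring loop stands in for the real computation.

```python
# Embarrassingly parallel work split across MPI ranks, with a final gather.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()      # this process's index
size = comm.Get_size()      # total number of MPI ranks across all nodes

# Each rank processes an independent chunk; no communication is needed
# beyond the final gather.
work_items = list(range(100))
my_items = work_items[rank::size]          # round-robin split across ranks
my_results = [x * x for x in my_items]     # stand-in for the real computation

# Collect results on rank 0.
all_results = comm.gather(my_results, root=0)
if rank == 0:
    print(f"gathered {sum(len(r) for r in all_results)} results")
```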
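For GPU jobs that outgrow a single node, here is a minimal multi-node training sketch using PyTorch Lightning. The tiny model, synthetic data, and the devices/num_nodes values are placeholders, and launching one task per GPU is left to your scheduler configuration.

```python
# Multi-node data-parallel training sketch with PyTorch Lightning (DDP).
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class TinyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(32, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.mse_loss(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# Synthetic data stands in for a real dataset.
data = TensorDataset(torch.randn(1024, 32), torch.randn(1024, 1))
loader = DataLoader(data, batch_size=64)

# Up to 8 GPUs (4 GPU slices) fit on one node; beyond that, raise num_nodes
# and let the scheduler start one process per GPU.
trainer = pl.Trainer(accelerator="gpu", devices=8, num_nodes=2, strategy="ddp")
trainer.fit(TinyModel(), loader)
```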
🔥 gscratch (Parallel File System)
$10 / 1 TB [1M files] / 1 month
- A "hot" storage tier.
- On campus parallel file system directly connected to Hyak.
- No data access expenses or bandwidth limits.
- Direct high-speed, low-latency InfiniBand connectivity to (Hyak) compute nodes.
- 80 Gbps aggregate Ethernet upstream connectivity to external collaborators.
- Use of common tools to migrate data (e.g., scp, rsync), as sketched below.
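As a small illustration, here is one way to push a directory into gscratch with rsync, wrapped in Python for consistency with the other sketches. The login hostname, username, and paths are hypothetical placeholders; the plain rsync command works just as well on its own.

```python
# Push a local directory to gscratch over SSH using rsync.
import subprocess

src = "./my_dataset/"                                        # local source
dest = "uwnetid@klone.hyak.uw.edu:/gscratch/mylab/my_dataset/"  # placeholder

# -a preserves permissions/timestamps, -v is verbose, -z compresses in transit.
subprocess.run(["rsync", "-avz", src, dest], check=True)
```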
💧 KOPAH (Object Storage)
Coming Soon
- A "warm" storage tier.
- On campus object storage with 80 Gbps of aggregate upstream connectivity.
- No data access expenses or bandwidth limits.
- S3-compliant, so any existing S3 tools can be used to copy and retrieve data (see the sketch below this list).
- Public buckets for external data sharing.
- Private buckets for internal and lab-only access.
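As an illustration of S3-style access, here is a minimal Python sketch using boto3. The endpoint URL, bucket name, and credentials are hypothetical placeholders; the actual KOPAH connection details will be provided when the service is available.

```python
# Upload a file to an S3-compatible object store and list the bucket contents.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://kopah.example.uw.edu",  # hypothetical endpoint
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY",
)

# Upload a local archive to a (hypothetical) lab bucket, then list its objects.
s3.upload_file("results.tar.gz", "mylab-bucket", "results/results.tar.gz")
for obj in s3.list_objects_v2(Bucket="mylab-bucket").get("Contents", []):
    print(obj["Key"], obj["Size"])
```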
🧊 LOLO Archive (Tape)
$3.45 / 1 TB / 1 month
- A "cold" storage tier.
- Cloud-based tape archive, one of the most stable storage media.
- No data access expenses or bandwidth limits.
- Use of common tools to migrate data (e.g., scp, rsync).
- Automatic geographical redundancy of your data (i.e., 2 copies).
All storage and compute purchases come with support. A team of systems and storage engineers, as well as staff scientists, will provide at least next-business-day acknowledgement of any emails or tickets. Depending on the nature of the request or question, follow-up on the task at hand may take longer. To start a help ticket, email help@uw.edu and include "Hyak" in the subject line.
Hyak Demo Accounts
Hyak no-cost demonstration accounts are intended for prospective slice owners to try Hyak resources and assess whether they can serve their research computing needs. The account has some limits and not all features of Hyak will be available for demonstration, but you will be able to test workflows and software on the cluster.
Demonstration accounts are subject to the following restrictions:
- Jobs may only be submitted to the ckpt partition.
- Storage is limited under the demo account to 10GB in the home directory. For additional temporary storage you may utilize /gscratch/scrubbed. Be aware that files in scrubbed that are not used for several months will be deleted. This storage is not intended for large datasets, but can be helpful as you try out workflows on Hyak.
- You may submit as many jobs as you like, but the scheduler will only allow one to run at a time.
- Your jobs are limited to 80 cores, 360 GB of memory, 2 GPUs, and a maximum of 2 discrete nodes.
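To illustrate staying within these limits, here is a minimal sketch that submits a batch job to the ckpt partition by calling sbatch from Python. The batch script name and the resource numbers are hypothetical placeholders, and depending on your setup an account flag may also be required.

```python
# Submit a modest checkpoint-partition job well within the demo-account caps.
import subprocess

subprocess.run(
    [
        "sbatch",
        "--partition=ckpt",   # demo jobs may only use the checkpoint partition
        "--nodes=1",
        "--cpus-per-task=8",  # stays well under the 80-core cap
        "--mem=32G",          # and the 360 GB memory cap
        "--time=04:00:00",
        "my_job.sh",          # hypothetical batch script
    ],
    check=True,
)
```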