
JuiceFS or using Kopah on Klone

January 31, 2025 · 7 min read

Research Computing

If you haven't heard, we recently launched an on-campus S3-compatible object storage service called Kopah [docs] that is available to the research community at the University of Washington. Kopah is built on top of Ceph and is designed to be a low-cost, high-performance storage solution for data-intensive research.

warning

This is a proof-of-concept demonstration and not a production-ready or officially endorsed solution, which is why we have not put a more formal walk through in our documentation.

While the deployment of Kopah was welcome news to those who are comfortable working with S3-compatible cloud solutions, we recognize some folks may be hesitant to give up their familiarity with POSIX file systems. If that sounds like you, we explored the use of JuiceFS, a distributed file system that provides a POSIX interface on top of object storage, as a potential solution.

info

Simplistically, object storage is typically accessed with a pair of API keys through a command-line tool that wraps API calls, whereas a POSIX file system is what you usually see when working with storage on a cluster from the command line.

Installation

JuiceFS isn't installed by default so you will need to compile it yourself or download the pre-compiled binary from their release page.

As of January 2025 the latest version is 1.2.3 and you want the amd64 version if using from Klone. The command below will download and extract the binary to your current working directory.

wget https://github.com/juicedata/juicefs/releases/download/v1.2.3/juicefs-1.2.3-linux-amd64.tar.gz -O - | tar xzvf -

I moved it to a folder in my $PATH so I can run it from anywhere by just calling the binary. Your personal environment may vary here.

mv -v juicefs ~/bin/
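
If ~/bin does not exist yet or is not on your $PATH, the move above will not be enough on its own. The sketch below is one way to handle that; the function name ensure_on_path is hypothetical and not part of JuiceFS or Klone.

```shell
# Hypothetical helper: create a personal bin directory if needed and
# prepend it to PATH only when it is not already listed there.
ensure_on_path() {
  dir="$1"
  mkdir -p "$dir" || return 1
  case ":$PATH:" in
    *":$dir:"*) ;;                         # already on PATH, nothing to do
    *) PATH="$dir:$PATH"; export PATH ;;   # prepend for this shell session
  esac
}

# Example: ensure_on_path "$HOME/bin" && mv -v juicefs "$HOME/bin/"
```

The change only lasts for the current shell session; to make it permanent you would add the equivalent export line to your shell startup file.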

Verify you can run JuiceFS.

npho@klone-login03:~ $ juicefs --version
juicefs version 1.2.3+2025-01-22.4f2aba8
npho@klone-login03:~ $ 

Cool, now we can start using JuiceFS!

Standalone Mode

There are two ways to run JuiceFS: standalone mode or distributed mode. This blog post explores the former. Standalone mode is only meant to present Kopah via POSIX on Klone. The key points are:

An active juicefs process must be running for as long as you want to access the file system.
The file system is intended to be used only on the node where that process is running.

If you want to run JuiceFS on multiple nodes or with multiple users, we plan to publish another proof-of-concept covering distributed mode in the future.

Create Filesystem

JuiceFS separates the data (placed into S3 object storage) and the metadata, which is kept locally in a database. The command below will create the myjfs filesystem and store the metadata in a SQLite database called myjfs.db in the directory where the command is run. It puts the data itself into a Kopah bucket called npho-project.

juicefs format \
  --storage s3 \
  --bucket https://s3.kopah.uw.edu/npho-project \
  --access-key REDACTED \
  --secret-key REDACTED \
  sqlite3://myjfs.db myjfs

You can rename the metadata file and the filesystem name to whatever you want (they don't have to match). The same goes for the bucket name on Kopah. However, I would strongly recommend having unique metadata file names that match the file system names for ease of tracking alongside the bucket name itself.

npho@klone-login03:~ $ juicefs format \
>   --storage s3 \
>   --bucket https://s3.kopah.uw.edu/npho-project \
>   --access-key REDACTED \
>   --secret-key REDACTED \
>   sqlite3://myjfs.db myjfs
2025/01/31 11:52:47.940709 juicefs[1668088] <INFO>: Meta address: sqlite3://myjfs.db [interface.go:504]
2025/01/31 11:52:47.944930 juicefs[1668088] <INFO>: Data use s3://npho-project/myjfs/ [format.go:484]
2025/01/31 11:52:48.666657 juicefs[1668088] <INFO>: Volume is formatted as {
  "Name": "myjfs",
  "UUID": "eb47ec30-c1f7-4a92-9b17-23c4beae7f76",
  "Storage": "s3",
  "Bucket": "https://s3.kopah.uw.edu/npho-project",
  "AccessKey": "removed",
  "SecretKey": "removed",
  "BlockSize": 4096,
  "Compression": "none",
  "EncryptAlgo": "aes256gcm-rsa",
  "KeyEncrypted": true,
  "TrashDays": 1,
  "MetaVersion": 1,
  "MinClientVersion": "1.1.0-A",
  "DirStats": true,
  "EnableACL": false
} [format.go:521]
npho@klone-login03:~ $ 

You can verify there is now a myjfs.db file in your current working directory. It's a SQLite database file that will store your file system metadata.

We can also verify the npho-project bucket was created on Kopah to store the data itself.

npho@klone-login03:~ $ s3cmd -c ~/.s3cfg-default ls                                      
2025-01-31 19:48  s3://npho-project
npho@klone-login03:~ $ 

You should run juicefs format --help to view the full range of options and customize the parameters of your file system to your unique needs but just briefly:

Encryption: When you create and format the file system you can see it has encryption by default using AES256. You can override this with the --encrypt-algo flag if you prefer chacha20-rsa, or you can use key-file-based encryption and provide your private key with the --encrypt-rsa-key flag.
Compression: This is not enabled by default, and enabling it carries a computational penalty whenever you access your files since they need to be decompressed or recompressed on the fly.
Quota: By default there is no block (set with --capacity in GiB units) or inode (set with --inodes files) quota enforced at the file system level. If you do not explicitly set this, it will be matched to whatever you get from Kopah. This is still useful for setting explicitly if you wanted to have multiple projects or file systems in JuiceFS that use the same Kopah account and have some level of separation.
Trash: By default, files are not deleted immediately but moved to a trash folder similar to most desktop systems. This is set with the --trash-days flag and you can set it to 0 if you want files to be deleted immediately. The default here is 1 day after which the file is permanently deleted.
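
To make the options above concrete, here is a minimal sketch of a format command that sets a quota, a longer trash window, and compression. The wrapper function format_project_fs, the file system name, and every numeric value are hypothetical. It also assumes the access and secret keys are supplied through the ACCESS_KEY and SECRET_KEY environment variables, which to my knowledge JuiceFS reads when the flags are omitted; check juicefs format --help to confirm on your version.

```shell
# Sketch only: names and quota values are hypothetical examples.
# Set JUICEFS=echo to print the command instead of running it.
JUICEFS="${JUICEFS:-juicefs}"

format_project_fs() {
  name="$1"     # file system name, also used for the metadata file
  bucket="$2"   # Kopah bucket that will hold the data
  "$JUICEFS" format \
    --storage s3 \
    --bucket "https://s3.kopah.uw.edu/${bucket}" \
    --capacity 100 \
    --inodes 1000000 \
    --trash-days 7 \
    --compress lz4 \
    "sqlite3://${name}.db" "$name"
}

# Example: format_project_fs myjfs2 npho-project
```

Here --capacity 100 is a 100 GiB block quota, --inodes 1000000 caps the file count, and --trash-days 7 keeps deleted files for a week.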

Mount Filesystem

Running the command below will mount your newly created file system to the myproject folder in your home directory. It does not need to previously exist.

juicefs mount sqlite3://myjfs.db ~/myproject --background

warning

The SQLite database file is critical; do not lose it. You can move it to another location afterwards, but it contains all the metadata about your files.

This process occurs in the background.

warning

Where you mount your file system the first time is where it will be expected to be mounted going forward.

npho@klone-login03:~ $ juicefs mount sqlite3://myjfs.db ~/myproject --background
2025/01/31 11:57:01.652279 juicefs[1690855] <INFO>: Meta address: sqlite3://myjfs.db [interface.go:504]
2025/01/31 11:57:01.654920 juicefs[1690855] <INFO>: Data use s3://npho-project/myjfs/ [mount.go:629]
2025/01/31 11:57:02.156898 juicefs[1690855] <INFO>: OK, myjfs is ready at /mmfs1/home/npho/myproject [mount_unix.go:200]
npho@klone-login03:~ $ 
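
Since losing the SQLite metadata file means losing track of your files, it may be worth keeping timestamped copies of it. The sketch below is one possible approach; backup_meta is a hypothetical helper, and it assumes you run it while the file system is unmounted so the database is not being written mid-copy. JuiceFS also has a juicefs dump subcommand for exporting metadata, which may be the better tool if you need a backup while mounted.

```shell
# Hypothetical helper: copy a JuiceFS SQLite metadata file into a backup
# directory with a timestamp suffix and print the path of the copy.
backup_meta() {
  db="$1"     # e.g. myjfs.db
  dest="$2"   # directory that will hold the copies
  [ -f "$db" ] || { echo "no such metadata file: $db" >&2; return 1; }
  mkdir -p "$dest" || return 1
  stamp="$(date +%Y%m%d-%H%M%S)"
  out="$dest/$(basename "$db").$stamp"
  cp -p "$db" "$out" && echo "$out"
}

# Example: backup_meta myjfs.db ~/jfs-meta-backups
```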

Use Filesystem

Now with the file system mounted (at ~/myproject) you can use it like any other POSIX file system.

npho@klone-login03:~ $ cp -v LICENSE myproject 
'LICENSE' -> 'myproject/LICENSE'
npho@klone-login03:~ $ ls myproject 
LICENSE
npho@klone-login03:~ $ 

Remember, you won't be able to see it in the bucket because it is encrypted before being stored there.

Recover Deleted Files

If you enabled the trash can option then you can recover files up until the permanent delete date.

First delete a file on the file system.

npho@klone-login03:~ $ cd myproject 
npho@klone-login03:myproject $ rm -v LICENSE 
removed 'LICENSE'
npho@klone-login03:myproject $

Verify the file is deleted, then recover it from the trash bin.

npho@klone-login03:myproject $ ls          
npho@klone-login03:myproject $ ls -alh          
total 23K
drwxrwxrwx  2 root root 4.0K Jan 31 12:54 .
drwx------ 48 npho all  8.0K Jan 31 13:08 ..
-r--------  1 npho all     0 Jan 31 11:57 .accesslog
-r--------  1 npho all  2.6K Jan 31 11:57 .config
-r--r--r--  1 npho all     0 Jan 31 11:57 .stats
dr-xr-xr-x  2 root root    0 Jan 31 11:57 .trash
npho@klone-login03:myproject $ ls .trash  
2025-01-31-20
npho@klone-login03:myproject $ ls .trash/2025-01-31-20 
1-2-LICENSE
npho@klone-login03:myproject $ cp -v .trash/2025-01-31-20/1-2-LICENSE LICENSE
'.trash/2025-01-31-20/1-2-LICENSE' -> 'LICENSE'
npho@klone-login03:myproject $ ls                     
LICENSE
npho@klone-login03:myproject $ 

As you can see, we can recover files that are tracked by their delete date. You would need to copy the file back out to recover it.
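
The copy step can be wrapped in a small helper that strips the numeric prefix from the trash entry. This is a sketch under an assumption: the listing above shows the entry as 1-2-LICENSE, so the helper treats everything after the second dash as the original file name. Both function names are hypothetical.

```shell
# Hypothetical helper: recover the original file name from a trash entry
# such as "1-2-LICENSE" by dropping the two leading dash-separated fields.
trash_original_name() {
  entry="$(basename "$1")"
  rest="${entry#*-}"             # drop the first field and its dash
  printf '%s\n' "${rest#*-}"     # drop the second field and its dash
}

# Hypothetical helper: copy a trash entry into a directory under its
# original name.
restore_from_trash() {
  src="$1"
  dest_dir="$2"
  cp -v "$src" "$dest_dir/$(trash_original_name "$src")"
}

# Example: restore_from_trash .trash/2025-01-31-20/1-2-LICENSE .
```

File names that themselves contain dashes are preserved, because only the first two fields are removed.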

Unmount Filesystem

When you are done using the file system you can unmount it with the command below.

npho@klone-login03:~ $ juicefs umount myproject
npho@klone-login03:~ $ 

Remember, the file system is only accessible in standalone mode so long as a juicefs process is running. Since we ran it in the background you will need to explicitly unmount it.

Questions?

Hopefully you found this proof-of-concept useful. If you have any questions for us, please reach out to the team by emailing help@uw.edu with Hyak somewhere in the subject or body. Thanks!

February 2024 Maintenance Details

February 13, 2024 · 3 min read

Nam Pho

Director for Research Computing

Hello Hyak community! We have a few notable announcements regarding this month’s maintenance. If the hyak-users mailing list e-mail didn’t fully satisfy your curiosity, hopefully this expanded version will answer any lingering questions.

GPUs

Software: The GPU driver was upgraded to the latest stable version (545.29.06). The latest CUDA 12.3.2 is also now provided as a module. You are also encouraged to explore the use of container (i.e., Apptainer) based workflows, which bundle various versions of CUDA with your software of interest (e.g., PyTorch) over at NGC. NOTE: Be sure to pass the --nv flag to Apptainer when working with GPUs.
Hardware: The Hyak team has also begun the early deployments of our first Genoa-Ada GPU nodes. These are cutting-edge NVIDIA L40-based GPUs (code named “Ada”) running on the latest AMD processors (code named “Genoa”) with 64 GPUs released to their groups two weeks ago and an additional 16 GPUs to be released later this week. These new resources are not currently part of the checkpoint partition but we will be releasing guidance on making use of idle resources here over the coming weeks directly to the Hyak user documentation as we receive feedback from these initial researchers.

Storage

Performance Upgrade: In recent weeks, AI/ML workloads have been increasingly stressing the primary storage on klone (i.e., "gscratch"). Part of this was attributed to the run-up to the International Conference on Machine Learning (ICML) 2024 full paper deadline on Friday, February 2. However, it also reflects a broader trend in the increasing demands of data-intensive research. We have an internal tool called iopsaver that automatically reduces IOPS by intelligently requeuing the checkpoint jobs generating the highest IOPS while concurrently limiting the total number of active checkpoint jobs until the overall storage is within its operating capacity. The IO profile was so heavy at times over the past few weeks that you may have noticed iopsaver had reduced the checkpoint job capacity to near 0 in order to keep storage performance up and prioritize general cluster navigation and contributed resources.

During today’s maintenance, we upgraded the memory on the existing storage servers so that we can enable Local Read-Only Cache (LROC), although we don’t anticipate it will be live until tomorrow. Once enabled, LROC allows the storage cluster to use previously idle SSD capacity to cache frequently accessed files on this faster storage tier. We expect LROC to make a big difference because, over the last several weeks, the majority of the IO bottlenecking was attributed to a high volume of read operations. As always, we will continue to monitor developments and adjust our policies and solutions accordingly to benefit the most researchers and users of Hyak.
Scrubbed Policy: In the recent past this space has filled up. As a reminder, this is a free-for-all space and a communal resource for when you temporarily need to burst past your usual allocations from your other group affiliations. To ensure greater equity in its use, we have instituted a limit of 10TB and 10M files for each user in scrubbed. This impacts <1% of users, as only a handful were using more than 10TB of scrubbed.

Questions?

Hopefully you found these extra details informative. If you have any questions for us, please reach out to the team by emailing help@uw.edu with Hyak somewhere in the subject or body. Thanks!

Hyak Team Storage Optimizations

April 21, 2022 · 6 min read

Nam Pho

Director for Research Computing

note

The Hyak team has taken six concrete steps to stabilize and optimize storage on klone over the past few weeks.

While the storage on klone (i.e., mmfs1 or gscratch) may appear to be a monolithic device, it is an extremely complex cluster in its own right. This storage cluster is mounted on every klone node, so despite appearing to be "on the node", gscratch physically resides on specialized storage hardware separated from the compute resources of klone. The storage is accessed across a high-speed, ultra-low-latency HDR InfiniBand network, and is designed to be scalable independent of klone’s compute resources.

As mentioned in an earlier blog post today, our incoming hardware expansion will drastically increase the amount of demand the storage cluster can handle. In the meantime, the Hyak team has taken measures to help maintain a usable level of storage performance for users and jobs:

1. Improved internal storage metrics gathering and visibility.

The Hyak team improved storage-cluster metric gathering and visibility, allowing us to correlate those metrics to reports of poor user experience, and to make data-driven tuning and storage policy decisions.

In the figure above we can see whether an abnormally high number of jobs have errors, which might suggest underlying storage or other user experience issues.

2. Created custom filesystem migration policies to optimize the use of the NVMe layer.

The bulk of the storage capacity on klone is stored on rotary hard disk drives totalling approximately 1.7 Petabytes (PB) of raw storage. In addition to the hard disk storage, there is a much smaller, extremely fast–and expensive–pool of NVMe "flash" storage that functions both as a write buffer for new files written to the filesystem, and also as a read-cache-like layer where files can be read without causing load on the rotary disks.

The Hyak team has also optimized the file placement policy: files most likely to generate heavy load reside in the limited space of the NVMe layer, ensuring that no storage load is generated on the hard disk layer when those files are repeatedly accessed.

In the figure above you can see that the flash tier (green line) is allowed to fill up to 80% capacity from job writes, at which point the migration policy kicks in and runs until the flash tier is down to 65% full. For the majority of the past several weeks things worked as expected. However, there were a few recent events where jobs were producing so much data that the flash tier reached 100% full faster than the storage system could move data off of it. Giving the migration process too high a priority results in "slowness" in the user experience. We have since been tuning the aggressiveness of this migration process to reduce the likelihood of this occurring again.
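
The 80% and 65% thresholds form a simple hysteresis loop: migration starts at the high-water mark and keeps running until the low-water mark is reached. The sketch below only illustrates that logic. It is not the actual filesystem policy, and the function name and the state labels are hypothetical.

```shell
# Illustration only: hysteresis between a high-water mark (80% full) and
# a low-water mark (65% full). Takes the previous state and the current
# flash fill percentage, and prints the next state.
next_migration_state() {
  state="$1"   # "idle" or "migrating"
  fill="$2"    # flash tier fill level as an integer percentage
  if [ "$fill" -ge 80 ]; then
    echo migrating          # at or above the high-water mark: drain
  elif [ "$fill" -le 65 ]; then
    echo idle               # at or below the low-water mark: stop
  else
    echo "$state"           # in between: keep doing what we were doing
  fi
}

# Example: next_migration_state idle 82
```

Between the two thresholds the state does not change, which is what keeps the migration from flapping on and off around a single cutoff.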

3. Added QoS policies to improve worst-case filesystem responsiveness.

The klone filesystem has a coarse Quality-of-Service (QoS) tuning facility that allows the filesystem to cap the rate of storage operations for various types of storage input-output (IO). The Hyak team has used this facility in two different ways:

First, to limit the storage load impact when the NVMe layer, described above, needs to free up space by moving files to the hard drive layer.
Secondly, to moderate the amount of storage load that can be generated by any single compute node in the cluster. This way, outlier jobs in terms of storage load generation are less likely to have an outsized performance impact on the storage.

4. Manually identifying jobs causing a disproportionate impact on storage performance.

Utilizing metrics and old-fashioned sleuthing, we have been manually tracking down individual jobs that appear to be having a disproportionate and/or unnecessary impact on storage performance, and working with users to address the storage performance impact of these jobs.

In the above figure we can see that job IO follows a power-law dynamic: a small handful of jobs are often responsible for the majority of the load. In this case a single job on a single node is responsible. When users report storage "slowness" this discrepancy can be even more pronounced, but we are able to quickly narrow down which specific nodes are responsible and address these corner cases.

5. Dynamically reducing the number of running checkpoint partition jobs.

As of April 19th, 2022, we have implemented data-driven automation to moderate storage load by dynamically managing the number of running checkpoint (ckpt) partition jobs. When the number of running ckpt jobs is being limited, pending jobs will show AssocGrpJobsLimit as the REASON for not starting.
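
If you want to check whether your own pending jobs are being held by this limit, you can filter the pending reason reported by squeue. This is a minimal sketch; the function name held_by_ckpt_limit is hypothetical, and it assumes input lines of the form "JOBID PARTITION REASON".

```shell
# Hypothetical helper: read "JOBID PARTITION REASON" lines on stdin and
# print the IDs of jobs whose pending reason is AssocGrpJobsLimit.
held_by_ckpt_limit() {
  awk '$3 == "AssocGrpJobsLimit" { print $1 }'
}

# Example on the cluster (-h drops the header; %i, %P and %r are the job
# ID, partition and pending reason):
#   squeue -u "$USER" -t PENDING -h -o "%i %P %r" | held_by_ckpt_limit
```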

Please note that non-ckpt jobs (i.e., jobs submitted to nodes your lab contributed to the cluster) are not limited in any way. The social contract when joining the Hyak community is that you get access to the nodes your lab contributes on-demand, and–if and when they are idle–access to other labs’ resources on the cluster. However, access to other labs’ resources isn’t and hasn’t ever been guaranteed: it’s just that there’s often a steady state idle capacity for users to "burst" into by submitting ckpt jobs.

In aggregate, 'Storage Load' is a consumable resource just like CPU cores or memory, albeit one that impacts the whole cluster when it is over-consumed. The Slurm cluster scheduler cannot directly consider storage load availability when evaluating resources for starting ckpt jobs, hence our need to automate. Our new tooling limits the storage performance impact from ckpt jobs in order to improve storage stability for everyone.

The red and blue lines represent the two storage servers whose load we have most closely tied to the user experience. We aim to keep both at or under 50% load by dynamically reducing the number of running ckpt jobs whenever that threshold is exceeded.

So far, this appears to be very effective at moderating the overall storage load, preventing the storage cluster from becoming unusably slow and avoiding other storage-performance issues. We will continue to tune it in search of the best balance between idle resource utilization via ckpt and storage performance.
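
As a rough illustration of this kind of feedback loop, the sketch below computes the next ckpt job limit from the load on two storage servers. It is not the Hyak team's tooling. Only the 50% threshold comes from the text above; the step size, floor, and ceiling are hypothetical values chosen for the example.

```shell
# Illustration only: lower the ckpt job limit while the busier of two
# storage servers is above 50% load, and raise it again otherwise.
next_ckpt_limit() {
  limit="$1"    # current cap on running ckpt jobs
  load_a="$2"   # load on the first storage server, integer percent
  load_b="$3"   # load on the second storage server, integer percent
  threshold=50  # from the post: stay at or under 50% load
  step=10       # hypothetical adjustment per evaluation
  floor=0       # hypothetical lower bound
  ceiling=1000  # hypothetical upper bound

  worst="$load_a"
  [ "$load_b" -gt "$worst" ] && worst="$load_b"

  if [ "$worst" -gt "$threshold" ]; then
    new=$((limit - step))
    [ "$new" -lt "$floor" ] && new="$floor"
  else
    new=$((limit + step))
    [ "$new" -gt "$ceiling" ] && new="$ceiling"
  fi
  echo "$new"
}

# Example: next_ckpt_limit 100 60 40
```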

6. Expanding the team

The storage sub-system is a complicated machine in its own right and needs much more care and attention, and the current Hyak team is stretched incredibly thin as is. We have started the process of hiring a dedicated research data storage systems engineer to focus on optimizing storage going forward.


klone Users Storage Optimizations

April 21, 2022 · 5 min read

Nam Pho

Director for Research Computing

note

There are steps you, as a researcher using klone, can take to limit the impact of whatever else is happening on the cluster on your individual workflows.

While some of what precipitated this conversation is the current state of the storage (i.e., mmfs1 or gscratch), there are several things you can do as a researcher to both reduce the load on gscratch as well as help insulate your jobs from cluster-wide storage slowdowns.

1. Use local node SSDs.

Each node on the cluster has a local SSD drive with 350+ GB of space available for use by user jobs. This space is available only to jobs running on that node and all contents are purged when the user’s last job running on the node completes. It is mounted as /scr and /tmp (both paths go to the same place) on all the compute nodes.

If input data, Apptainer (Singularity) images, or other files used by your job will fit, copying those files to the SSD (via cp, rsync, etc.) once at the beginning of your job and reading them from there during the remainder of the job run results in less load on the central storage, helps insulate your job from any instances of central storage slowness, and can often result in better overall job performance.

Slurm has a command called sbcast [www] that is useful for efficiently copying files to all nodes used in a multi-node job as part of an sbatch script.

For files being written that need to be kept after the job run, it is generally best to write these directly to the central storage. Because new files are written directly to the very fast NVMe layer, such writes are less likely to impact overall storage performance. That said, it is still beneficial to write intermediate job files to the local SSD whenever possible.
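
A minimal sketch of the copy-once pattern described above, meant to sit near the top of a job script. The helper name stage_in is hypothetical, and the per-user, per-job subdirectory layout under /scr is an assumption made for the example rather than a cluster convention.

```shell
# Hypothetical helper: copy an input file or directory to node-local
# scratch once, then print the local path to read from for the rest of
# the job. The second argument overrides the scratch root (default /scr).
stage_in() {
  src="$1"
  scratch="${2:-/scr}/${USER:-user}/${SLURM_JOB_ID:-manual}"
  mkdir -p "$scratch" || return 1
  cp -r "$src" "$scratch/" || return 1
  echo "$scratch/$(basename "$src")"
}

# Example inside an sbatch script (the gscratch path is hypothetical):
#   IMG="$(stage_in /gscratch/mylab/container.sif)"
#   apptainer exec "$IMG" python train.py
```

For multi-node jobs, sbcast takes a source path and a destination path and can replace the cp step so that every allocated node gets its own local copy.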

2. Code for efficient file IO.

While this can be a very complicated topic, a great deal of overall job performance can be gained by thoughtful and judicious use of file input-output (IO). Some general tips:

Keep in mind that file access is orders of magnitude slower than memory access, and processes often have to completely "stop and wait" for disk IO operations to complete. Minimizing file IO operations, especially inside "inner loops" of programs can greatly speed up job completion, and helps to reduce load on the cluster central storage.
Fewer, larger file IO operations are generally more efficient than multiple smaller file operations accessing the same data.
When possible, store data in an efficient format such as HDF5 instead of many small files.
"Open/read once, access many times" if job memory permits.

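One low-effort way to act on the "many small files" tip, when converting to a format such as HDF5 is not practical, is to bundle a directory into a single archive on central storage and unpack it on the node-local SSD at job start. Note this swaps in tar as a simpler stand-in for the format change suggested above. Both function names are hypothetical.

```shell
# Hypothetical helpers: keep one archive on central storage instead of
# many small files, and expand it on node-local storage when needed.
bundle_dir() {      # bundle_dir SRC_DIR ARCHIVE
  tar -cf "$2" -C "$(dirname "$1")" "$(basename "$1")"
}

unbundle_to() {     # unbundle_to ARCHIVE DEST_DIR
  mkdir -p "$2" && tar -xf "$1" -C "$2"
}

# Example (paths are hypothetical):
#   bundle_dir /gscratch/mylab/samples /gscratch/mylab/samples.tar
#   unbundle_to /gscratch/mylab/samples.tar /scr
```

Reading one archive is a single large sequential read from central storage, and the resulting small-file accesses then happen on the local SSD.
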
3. Containerize your environment.

As mentioned above, minimizing the number of files you need to access can help reduce the number of input / output operations per second (IOPS) happening on the cluster. For example, a Python miniconda environment can create hundreds or even thousands of small files when you install different library dependencies. While Python is a common compute environment, this can be generalized to most other programs you may need. When you containerize your environment, this gets reduced to a single file. A brief introduction to Singularity (now called Apptainer) can be found here. As a side benefit, containerizing your environment–making it a single file–makes it much easier to move it around (see #1 above).

4. Stay under quota.

Constantly hitting your inode (e.g., file) or block (e.g., number of GBs or TBs) quotas can cause extra storage slowness. If you need a bump on either please reach out to discuss your options. As a reminder you can use the hyakstorage command on klone to display current quota usage for all of your filesets as well as your home directory. Please note that this output is updated once an hour so it will take time to reflect any overages.
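
If you are close to an inode quota, it helps to know which directories hold the most files. The sketch below counts entries under each immediate subdirectory; inode_usage is a hypothetical helper and is unrelated to the hyakstorage command. Be aware that walking a large tree with find generates metadata load of its own, so run it sparingly.

```shell
# Hypothetical helper: print "COUNT<TAB>DIR" for each immediate
# subdirectory of the given directory, largest count first. Each count
# includes the subdirectory itself. Hidden subdirectories are skipped
# because the glob does not match them.
inode_usage() {
  top="${1:-.}"
  for d in "$top"/*/; do
    [ -d "$d" ] || continue
    printf '%s\t%s\n' "$(find "$d" | wc -l | tr -d ' ')" "${d%/}"
  done | sort -rn
}

# Example: inode_usage /gscratch/mylab   (hypothetical path)
```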

5. Report issues.

While the Hyak team has an extensive monitoring and alerting framework in place to help us proactively determine when things may be going wrong, not all causes of slow user experience are currently correlated to metrics. Furthermore, our team generally interfaces with the cluster in different ways than our users, so we may not be equally exposed to any pains until they are reported to us. If you’ve run into a performance issue, please submit a ticket by emailing help@uw.edu. Please provide any symptoms you are observing, along with the date, timeframe, job IDs (if applicable), commands you are running with their full output, etc. Even if you don’t need or want a reply from us, it is still helpful for us to hear from you. Feel free to say "no response needed" or something along these lines so we know how to respond.


An update on klone storage

April 20, 2022 · 5 min read

Nam Pho

Director for Research Computing

note

klone has experienced exponential growth over the first year of its launch, necessitating long-planned storage upgrades. The current estimate is between June and July 2022 for deployment of this hardware.

The 3rd generation Hyak cluster, klone, launched in spring 2021 with 144 HPC nodes and 192 GPUs. In just a single year, we’ve grown to over 384 HPC nodes (a 166% increase) and 448 GPUs (a 133% increase). klone has more than doubled in size, and while some of this growth comes from long-standing Hyak members migrating to the new cluster, much of our increased capacity comes from hundreds of new researchers joining the Hyak community. We’ve seen existing sponsors such as the College of Engineering increase their already substantial footprints by 60%, we’ve welcomed new sponsors such as UW Bothell, UW Tacoma, and the Puget Sound Institute, and seen over 1000% growth–seriously–in our new self-sponsored tier for investigators and faculty without an existing Hyak sponsor affiliation. As with any large project, during klone’s initial planning stages we made assumptions about our growth rate and the types of research we would be supporting: assumptions that have been shattered by our growth over the past year. It was never a question of if we would need to upgrade our support infrastructure–like storage–but when, and our rapid growth significantly accelerated our upgrade timeline.

Monitoring – and developing more monitoring for – the Hyak clusters is a central responsibility of our team. The status quo at the beginning of 2022 was to track down errant jobs or workflows when storage issues came up. In almost every instance, we were able to pinpoint the problematic job and work with the researcher to shape their code into a normal IO profile. Pausing jobs and providing best practices was sufficient to keep the storage performance solid for everyone. However, starting around the last week of March 2022, we started having trouble finding an obvious job, or even a set of jobs, impacting storage performance.

The truth is that our baseline load had shifted. Due to our tremendous growth, things researchers had previously been doing without issue were now causing problems. We also noticed an evolution of the types of research happening on klone. The Hyak community diversified from traditional HPC workflows (e.g., simulations) into more data-intensive areas like data science (e.g., R jobs), deep learning, and artificial intelligence research. We accelerated our discussions with storage vendors: in a few short months, an expansion went from an eventuality to an immediate and pressing need. Still, we tried several last-minute optimizations to see if we could prevent spending all that money. We are serious about our fiduciary duty, as stewards of this research platform, to provide the most value for the Hyak community with the dollars we are entrusted with. We knew a storage upgrade for klone would cost hundreds of thousands of dollars and we needed absolute certainty that we couldn’t engineer a way around that expense.

The storage on klone (i.e., mmfs1 or gscratch) might pretend to be a mere folder or directory, but in truth it’s an abstraction of a highly complex system. To provide cost-effective, high-performance storage, a small high-speed NVMe "flash" layer acts both as a write buffer for the slower spinning disks–which make up the vast majority of the cluster’s capacity–and as a high-speed "cache" for recently and frequently accessed small files. While presented as a single folder to the researcher, behind the scenes the storage cluster moves data between these tiers to balance performance. As seen in the figure above, when the flash layer reaches 80% capacity, a process begins to drain it by moving less frequently used files to the spinning-disk layer until the flash layer reaches 65% capacity. You might also notice that despite our precautions and monitoring, as of April 9, 2022, we were no longer able to migrate data from flash to spinning disks faster than our users were writing. This was the final deciding factor for us, and we initiated our long-standing plan to upgrade the storage for klone.

This necessary investment to upgrade storage will double both the maximum input-output operations-per-second (IOPS) and throughput (storage bandwidth), providing much needed overhead for current workflows as well as accommodating future growth. We are excited for this upgrade – and are doing everything we can to expedite its deployment – but due to the sheer amount of hardware we’re purchasing, we’ve been swept up in the pandemic-induced global supply chain crunch. Our vendors have predicted that the end of July is the worst-case scenario, but that a June delivery is also possible. We will update the Hyak community as we know more. As always, we welcome any questions: if you want to speak with us about something, send an email to the Hyak team via help@uw.edu and we’ll follow up with you.

