
August 2025 Maintenance Update

· 6 min read
Kristen Finch
Director of Research Computing Solutions

During August’s maintenance, we refreshed the operating system images for both login and compute nodes, upgraded Slurm to version 25.5.2, and upgraded Klone's filesystem (GPFS) for increased stability. We also introduced a new Globus OneDrive connector, making it easier than ever to transfer files between OneDrive and Hyak Klone or Kopah Storage.

Stay informed by subscribing to our mailing list and the UW-IT Research Computing Events Calendar. The next maintenance is scheduled for Tuesday, September 9, 2025 (the second Tuesday of the month).

Notable Updates

  • Node image updates – Routine updates plus installation of new Slurm utilities that we will test for job efficiency monitoring.
  • Slurm upgrade to 25.5.2 – Resolves a bug in version 25.05.0 where the --gres flag allowed the same resources to be allocated to more than one job, and fixes X11 forwarding. Read more about this version.
  • GPFS upgrade to 5.1.9.11 – Improves stability and includes several bug fixes. Read more about this version.

New Features

Globus OneDrive Connector – UW-IT Research Computing has added OneDrive as a connector to Globus, making transfers between OneDrive and Hyak Klone or OneDrive and Kopah Storage easier than ever before!

Search UW OneDrive to utilize the connector.
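
If you prefer the command line, the Globus CLI can find and use the connector as well. A minimal sketch, assuming you have globus-cli installed and are logged in; both collection UUIDs below are placeholders you would look up yourself:

# Find the UW OneDrive collection and note its UUID
globus endpoint search "UW OneDrive"

# Transfer a file from OneDrive to Klone (UUIDs and paths are hypothetical)
globus transfer ONEDRIVE_UUID:/Documents/data.csv KLONE_UUID:/gscratch/mylab/data.csv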

Things to note

  • Did you know that the UW community is eligible for 5TB of storage on OneDrive as part of the Office 365 suite? Click here to learn more.
  • While OneDrive is HIPAA- and FERPA-aligned, encryption is not enforced for Globus transfers on any of our connectors (OneDrive, Kopah, Klone). As a reminder, Klone and Kopah are NOT aligned with HIPAA; please keep this in mind now that OneDrive can transfer to either cluster.
  • Sharing with external partners is not enabled for our OneDrive or Klone connectors via Globus. Sharing is permitted for Kopah.
  • Read more

Office Hours

  • Wednesdays at 2pm on Zoom. Attendees need only register once and can attend any occurrence using the Zoom link that will arrive via email. Click here to register for Zoom Office Hours.
  • Thursdays at 2pm in person at the eScience Institute (WRF Data Science Studio, UW Physics/Astronomy Tower, 6th Floor, 3910 15th Ave NE, Seattle, WA 98195).
  • See our office hours schedule, subscribe to event updates, and bookmark our UW-IT Research Computing Events Calendar.

If you would like to request one-on-one help, please send an email to help@uw.edu with "Hyak Office Hour" in the subject line to coordinate a meeting.

UW Community Opportunities

  • The Data Science and AI Accelerator pairs eScience Institute data scientists with researchers from any field of study to work on focused, collaborative projects. Collaborations may center on analysis of an existing dataset to answer a specific research question, an implementation of software for processing or analyzing data, data visualization tools, or tools for data interpretation. Accelerator Projects may be submitted at any time. Projects for Fall 2025 must be received by August 14th, 2025. LEARN MORE HERE.
  • Applications for the CSDE Data Science and Demography Training program are due Friday, August 22nd by 5pm. An information session will take place Wednesday, August 13th at 10:00 a.m. DETAILS HERE.
  • Cloud Clinic, August 14, 10-11am - guest presenter Niris Okram from AWS will present “The Utility of Capacity Blocks: Optimizing computing horsepower per budget dollar,” followed by a short presentation on building small-scale (“Littlest”) JupyterHubs. LEARN MORE HERE.
  • DubHacks - October 18 - October 19, 2025 - DubHacks 2025 takes you back to where it all began—the childhood bedroom. A space for imagination, curiosity, and bold ideas. Now, with code instead of crayons, you get to build what makes your younger self proud. No limits, just pure creativity. What will you create when you let yourself play?

External Training Opportunities

  • Automating Research with Globus: The Modern Research IT Platform - Aug. 18, 2025, 9 a.m. – 12 p.m. (Pacific Time) This workshop introduces Globus Flows and its role in automating research workflows. Participants will explore data portals, science gateways, and commons, enabling seamless data discovery and access. Enroll here.
  • CU-RMACC Webinars: Should I be Scared of AI? Aug. 18, 2025 - 3:00 PM - 4:00 PM EDT Throughout history, new technologies have sparked both excitement and fear—AI is no different. In this talk, Dr. Shelley Knuth, Assistant Vice Chancellor for Research Computing at the University of Colorado, explores the common fears surrounding artificial intelligence, why we feel them, and how we can shift our perspective to focus on positive outcomes. We’ll look at practical ways to address risks, embrace innovation, and move forward with AI as a powerful tool rather than something to fear. Learn more and register.
  • COMPLECS: Batch Computing (Part II): Getting Started with Batch Job Scheduling 08/21/25 - 2:00 PM - 3:30 PM EDT Learn more and register.
  • NUG Community Call: A Birds-Eye View of Using CUDA with C/C++ on Perlmutter (Part 2) August 27, 2025, 11 a.m. - 12:30 p.m. PDT - In this two-part training series, users are introduced to the basics of using CUDA on Perlmutter at NERSC. The training focuses on the basics of the Perlmutter architecture and NVIDIA GPUs, and on programming concepts with CUDA in C/C++. Event 2 focuses on advanced kernel concepts and custom CUDA kernels in C/C++. Learn more and register.
  • COMPLECS: Linux Tools for Text Processing 09/04/25 - 2:00 PM - 3:30 PM EDT Learn more and register.
  • Python for HPC 09/09/25 - 2:00 PM - 3:30 PM EDT Learn more and register.
  • COMPLECS: Data Transfer 09/18/25 - 2:00 PM - 3:30 PM EDT Learn more and register.
  • COMPLECS: Interactive Computing 10/09/25 - 2:00 PM - 3:30 PM EDT Learn more and register.
  • COMPLECS: Linux Shell Scripting 10/23/25 - 2:00 PM - 3:30 PM EDT Learn more and register.
  • COMPLECS: Using Regular Expressions with Linux Tools 11/06/25 - 2:00 PM - 3:30 PM EST Learn more and register.
  • COMPLECS: Batch Computing (Part III) High-Throughput and Many-Task Computing - Slurm Edition 12/04/25 - 2:00 PM - 3:30 PM EST Learn more and register.
  • R for HPC 12/04/25 - 2:00 PM - 3:30 PM EST Learn more and register.

Positions

  • Two PhD positions in Artificial Intelligence - in collaboration with German Aerospace Center and TU Dresden, Germany. Deadline to apply: 27 August 2025. Apply Now!

Questions about Hyak Klone, Tillicum, or any other UW-IT Research Computing Service? Fill out our Research Computing Consulting intake form. We are here to help!

Happy Computing,

Hyak Team

JuiceFS or using Kopah on Klone

· 7 min read
Nam Pho
Research Computing

If you haven't heard, we recently launched an on-campus S3-compatible object storage service called Kopah (docs) that is available to the research community at the University of Washington. Kopah is built on top of Ceph and is designed to be a low-cost, high-performance storage solution for data-intensive research.

warning

From our testing we have observed significant performance challenges for JuiceFS in standalone mode, as demonstrated in this blog post. We do not recommend JuiceFS as a solution for demanding workflows.

While the deployment of Kopah was welcome news to those who are comfortable working with S3-compatible cloud solutions, we recognize some folks may be hesitant to give up their familiarity with POSIX file systems. If that sounds like you, we explored the use of JuiceFS, a distributed file system that provides a POSIX interface on top of object storage, as a potential solution.

info

Simply put, object storage typically presents as a pair of API keys, with data accessed through a command-line tool that wraps API calls, whereas POSIX is the familiar file-and-directory interface you usually see from storage when interacting with a cluster via the command line.
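
As a concrete illustration of the difference, assuming s3cmd is configured with your two Kopah keys and a bucket named my-bucket already exists:

# Object storage: a command-line tool wraps API calls authenticated by your keys
s3cmd put results.csv s3://my-bucket/results.csv
s3cmd get s3://my-bucket/results.csv

# POSIX: ordinary file operations against a mounted path
cp results.csv /gscratch/mylab/results.csv
ls -l /gscratch/mylab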

Installation

JuiceFS isn't installed by default so you will need to compile it yourself or download the pre-compiled binary from their release page.

As of January 2025, the latest version is 1.2.3, and you want the amd64 build if using it from Klone. The command below downloads and extracts the binary to your current working directory.

wget https://github.com/juicedata/juicefs/releases/download/v1.2.3/juicefs-1.2.3-linux-amd64.tar.gz -O - | tar xzvf -

I moved it to a folder in my $PATH so I can run it from anywhere by just calling the binary; your personal environment may vary here.

mv -v juicefs ~/bin/

Verify you can run JuiceFS.

npho@klone-login03:~ $ juicefs --version
juicefs version 1.2.3+2025-01-22.4f2aba8
npho@klone-login03:~ $

Cool, now we can start using JuiceFS!

Standalone Mode

There are two ways to run JuiceFS: standalone mode or distributed mode. This blog post explores the former. Standalone mode presents Kopah via POSIX on a single Klone node. The key points:

  1. An active juicefs process must be running for as long as you want to access the file system.
  2. The file system is only accessible from the node where that process runs.

If you want to run JuiceFS across multiple nodes or with multiple users, we will cover distributed mode in a future proof-of-concept.

Create Filesystem

JuiceFS separates the data (placed into S3 object storage) and the metadata, which is kept locally in a database. The command below will create the myjfs filesystem and store the metadata in a SQLite database called myjfs.db in the directory where the command is run. It puts the data itself into a Kopah bucket called npho-project.

juicefs format \
--storage s3 \
--bucket https://s3.kopah.uw.edu/npho-project \
--access-key REDACTED \
--secret-key REDACTED \
sqlite3://myjfs.db myjfs

You can name the metadata file and the file system whatever you want (they don't have to match), and the same goes for the bucket name on Kopah. However, I strongly recommend giving each file system a unique metadata file name that matches the file system name, for ease of tracking alongside the bucket name itself.

npho@klone-login03:~ $ juicefs format \
> --storage s3 \
> --bucket https://s3.kopah.uw.edu/npho-project \
> --access-key REDACTED \
> --secret-key REDACTED \
> sqlite3://myjfs.db myjfs
2025/01/31 11:52:47.940709 juicefs[1668088] <INFO>: Meta address: sqlite3://myjfs.db [interface.go:504]
2025/01/31 11:52:47.944930 juicefs[1668088] <INFO>: Data use s3://npho-project/myjfs/ [format.go:484]
2025/01/31 11:52:48.666657 juicefs[1668088] <INFO>: Volume is formatted as {
"Name": "myjfs",
"UUID": "eb47ec30-c1f7-4a92-9b17-23c4beae7f76",
"Storage": "s3",
"Bucket": "https://s3.kopah.uw.edu/npho-project",
"AccessKey": "removed",
"SecretKey": "removed",
"BlockSize": 4096,
"Compression": "none",
"EncryptAlgo": "aes256gcm-rsa",
"KeyEncrypted": true,
"TrashDays": 1,
"MetaVersion": 1,
"MinClientVersion": "1.1.0-A",
"DirStats": true,
"EnableACL": false
} [format.go:521]
npho@klone-login03:~ $

You can verify there is now a myjfs.db file in your current working directory. It's a SQLite database file that stores your file system metadata.

We can also verify the npho-project bucket was created on Kopah to store the data itself.

npho@klone-login03:~ $ s3cmd -c ~/.s3cfg-default ls                                      
2025-01-31 19:48 s3://npho-project
npho@klone-login03:~ $

You should run juicefs format --help to view the full range of options and tailor the parameters of your file system to your needs, but briefly (a combined example follows this list):

  • Encryption: When you create and format the file system, you can see that encryption is enabled by default using AES256. You can override this with the --encrypt-algo flag if you prefer chacha20-rsa, or use key-file-based encryption by providing your private key with the --encrypt-rsa-key flag.
  • Compression: This is not enabled by default, and there is a computational penalty for enabling it, since files need to be compressed and decompressed on the fly as you access them.
  • Quota: By default there is no block quota (set with --capacity, in GiB) or inode quota (set with --inodes, a file count) enforced at the file system level; if you do not set these explicitly, you are limited only by what you get from Kopah. Setting them explicitly is still useful if you want multiple projects or file systems in JuiceFS to share the same Kopah account with some level of separation.
  • Trash: By default, files are not deleted immediately but are moved to a trash folder, similar to most desktop systems. This is controlled with the --trash-days flag; the default is 1 day, after which files are permanently deleted. Set it to 0 to delete files immediately.
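
Putting several of these together, here is a sketch of a more customized format command; the bucket, keys, and specific values are illustrative, not recommendations:

juicefs format \
--storage s3 \
--bucket https://s3.kopah.uw.edu/npho-project \
--access-key REDACTED \
--secret-key REDACTED \
--encrypt-algo chacha20-rsa \
--compress lz4 \
--capacity 100 \
--inodes 1000000 \
--trash-days 7 \
sqlite3://myjfs.db myjfs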

Mount Filesystem

Running the command below mounts your newly created file system at the myproject folder in your home directory. The folder does not need to exist beforehand.

juicefs mount sqlite3://myjfs.db ~/myproject --background
warning

The SQLite database file is critical; do not lose it. You can move it around afterwards, but it contains all the metadata about your files.

This process occurs in the background.

warning

Where you mount your file system the first time is where it will be expected to be mounted going forward.

npho@klone-login03:~ $ juicefs mount sqlite3://myjfs.db ~/myproject --background
2025/01/31 11:57:01.652279 juicefs[1690855] <INFO>: Meta address: sqlite3://myjfs.db [interface.go:504]
2025/01/31 11:57:01.654920 juicefs[1690855] <INFO>: Data use s3://npho-project/myjfs/ [mount.go:629]
2025/01/31 11:57:02.156898 juicefs[1690855] <INFO>: OK, myjfs is ready at /mmfs1/home/npho/myproject [mount_unix.go:200]
npho@klone-login03:~ $
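
You can confirm the file system is up with standard tools, or with juicefs status, which reads the same metadata address:

df -h ~/myproject
juicefs status sqlite3://myjfs.db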

Use Filesystem

Now with the file system mounted (at ~/myproject) you can use it like any other POSIX file system.

npho@klone-login03:~ $ cp -v LICENSE myproject 
'LICENSE' -> 'myproject/LICENSE'
npho@klone-login03:~ $ ls myproject
LICENSE
npho@klone-login03:~ $

Remember, you won't be able to see it in the bucket because it is encrypted before being stored there.
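
If you are curious, you can list the bucket directly; you should see JuiceFS's internal chunk objects under the myjfs/ prefix rather than a file named LICENSE:

# Lists JuiceFS chunk objects, not your original file names
s3cmd -c ~/.s3cfg-default ls --recursive s3://npho-project/myjfs/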

Recover Deleted Files

If you enabled the trash can option then you can recover files up until the permanent delete date.

First delete a file on the file system.

npho@klone-login03:~ $ cd myproject 
npho@klone-login03:myproject $ rm -v LICENSE
removed 'LICENSE'
npho@klone-login03:myproject $

Verify the file is deleted, then recover it from the trash bin.

npho@klone-login03:myproject $ ls          
npho@klone-login03:myproject $ ls -alh
total 23K
drwxrwxrwx 2 root root 4.0K Jan 31 12:54 .
drwx------ 48 npho all 8.0K Jan 31 13:08 ..
-r-------- 1 npho all 0 Jan 31 11:57 .accesslog
-r-------- 1 npho all 2.6K Jan 31 11:57 .config
-r--r--r-- 1 npho all 0 Jan 31 11:57 .stats
dr-xr-xr-x 2 root root 0 Jan 31 11:57 .trash
npho@klone-login03:myproject $ ls .trash
2025-01-31-20
npho@klone-login03:myproject $ ls .trash/2025-01-31-20
1-2-LICENSE
npho@klone-login03:myproject $ cp -v .trash/2025-01-31-20/1-2-LICENSE LICENSE
'.trash/2025-01-31-20/1-2-LICENSE' -> 'LICENSE'
npho@klone-login03:myproject $ ls
LICENSE
npho@klone-login03:myproject $

As you can see, deleted files are tracked in the trash by their delete date, and you recover a file by copying it back out.

Unmount Filesystem

When you are done using the file system you can unmount it with the command below.

npho@klone-login03:~ $ juicefs umount myproject
npho@klone-login03:~ $

Remember, the file system is only accessible in standalone mode so long as a juicefs process is running. Since we ran it in the background you will need to explicitly unmount it.

Questions?

Hopefully you found this proof-of-concept useful. If you have any questions for us, please reach out to the team by emailing help@uw.edu with Hyak somewhere in the subject or body. Thanks!

August 2024 Maintenance Details

· 3 min read
Kristen Finch
HPC Staff Scientist

Thanks again for your patience with our monthly scheduled maintenance. During this maintenance session, we applied package updates to node images to ensure compliance with the latest operating-system-level security fixes and performance optimizations.

The next maintenance will be Tuesday September 10, 2024.

New self-hosted S3 storage option: KOPAH

We are happy to announce the preview launch of our self-hosted S3 storage called KOPAH. S3 storage is a solution for securely storing and managing large amounts of data, whether for personal use or research computing. It works like an online storage locker where you can store files of any size, accessible from anywhere with an internet connection. For researchers and those involved in data-driven studies, it provides a reliable and scalable platform to store, access, and analyze large datasets, supporting high-performance computing tasks and complex workflows.

S3 uses buckets as containers to store data, where each bucket can hold 100,000,000 objects, which are the actual files or data you store. Each object within a bucket is identified by a unique key, making it easy to organize and retrieve your data efficiently. Public links can be generated for KOPAH objects so that users can share buckets and objects with collaborators.
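
As a sketch of what that workflow looks like with s3cmd (the bucket and file names are placeholders):

# Make a bucket, upload an object under a key, and generate a time-limited share link
s3cmd mb s3://mylab-data
s3cmd put results.csv s3://mylab-data/2024/results.csv
s3cmd signurl s3://mylab-data/2024/results.csv +86400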

Click here to learn more about KOPAH S3.

Who should use KOPAH?

KOPAH is a storage solution for anyone. Just like other storage options out there, you can upload, download, and view your storage bucket with specialized tools and share your data via the internet. For Hyak users, KOPAH provides another storage option for research computing. It is more affordable than /gscratch storage and can be used for active research computing, with a few added steps for retrieving stored data prior to a job (see the sketch below).
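
For example, a minimal sketch of a batch job that stages data in from KOPAH and pushes results back out; the account, partition, bucket, and paths are all hypothetical:

#!/bin/bash
#SBATCH --job-name=kopah-demo
#SBATCH --account=mylab
#SBATCH --partition=compute
#SBATCH --time=01:00:00

# Stage input data from KOPAH into local scratch before computing
s3cmd get s3://mylab-data/inputs.tar.gz /tmp/inputs.tar.gz
tar xzf /tmp/inputs.tar.gz -C /tmp

# ... run your analysis against /tmp/inputs ...

# Push results back to KOPAH when the job finishes
s3cmd put /tmp/results.csv s3://mylab-data/results.csv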

Test Users Wanted

Prior to September, we are inviting test users to try KOPAH and provide feedback about their experience. If you are interested in becoming a KOPAH test user, please email help@uw.edu with Hyak or KOPAH in the subject line.

Requirements:

  1. While we will not charge for the service until September 1, signing up as a test user requires a budget number and worktag. If the service doesn't work for you, you can cancel before September.
  2. We will ask for a name for the account. If your group has an existing account on Hyak (klone /gscratch), it makes sense for the names to match across services.
  3. Please be ready to respond with your feedback about the service.

Opportunities

PhD students should check out this opportunity for funding from NVIDIA: Graduate Research Fellowship Program

Questions? If you have any questions for us, please reach out to the team by emailing help@uw.edu with Hyak in the subject line.