# JuiceFS, or Using Kopah on Klone
Nam Pho, Research Computing

If you haven't heard, we recently launched an on-campus S3-compatible object storage service called Kopah [docs] that is available to the research community at the University of Washington. Kopah is built on top of Ceph and is designed to be a low-cost, high-performance storage solution for data-intensive research.
> **Warning:** This is a proof-of-concept demonstration, not a production-ready or officially endorsed solution, which is why we have not put a more formal walkthrough in our documentation.
While the deployment of Kopah was welcome news to those who are comfortable working with S3-compatible cloud solutions, we recognize some folks may be hesitant to give up their familiarity with POSIX file systems. If that sounds like you, we explored the use of JuiceFS, a distributed file system that provides a POSIX interface on top of object storage, as a potential solution.
> **Info:** Simplistically, object storage is typically presented as a pair of API keys, with data accessed through a command-line tool that wraps API calls, whereas POSIX is the familiar file-and-directory interface you get from storage when interacting with a cluster via the command line.
## Installation

JuiceFS isn't installed by default, so you will need to compile it yourself or download the pre-compiled binary from their release page. As of January 2025 the latest version is 1.2.3, and you want the amd64 version if using it from Klone. The command below will download and extract the binary to your current working directory.
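A sketch of the download and extraction steps; the URL follows the naming convention on the JuiceFS GitHub releases page, so double-check it against the release page before running.

```shell
# Download the v1.2.3 Linux amd64 tarball and extract it into the
# current working directory.
JFS_VERSION="1.2.3"
wget "https://github.com/juicedata/juicefs/releases/download/v${JFS_VERSION}/juicefs-${JFS_VERSION}-linux-amd64.tar.gz"
tar -xzf "juicefs-${JFS_VERSION}-linux-amd64.tar.gz"
```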
I moved it to a folder in my $PATH so I can run it from anywhere by calling the binary directly; your personal environment may vary here.
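For example, assuming you keep personal binaries in `~/bin` (an assumption, not a Klone requirement):

```shell
# Move the binary somewhere on your $PATH; adjust for your own setup.
mkdir -p ~/bin
mv juicefs ~/bin/
# If ~/bin is not already on your PATH for this session:
export PATH="$HOME/bin:$PATH"
```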
Verify you can run JuiceFS.
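```shell
# Print the installed version to confirm the binary works.
juicefs --version
```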
Cool, now we can start using JuiceFS!
## Standalone Mode

There are two ways to run JuiceFS: standalone or distributed mode. This blog post explores the former. Standalone mode is meant to present Kopah via POSIX only on Klone. The key points:
- There is an active `juicefs` process required to run for as long as you want to access the file system.
- It is intended to be used only on the node you are running the process from.
If you want to run JuiceFS on multiple nodes or with multiple users, we will have another proof-of-concept covering distributed mode in the future.
## Create Filesystem

JuiceFS separates the data (placed into S3 object storage) from the metadata, which is kept locally in a database. The command below will create the `myjfs` filesystem and store the metadata in a SQLite database called `myjfs.db` in the directory where the command is run. It puts the data itself into a Kopah bucket called `npho-project`.
You can name the metadata file and the filesystem whatever you want (they don't have to match). The same goes for the bucket name on Kopah. However, I would strongly recommend using unique metadata file names that match the filesystem names for ease of tracking alongside the bucket name itself.
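The format invocation might look like the following sketch. The endpoint and credential placeholders are assumptions; substitute the S3 endpoint and keys from your own Kopah account per the Kopah docs.

```shell
# JuiceFS reads S3 credentials from these environment variables.
export ACCESS_KEY="<your-kopah-access-key>"
export SECRET_KEY="<your-kopah-secret-key>"

# Create the "myjfs" filesystem: metadata in ./myjfs.db (SQLite),
# data in the "npho-project" bucket on Kopah.
juicefs format \
    --storage s3 \
    --bucket "https://<kopah-endpoint>/npho-project" \
    sqlite3://myjfs.db \
    myjfs
```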
You can verify there is now a `myjfs.db` file in your current working directory. It's a SQLite database file that will store your file system metadata.
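```shell
# The metadata database should now exist in the working directory;
# `file` should identify it as a SQLite 3.x database.
ls -lh myjfs.db
file myjfs.db
```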
We can also verify the `npho-project` bucket was created on Kopah to store the data itself.
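One way to check, assuming you have `s3cmd` (or another S3 client) configured with your Kopah credentials:

```shell
# List your buckets; the newly created bucket should appear.
s3cmd ls
```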
You should run `juicefs format --help` to view the full range of options and customize the parameters of your file system to your unique needs, but briefly:
- **Encryption:** When you create the file system and format it, you can see it has encryption by default using AES256. You can override this using the `--encrypt-algo` flag if you prefer `chacha20-rsa`, or you can use key-file-based encryption and provide your private key using the `--encrypt-rsa-key` flag.
- **Compression:** This is not enabled by default, and there is a computational penalty for enabling it, since files need to be decompressed or recompressed on the fly when you access them.
- **Quota:** By default there is no block (set with `--capacity` in GiB units) or inode (set with `--inodes`) quota enforced at the file system level. If you do not explicitly set this, it will be matched to whatever you get from Kopah. Setting it explicitly is still useful if you want multiple projects or file systems in JuiceFS to share the same Kopah account with some level of separation.
- **Trash:** By default, files are not deleted immediately but moved to a trash folder, similar to most desktop systems. This is set with the `--trash-days` flag, and you can set it to `0` if you want files deleted immediately. The default is 1 day, after which the file is permanently deleted.
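A hypothetical format command combining several of the options above (the quota and retention values are illustrative, and the endpoint placeholder should come from the Kopah docs):

```shell
# 100 GiB capacity quota, 1,000,000-inode limit, 7-day trash retention.
juicefs format \
    --storage s3 \
    --bucket "https://<kopah-endpoint>/npho-project" \
    --capacity 100 \
    --inodes 1000000 \
    --trash-days 7 \
    sqlite3://myjfs.db \
    myjfs
```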
## Mount Filesystem

Running the command below will mount your newly created file system to the `myproject` folder in your home directory. The folder does not need to exist beforehand.
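```shell
# -d runs the mount process in the background (daemon mode).
juicefs mount -d sqlite3://myjfs.db ~/myproject
```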
> **Warning:** The SQLite database file is critical, do not lose it. You can move it around afterwards, but it contains all the metadata about your files.
This process occurs in the background.
> **Warning:** Where you mount your file system the first time is where it will be expected to be mounted going forward.
## Use Filesystem

Now, with the file system mounted (at `~/myproject`), you can use it like any other POSIX file system.
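For example, ordinary POSIX operations work as usual (file names here are illustrative):

```shell
# Create, list, and read a file on the mounted filesystem.
echo "hello kopah" > ~/myproject/hello.txt
ls -lh ~/myproject/
cat ~/myproject/hello.txt

# The mount also shows up like any other filesystem.
df -h ~/myproject
```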
Remember, you won't be able to see your data in the bucket itself, because it is encrypted before being stored there.
## Recover Deleted Files

If you enabled the trash can option, then you can recover files up until the permanent deletion date.
First, delete a file on the file system, then verify it is gone and recover it from the trash bin. Files in the trash are tracked by their deletion date, and you need to copy a file back out to restore it.
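The sequence might look like this sketch; the exact folder and entry names inside the trash vary, so adjust them to what you actually see when listing it.

```shell
# Delete the file, then confirm it is gone.
rm ~/myproject/hello.txt
ls ~/myproject/hello.txt   # should fail: No such file or directory

# Trash entries live under .trash at the root of the mount,
# grouped into folders named by deletion date.
ls ~/myproject/.trash/

# Copy the file back out to restore it (substitute the real names).
cp ~/myproject/.trash/<date-folder>/<entry-name> ~/myproject/hello.txt
```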
## Unmount Filesystem

When you are done using the file system, you can unmount it with the command below.
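```shell
# Cleanly unmount the filesystem and stop the background process.
juicefs umount ~/myproject
```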
Remember, in standalone mode the file system is only accessible as long as a `juicefs` process is running. Since we ran it in the background, you will need to explicitly unmount it.
## Questions?

Hopefully you found this proof-of-concept useful. If you have any questions for us, please reach out to the team by emailing help@uw.edu with "Hyak" somewhere in the subject or body. Thanks!