Squash Fuse
Since most datasets contain a large number of small files, it is recommended to pack them into a SquashFS filesystem. Containers package an application and everything it needs to run in an isolated environment. In a similar way, SquashFS packages all the files you wish to use into a single read-only, compressed filesystem. Because a SquashFS filesystem is stored as a single file and cannot change, the node mounting it can read all of the filesystem's metadata at once instead of querying the storage system for each file. This considerably reduces the number of metadata calls, giving a large performance increase with little to no downside. The storage servers benefit as well: fewer metadata calls mean less load on the storage system as a whole and more throughput available for other I/O.
Creating a SquashFS dataset
- Place all of the files you wish to include in your .sqsh file in a single directory. Note that once created, the .sqsh file cannot easily be edited.
- Run the following command to generate the SquashFS file:
Duplicate files are detected and stored only once during this process. The output also reports information such as the filesystem size and the number of files and directories. To manually check the size of the newly compressed file:
- Clean up the directory containing the original files as needed. Note that the `rm` command permanently deletes files and directories. Ensure that the directory is no longer needed and that the SquashFS file was successfully created before proceeding.
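The steps above can be sketched as a small Bash helper. It assumes `mksquashfs` (from squashfs-tools) is installed or loaded as a module on your cluster; the function name `make_squash` and the names `my_dataset/` and `my_dataset.sqsh` are hypothetical placeholders, not part of any tool.

```shell
#!/bin/bash
# Hypothetical helper: pack a directory into a read-only SquashFS image.
# Assumes mksquashfs (from squashfs-tools) is on PATH.
make_squash() {
    local src_dir="$1"    # directory holding every file to pack
    local sqsh_file="$2"  # output image, e.g. my_dataset.sqsh

    # Refuse to run on a missing or non-directory source.
    if [ ! -d "$src_dir" ]; then
        echo "error: '$src_dir' is not a directory" >&2
        return 1
    fi

    # Build the compressed image. -noappend overwrites an existing image
    # instead of appending to it; duplicate files are detected by default.
    mksquashfs "$src_dir" "$sqsh_file" -noappend || return 1

    # Report the size of the compressed image.
    du -h "$sqsh_file"
}
```

For example, `make_squash my_dataset my_dataset.sqsh` builds the image, and `unsquashfs -l my_dataset.sqsh` lists its contents so you can confirm it before removing the source directory with `rm -r my_dataset`.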
Mounting using SquashFuse
Slurm with Squash Fuse
Including your job or array task ID and/or your user name in paths avoids collisions with other users' mount points. Either log into a compute node using `salloc`, or write a Slurm job (or job array) script that runs the commands in this section. Once all the necessary commands are in the script, submit it using `sbatch` and monitor it using `squeue`.
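A minimal sketch of a Slurm job-array script header that builds a per-user, per-task mount-point name. The job name, array range, time limit, and the `/tmp/$USER/...` location are assumptions; your site may prefer `$TMPDIR` or another node-local scratch path. Slurm sets `SLURM_JOB_ID`, `SLURM_ARRAY_JOB_ID` and `SLURM_ARRAY_TASK_ID` inside a running job; the fallbacks only let the script also run outside Slurm.

```shell
#!/bin/bash
#SBATCH --job-name=squash_demo          # hypothetical job name
#SBATCH --array=1-4                     # hypothetical array range
#SBATCH --time=00:10:00                 # hypothetical time limit
#SBATCH --output=squash_demo_%A_%a.out  # %A = array job ID, %a = task index

# Build a mount-point path that is unique per user, job, and array task so
# concurrent jobs (yours or other users') never share a mount point.
# The fallback values apply only when the script runs outside Slurm.
user_name="${USER:-$(id -un)}"
job_id="${SLURM_ARRAY_JOB_ID:-${SLURM_JOB_ID:-local}}"
task_id="${SLURM_ARRAY_TASK_ID:-0}"

MOUNT_DIR="/tmp/${user_name}/squash_${job_id}/my_squash_mnt_${task_id}"
echo "Mount point for this task: $MOUNT_DIR"
```

Saved as, say, `squash_demo.sh` (a hypothetical file name), the script would be submitted with `sbatch squash_demo.sh` and monitored with `squeue -u $USER`.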
Create a directory for the mount.
The `-p` option of `mkdir` ensures that every intermediate directory in the path is created if it does not already exist. The directory `my_squash_mnt_1` will be your mount point.
Mount the SquashFS file using `squashfuse`.
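The two steps above, creating the mount point and mounting the image, can be sketched as one Bash helper. It assumes `squashfuse` is on your `PATH` (for example, loaded as a module) and that FUSE is available on the compute node; the function name `mount_squash` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: create a mount point and mount a SquashFS image on it.
# Assumes squashfuse is on PATH and FUSE is available on the node.
mount_squash() {
    local sqsh_file="$1"   # e.g. my_dataset.sqsh
    local mount_dir="$2"   # e.g. /tmp/$USER/my_squash_mnt_1

    # Check the image first so a typo does not leave an empty directory behind.
    if [ ! -f "$sqsh_file" ]; then
        echo "error: image '$sqsh_file' not found" >&2
        return 1
    fi

    # -p creates any missing parent directories and is a no-op if they exist.
    mkdir -p "$mount_dir" || return 1

    # Mount the image through FUSE as your own user; no root access is needed.
    squashfuse "$sqsh_file" "$mount_dir"
}
```

For example, `mount_squash my_dataset.sqsh "/tmp/$USER/my_squash_mnt_1"`.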
You can now access the files through the mount point.
If the mount was successful, the output will show the full contents of the SquashFS filesystem. You can now run any additional operations that read from the mounted filesystem; writing to it is not possible because SquashFS is read-only.
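A minimal sketch of a check that the mount actually succeeded before listing it. It assumes the `mountpoint` command (from util-linux, present on most Linux systems) is available; the function name `check_mount` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: confirm a directory is an active mount, then list it.
# Assumes mountpoint (util-linux) is on PATH.
check_mount() {
    local mount_dir="$1"

    # mountpoint -q exits non-zero if the directory is not a mount point.
    if ! mountpoint -q "$mount_dir"; then
        echo "error: '$mount_dir' is not mounted" >&2
        return 1
    fi

    # A successful mount shows the contents of the SquashFS image.
    ls -l "$mount_dir"
}
```

Running `check_mount "/tmp/$USER/my_squash_mnt_1"` prints the image's contents when the mount is active. Checking with `mountpoint` first separates a failed mount from an image that happens to be empty, since both would otherwise appear as an empty directory.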
Unmount the filesystem when done.
The `-u` option stands for unmount. After unmounting, `my_squash_mnt_1` should be an empty directory.
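The unmount and cleanup step can be sketched as a Bash helper. It assumes FUSE 2's `fusermount` is installed (on FUSE 3 systems the equivalent command is `fusermount3`); the function name `unmount_squash` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: unmount a squashfuse mount and remove the mount point.
# Assumes fusermount (FUSE 2) is on PATH; use fusermount3 on FUSE 3 systems.
unmount_squash() {
    local mount_dir="$1"

    # -u unmounts the FUSE filesystem mounted on mount_dir.
    fusermount -u "$mount_dir" || return 1

    # rmdir only removes an empty directory, so it never deletes files and
    # fails loudly if anything is unexpectedly left behind.
    rmdir "$mount_dir"
}
```

In a batch script you might register it with `trap 'unmount_squash "$mount_dir"' EXIT` (with `mount_dir` set to your mount point) so the image is unmounted even if a later command fails.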