Squash Fuse
Most datasets contain a large number of small files, so we recommend packing them into a Squash filesystem (SquashFS). Much as a container packages an application and everything it needs into an isolated environment, SquashFS packages all the files you wish to use into a single read-only, compressed filesystem. Because a Squash filesystem is a single file and cannot change, the server mounting it can read all of the filesystem's metadata at once. This saves a considerable number of metadata calls and gives a substantial performance increase with little to no downside. The benefit is also felt server-side: fewer metadata calls mean reduced load on the storage system as a whole and more throughput available for other storage operations.
Creating a SquashFS dataset
- Place all of the files you wish to be contained in your .sqsh file in a directory. Note that once created, the .sqsh cannot easily be edited.
- Run the following command to generate the squashfs file:
    mksquashfs /path/to/files my_files.sqsh
  Duplicate files are detected and stored only once in this process. The output also reports information such as the filesystem size and the number of files and directories. To manually check the size of the newly compressed file:
    ls -lh my_files.sqsh
- Clean up the directory containing the files as needed.
    rm -r /path/to/files
  Note that the rm command permanently deletes files and directories. Ensure that the directory is no longer needed and the squashfs file was successfully created before proceeding.
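The warning about rm can be turned into a small guard. The following is a minimal sketch, not a site-provided tool: the function name remove_source_if_packed is hypothetical, and it assumes unsquashfs (part of squashfs-tools, alongside mksquashfs) is available on your PATH. It deletes the source directory only if the image exists, is non-empty, and has a superblock that unsquashfs can read.

```shell
#!/usr/bin/env bash

# Hypothetical helper: delete SRC_DIR only after IMAGE passes basic checks.
# Usage: remove_source_if_packed /path/to/files my_files.sqsh
remove_source_if_packed() {
    local src_dir="$1" image="$2"

    # The image must exist and be non-empty.
    if [ ! -s "$image" ]; then
        echo "refusing to delete: $image is missing or empty" >&2
        return 1
    fi

    # unsquashfs -s prints the superblock information of the image.
    # Assumption: it exits non-zero when the file is not a valid squashfs image.
    if ! unsquashfs -s "$image" > /dev/null; then
        echo "refusing to delete: $image is not a readable squashfs image" >&2
        return 1
    fi

    # Both checks passed: remove the original files.
    rm -r -- "$src_dir"
}
```

These checks only confirm that the image is readable, not that every file was packed, so it is still worth comparing the file count reported by mksquashfs against what you expect before deleting anything.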
Mounting using SquashFuse
Log into a compute node using salloc, or create a Slurm job or job array script, to run the commands in this section. If you use a script, submit it with sbatch and monitor it with squeue once it contains all the necessary commands. Including your user name and the job or array number in the mount path, as the commands below do, avoids the risk of colliding with other users.
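The naming advice above can be sketched as a small shell function. This is an illustration only: the function name squash_mount_point and the "nojob" fallback are hypothetical, while SLURM_JOB_ID, SLURM_ARRAY_JOB_ID, and SLURM_ARRAY_TASK_ID are environment variables that Slurm sets inside jobs and job arrays. Each array task already has its own SLURM_JOB_ID, so the array-based tag mainly makes the path easier to relate to the array you submitted.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print a per-user, per-job mount point under /tmp.
# Usage: squash_mount_point [NAME]   (NAME defaults to my_squash_mnt_1)
squash_mount_point() {
    local name="${1:-my_squash_mnt_1}"
    local user="${USER:-$(id -un)}"
    local job_tag

    if [ -n "${SLURM_ARRAY_JOB_ID:-}" ] && [ -n "${SLURM_ARRAY_TASK_ID:-}" ]; then
        # Inside a job array: tag the path with <array job id>_<task id>.
        job_tag="${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}"
    else
        # Plain job, or no Slurm job at all (hypothetical "nojob" fallback).
        job_tag="${SLURM_JOB_ID:-nojob}"
    fi

    printf '/tmp/%s/%s/%s\n' "$user" "$job_tag" "$name"
}
```

The printed path can be captured once, for example with mnt=$(squash_mount_point), and reused for the mkdir, squashfuse, and fusermount commands.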
- Create a directory for the mount.
    mkdir -p /tmp/${USER}/${SLURM_JOB_ID}/my_squash_mnt_1
  The -p option ensures that each intermediate directory in the path above is created if it does not already exist. The directory my_squash_mnt_1 will be your mount point.
- Mount the fileset using squashfuse.
    squashfuse /path/to/my_files.sqsh /tmp/${USER}/${SLURM_JOB_ID}/my_squash_mnt_1
- You are now able to access the files through the mount point.
    ls /tmp/${USER}/${SLURM_JOB_ID}/my_squash_mnt_1
  If the mount was successful, the output will show the contents of the squash filesystem. You can now run any additional operations on the mounted filesystem.
- Unmount the fileset when done.
    fusermount -u /tmp/${USER}/${SLURM_JOB_ID}/my_squash_mnt_1
  The -u option stands for unmount. After unmounting, my_squash_mnt_1 should be an empty directory.
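The four steps above (create, mount, use, unmount) can be wrapped so that the unmount still happens when the workload fails. The following is a minimal sketch under stated assumptions: the function name run_with_squash_mount is hypothetical, and squashfuse and fusermount are assumed to be on the PATH of the compute node.

```shell
#!/usr/bin/env bash

# Hypothetical helper: mount IMAGE at MOUNT_POINT, run a command, then unmount.
# Usage: run_with_squash_mount IMAGE MOUNT_POINT COMMAND [ARGS...]
# Example (inside a Slurm job):
#   mnt="/tmp/${USER}/${SLURM_JOB_ID}/my_squash_mnt_1"
#   run_with_squash_mount /path/to/my_files.sqsh "$mnt" ls "$mnt"
run_with_squash_mount() {
    local image="$1" mount_point="$2"
    shift 2

    # Create the mount point, including any missing parent directories.
    mkdir -p "$mount_point" || return 1

    # Mount the image; give up early if the mount fails.
    squashfuse "$image" "$mount_point" || return 1

    # Run the workload and remember its exit status, so the unmount below
    # happens whether or not the command succeeded.
    local status=0
    "$@" || status=$?

    fusermount -u "$mount_point"

    # rmdir only removes empty directories, so it cannot delete data
    # if the unmount did not succeed.
    rmdir "$mount_point" 2>/dev/null || true

    return "$status"
}
```

This sketch does not handle signals: if the job is cancelled or hits its time limit while the command is running, the unmount step is skipped. A trap on EXIT or TERM in the job script is one possible way to cover that case.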