R is a popular statistical programming language for data science and analytics. We rely on Apptainer (formerly Singularity) and Docker containers to deploy R; a refresher on modules and containers may be helpful before you continue.
We encourage users to employ the containerized versions of R instead of compiling from source and running bare-metal. We'll use Docker Hub containers, as that is where the most regular updates to R come from.
If you use a non-custom R container you'll likely want to run install.packages() at some point. Usually on a non-shared platform like your local setup (where you have full administrative privileges) R will install things into central paths. You don't want to do that on HYAK, so you need to specify user paths.
If you plan on using multiple R versions you will want to set R_LIBS appropriately for each container (i.e., R version) used, so packages compiled against one version of R don't conflict with another. Using sub-folders with names matching that version of R is sufficient.
You can set custom R environment variables with the .Renviron file. I set the R_LIBS environment variable to point to a folder I created in "scrubbed" as an example, but you will want to use a shared lab space or other path unique to your environment.
If you set R_LIBS to your home directory you can quickly run out of inodes, as R likes to create a lot of files. Use your lab directory instead.
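As a minimal sketch of the setup described above, the following creates one library folder per R version and writes the matching R_LIBS line. The names LAB_DIR, R_VERSION, and Renviron.example, and the /tmp/mylab-demo default, are assumptions for illustration only; point LAB_DIR at your own lab or scrubbed folder.

```bash
# Hypothetical location: replace with your own lab or scrubbed folder.
LAB_DIR="${LAB_DIR:-/tmp/mylab-demo}"
# Version of R inside the container you plan to use.
R_VERSION="4.0.3"

# One library folder per R version so compiled packages never mix.
R_LIB_PATH="$LAB_DIR/R/$R_VERSION"
mkdir -p "$R_LIB_PATH"

# Write the setting to a demo file rather than touching your real ~/.Renviron.
RENVIRON_FILE="$LAB_DIR/Renviron.example"
printf 'R_LIBS=%s\n' "$R_LIB_PATH" > "$RENVIRON_FILE"

# Show the line that was written.
cat "$RENVIRON_FILE"
```

Once the path looks right, copy the printed line into the .Renviron file in your home directory, and change it whenever you switch to a container with a different R version.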
Let's say we wanted to use R-4.0.3 from Docker Hub.
Be sure to do this from a build node: you need internet access to reach Docker Hub for the download, and compute resources to convert the image from a Docker to an Apptainer container.
The command will take a minute and create the SIF file in your current directory.
You can run the R binary within the container like below.
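A sketch of the pull and run steps, assuming the official r-base image at tag 4.0.3 and apptainer's default naming of the output file as image_tag.sif. The variable names are illustrative, and the commands only run where apptainer is available.

```bash
# Assumed image: the official r-base image on Docker Hub at tag 4.0.3.
IMAGE_URI="docker://r-base:4.0.3"

# Derive the default SIF file name (<image>_<tag>.sif) from the URI.
IMAGE_REF="${IMAGE_URI##*/}"
SIF_FILE="$(printf '%s' "$IMAGE_REF" | tr ':' '_').sif"

if command -v apptainer >/dev/null 2>&1; then
    # Download and convert the image only if it is not already here.
    [ -f "$SIF_FILE" ] || apptainer pull "$SIF_FILE" "$IMAGE_URI"
    # Run R inside the container and print its version string.
    apptainer exec "$SIF_FILE" Rscript -e 'cat(R.version.string, "\n")'
else
    echo "apptainer not found; on a build node run: module load apptainer"
fi

echo "SIF file: $SIF_FILE"
```

For an interactive R prompt instead of a one-off command, replace the Rscript call with R as the command passed to apptainer exec.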
You can run install.packages() as you normally would if you were working with R locally, and it will install all the files to whatever path you set R_LIBS to in the user environment instructions.
The most popular library for R is the Tidyverse, which includes packages like dplyr and others. As you can see in the previous section, it isn't available if we use the r-base Docker Hub container.
Your options are to:
- install it yourself with install.packages(), or
- use a Docker container with it pre-installed.
The first option, while ok, uses a lot (and I mean a lot) of inodes as well as taking a long time to compile. It's much leaner on the cluster and faster to use a pre-built container if you know you'll use the Tidyverse.
The earlier instructions on the R user environment still apply. Pulling the container (the Docker to Apptainer conversion will take a few minutes) creates a separate SIF file as shown below.
Now when you run this container's R binary you can successfully load the Tidyverse.
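The text does not name a specific pre-built image, so this sketch assumes the rocker/tidyverse image from Docker Hub at tag 4.0.3; substitute whichever image and tag you actually use. It pulls the container if needed and checks that the Tidyverse loads.

```bash
# Assumed image and tag; not confirmed by the text above.
IMAGE_URI="docker://rocker/tidyverse:4.0.3"
# Default file name apptainer derives from that URI.
SIF_FILE="tidyverse_4.0.3.sif"

# R code to run in the container: load the Tidyverse, report the dplyr version.
R_CODE='suppressPackageStartupMessages(library(tidyverse)); cat("dplyr", as.character(packageVersion("dplyr")), "\n")'

if command -v apptainer >/dev/null 2>&1; then
    # Download and convert the image only if it is not already here.
    [ -f "$SIF_FILE" ] || apptainer pull "$SIF_FILE" "$IMAGE_URI"
    apptainer exec "$SIF_FILE" Rscript -e "$R_CODE"
else
    echo "apptainer not found; would pull $IMAGE_URI into $SIF_FILE"
fi
```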
We've since migrated from bare-metal R binaries, compiled from source and provided as a module, to containers. However, some version 3 variants of R are still available.
As a reminder all "contrib" prefixed modules are user community created and maintained (i.e., not supported by the HYAK team).
Rstudio is an integrated development environment (IDE) for R. It's a front-end interface, historically a desktop application but it will be delivered through your browser in this instance.
Rstudio will run in an Apptainer container on a compute node and then be forwarded through the login node back to your local computer via port forwarding.
First you need to get the Rocker Rstudio container.
- Get an interactive session (e.g., salloc -A uwit -p ckpt).
- Load Apptainer (i.e., module load apptainer).
- Pull a version of Rocker Rstudio (e.g., apptainer pull docker://rocker/rstudio:4.1.0).
You will need to modify a few environment variables in rstudio-server.job:
- Set your RSTUDIO_CWD path. I set it to my scrubbed directory on KLONE, but if you have a persistent lab folder you should use that instead. This is the folder where the container is located and downloaded to using the apptainer pull command above.
- Set your RSTUDIO_SIF variable. This is the name of the container file.
- (Optional) Set your R_LIBS_USER path. I set it to my scrubbed directory on KLONE as well. Note that if you have an R folder in your home directory, it will supersede this path when installing R packages. Your home directory is limited and can't be expanded, so you will almost certainly fill it up. By default, the SLURM job file installs all R packages associated with this container into RSTUDIO_CWD.
You will need to modify a few things in rstudio-server.job related to SLURM directives. For example, fill in your specific account and partition (check your options with hyakalloc). Also set your job run limits, cores (i.e., ntasks), memory, etc.
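The actual contents of rstudio-server.job are not reproduced here, so the following is only a sketch of what the edited top of such a job file might look like. The account and partition reuse the salloc example above; the resource values, the /gscratch/scrubbed path, and the SIF file name are assumptions to replace with your own.

```bash
#!/bin/bash
# Hypothetical values throughout: check your own options with hyakalloc.
#SBATCH --job-name=rstudio-server
#SBATCH --account=uwit
#SBATCH --partition=ckpt
# Cores, memory, and job run limit (HH:MM:SS).
#SBATCH --ntasks=4
#SBATCH --mem=16G
#SBATCH --time=02:00:00

# Folder that holds the container (where apptainer pull was run).
export RSTUDIO_CWD="${RSTUDIO_CWD:-/gscratch/scrubbed/$USER}"
# File name of the container inside RSTUDIO_CWD (assumed default pull name).
export RSTUDIO_SIF="rstudio_4.1.0.sif"
# Optional: where R packages get installed; RSTUDIO_CWD is the stated default.
export R_LIBS_USER="$RSTUDIO_CWD"

echo "Container: $RSTUDIO_CWD/$RSTUDIO_SIF"
echo "R packages: $R_LIBS_USER"
```

The #SBATCH lines are ordinary comments to the shell, so the fragment can be run directly to check the variable values before it is submitted with sbatch.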
If the job submission succeeds, a file with a name like rstudio-server.job.177885 will appear in your home directory. The suffix matches the job number you see. Check out its contents like below for instructions on how to connect to your Rstudio session.
In a new terminal prompt on your laptop, copy and paste the SSH command from the SLURM output. You will get your 2FA prompt, and after logging in the system will appear to hang. This is expected: leave this window open, as it is your connection to the Rstudio session running on KLONE. If you are disconnected and reconnect, you can resume your Rstudio session.
The Rstudio session ends either when it hits the job runtime limit and self-terminates or, preferably, when you close it manually using the scancel command provided in the output file with the specific jobID. If this file is accidentally deleted, you can always list your jobs with a sacct -X command at your active KLONE login prompt to get the jobID.
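Because the output file's suffix is the job number, the jobID can be recovered from the file name alone. A small sketch using the example file name from the text; OUTPUT_FILE and JOB_ID are illustrative variable names.

```bash
# Example output file name; the part after the last dot is the job ID.
OUTPUT_FILE="rstudio-server.job.177885"

# Strip everything up to and including the last dot.
JOB_ID="${OUTPUT_FILE##*.}"

# Print the command that would end the session; run it on a KLONE login node.
echo "scancel $JOB_ID"
```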
The credentials are randomly generated for each sbatch job, but once you log in you should see an environment similar to the one below. Both your KLONE home directory and gscratch folders will be mounted.