R and RStudio
R is a popular statistical programming language for data science and analysis. On Hyak, we rely on Apptainer and Docker containers to deploy R. You might find a refresher on containers and modules helpful before following these instructions.
#
User Environment
If you use a non-custom R container, you'll likely run install.packages() at some point. On a non-shared platform like your local setup (where you have full administrative privileges), R will usually install things into central paths. On Hyak, R package libraries are installed by default in the user's Home directory, which can be problematic due to the 10GB disk storage limit. If this default setting isn't changed, users can quickly run out of storage and inodes in their Home directory and need to reconfigure their R environment.
Instead of waiting for the inevitable, we will direct R to install package libraries in a directory we choose where storage isn't limited. This might be your lab group's directory under /gscratch/ or a directory you created under your UW NetID, like /gscratch/scrubbed/UWNetID. See the Hyak storage documentation to review your options.
Important: Remember to replace the word UWNetID in the paths below with YOUR username/UWNetID.
Specify user library paths by editing or creating a configuration file called .Renviron in your Home directory. Use nano or vim to edit the file and set R_LIBS to the location of your R package libraries.
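As a minimal sketch, the file could contain a single line like the one below. The path is only a placeholder that assumes you keep your libraries in a directory called R under /gscratch/scrubbed/UWNetID; point it at whichever directory you chose.

```shell
# ~/.Renviron
# Placeholder path: replace UWNetID with your own username.
R_LIBS="/gscratch/scrubbed/UWNetID/R"
```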
Pro tip: directories don't exist until you create them
If the directory you want to use doesn't exist yet, R will send an error message. To set one up, first create a directory for yourself in /gscratch/scrubbed with mkdir, and then create a directory inside it called R to store your R package libraries.
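The two steps above might look like the following sketch. The variable names MY_DIR and LIB_DIR are just illustrative, and the path is a placeholder for your own.

```shell
# Placeholder path: replace UWNetID with your own username.
MY_DIR="/gscratch/scrubbed/UWNetID"
LIB_DIR="$MY_DIR/R"

# -p creates any missing parent directories and does nothing if they
# already exist, so one call covers both steps described above.
mkdir -p "$LIB_DIR" || echo "Could not create $LIB_DIR; check the path and your permissions."
```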
Now R will install packages in your designated directory instead of your Home directory, and you will avoid disk storage management issues later on.
Caution: If you plan on using multiple R versions, you will want to set R_LIBS appropriately for each different container (i.e., R version) used so packages compiled against one version of R don't conflict with another. Using sub-folders with names matching each version of R is sufficient.
#
Containers from Rocker
The Rocker Project on Docker Hub (https://hub.docker.com/u/rocker) hosts many containers that were prepared by the developers of R, and many include various package collections. In this part of the guide, we will walk you through a few of the options and show you how to set them up for your usage on klone.
#
R-base Container
Let's say we want to use the most up-to-date version of base R from the Rocker Project on Docker Hub. There are many other versions of R available on Docker Hub, and we encourage you to explore them to find the version that fits the needs of your research project.
First, start an interactive job on a compute node. Building containers is not a login-node approved activity. A single CPU on the ckpt partition with 16GB of RAM for 2 hours is enough for this. If your lab group owns Hyak resources, you might be able to change --partition=ckpt to --partition=compute for priority access to a node. Find out which resources you can use with the hyakalloc command.
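A request along those lines could be sketched with salloc as below. The flags are standard Slurm options, but the exact command is an assumption; your group may also need an account option, which hyakalloc can help you identify.

```shell
# 1 CPU, 16 GB of RAM, and 2 hours on the ckpt partition.
# Swap in "compute" if hyakalloc shows that your group owns nodes.
PARTITION="ckpt"

salloc --partition="$PARTITION" --cpus-per-task=1 --mem=16G --time=2:00:00 \
    || echo "salloc did not start a session; run this from a klone login node."
```

When the allocation is granted, your prompt moves to the compute node and the remaining steps can be run there.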
Pull the container from Docker hub with Apptainer.
The command will take a minute and create the SIF file in the directory where the apptainer command was executed (the current directory). List your directory to see the .sif file. If you pulled a specific version of R-base, your image will have a different name than r-base_latest.sif.
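A sketch of the pull and the directory listing, assuming the rocker/r-base image on Docker Hub with its default tag. The IMAGE and SIF variables are only there for illustration.

```shell
# Rocker's base R image; append :<tag> to the name to pin a specific version.
IMAGE="docker://rocker/r-base"

# Apptainer names the output <image>_<tag>.sif; the tag defaults to "latest".
SIF="$(basename "$IMAGE")_latest.sif"

apptainer pull "$IMAGE" \
    || echo "Pull failed; are you on a compute node with apptainer available?"

# List the resulting image file.
ls -lh "$SIF" || echo "$SIF was not found in $(pwd)."
```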
You can run the R binary within the container like below.
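For instance, the sketch below uses apptainer exec to run Rscript inside the image non-interactively. It assumes the image file is named r-base_latest.sif and sits in your current directory.

```shell
SIF="r-base_latest.sif"

# Print the R version and the library paths R will search for packages.
apptainer exec "$SIF" Rscript -e 'R.version.string; .libPaths()' \
    || echo "Could not run R; check that $SIF exists and apptainer is available."
```

Replacing the Rscript call with plain R should start an interactive R session instead. Your R_LIBS directory only appears in the library paths if it exists and is visible inside the container.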
Note this R-base container has no packages except the R base packages. You can run install.packages() as you normally would if you were working with R locally, and it will install all the files to whatever path you set R_LIBS to in the user environment instructions.
#
Tidyverse Container
One of the most popular package collections for R is the Tidyverse, which includes packages like ggplot2, dplyr, and others. As noted in the previous section, it isn't included in the r-base Rocker container.
Your options are to:
- run install.packages("tidyverse") in the R-base container (r-base_latest.sif; as shown above) or
- use the Rocker tidyverse container with it pre-installed.
Option 1, while OK, uses a lot (and I mean a lot) of inodes and takes a long time to compile. It's much leaner on the cluster and faster to use a pre-built container if you know you'll use the Tidyverse.
The user environment instructions above still apply: this container will also use the directory you designated in your ~/.Renviron config file. Pulling the container (the Docker to Apptainer conversion will take a few minutes) creates a separate SIF file, tidyverse_latest.sif by default.
Now when you run this container's R binary you can successfully load the Tidyverse.
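Putting the pull and the check together, a sketch assuming the rocker/tidyverse image with its default tag:

```shell
IMAGE="docker://rocker/tidyverse"
SIF="$(basename "$IMAGE")_latest.sif"   # tidyverse_latest.sif

apptainer pull "$IMAGE" \
    || echo "Pull failed; are you on a compute node with apptainer available?"

# Load the Tidyverse non-interactively; R exits with an error if it is missing.
apptainer exec "$SIF" Rscript -e 'library(tidyverse)' \
    || echo "Could not load the Tidyverse from $SIF."
```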
Success! Get on with making your pretty plots, you container superstar!
#
RStudio Container and Graphical User Interface
RStudio is an integrated development environment (IDE) for R. It's a front-end interface, historically a desktop application, but it will be delivered through your browser in this instance. RStudio will run in an Apptainer container on a compute node, then be directed through the login node back to your local computer via port forwarding. In this way, you can use RStudio on klone.
#
Step 1: Download RStudio Container
First, you need the Rocker RStudio container. Pulling it will prepare a .sif file called rstudio_latest.sif, but it might have another name if you pulled a different version.
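One way to do this, sketched under the assumption that you want the default tag of the rocker/rstudio image, is to pass the output filename to apptainer pull explicitly so the name is predictable:

```shell
# Name the output file explicitly so it matches what the job file expects.
SIF="rstudio_latest.sif"

apptainer pull "$SIF" docker://rocker/rstudio \
    || echo "Pull failed; are you on a compute node with apptainer available?"
```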
#
Step 2: Prepare Slurm Job File
We will launch the container as a job with the command sbatch, which submits a job request to our job scheduler software, Slurm. Download our Slurm job file, rstudio-server.job, which was adapted for klone from the tutorial by Rocker, to your current directory.
Important: Remember to replace the word UWNetID in the paths below with YOUR username/UWNetID.
You will need to modify a few environment variables in rstudio-server.job related to R. Use nano or vim to edit the contents of rstudio-server.job:
- The RSTUDIO_CWD path is your working directory, as if you were using the function setwd() within R. rstudio-server.job shows this as /gscratch/scrubbed/UWNetID. You must change this line for this to work. We recommend setting this to the directory where you are storing your data for your intended project. Additionally, it might simplify matters if this is the folder where the container was downloaded using the apptainer pull command above.
- Set your RSTUDIO_SIF variable; this is the name of the container file. In this case, rstudio_latest.sif.
- (Optional) Set your R_LIBS_USER path, which in rstudio-server.job is R_LIBS_USER=${RSTUDIO_CWD}/R or /gscratch/scrubbed/UWNetID/R because RSTUDIO_CWD="/gscratch/scrubbed/UWNetID", remember? Change these variables to fit your needs. That means for this RStudio session my package libraries (when I use install.packages()) will be stored in /gscratch/scrubbed/UWNetID/R. In this case, I am matching this RStudio session to my preferences set above in the user environment section. For your session, you might decide to designate a different directory for your R package libraries. Remember, directories don't exist until you make them.
Additionally, you might decide to modify the sbatch directives to adjust the resources requested for your Slurm RStudio job. For example, fill in your specific partition if applicable (check your options with hyakalloc). Also set your job run limits, cores (i.e., ntasks), memory, etc.
Review these sections of rstudio-server.job and edit your version to fit your needs and the paths you have access to.
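The fragment below is only a sketch of the kind of lines to look for, not a copy of the real rstudio-server.job. The three variable names and their values come from the list above, and the 2 hour time limit matches the default mentioned in Step 5. The job name, partition, task count, and memory are illustrative assumptions.

```shell
#!/bin/bash
# Illustrative #SBATCH directives (assumed values; adjust to your allocation).
#SBATCH --job-name=rstudio-server
#SBATCH --partition=ckpt
#SBATCH --ntasks=1
#SBATCH --mem=16G
#SBATCH --time=02:00:00

# Working directory for the session (placeholder: replace UWNetID).
RSTUDIO_CWD="/gscratch/scrubbed/UWNetID"

# Name of the container file pulled in Step 1.
RSTUDIO_SIF="rstudio_latest.sif"

# Optional: where install.packages() stores libraries for this session.
R_LIBS_USER="${RSTUDIO_CWD}/R"
```

With these values R_LIBS_USER expands to /gscratch/scrubbed/UWNetID/R, the same directory used in the user environment section.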
#
Step 3: Start the RStudio Server
Next, we'll submit the job with sbatch, which will launch the RStudio container, and then we will use port forwarding to interact with the RStudio interface in our web browser.
Pro tip: Monitor the job with squeue, filtering on your UWNetID.
Slurm will save an output file called rstudio-server_12345678.out in the directory where the sbatch command was executed. The suffix matches the job number you see. Check out its contents (for example, with cat) for instructions on how to connect to your RStudio session.
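Submitting, monitoring, and reading the output might look like the sketch below. The job ID 12345678 is the placeholder used throughout this guide; use the number that sbatch reports for your own job.

```shell
JOB_FILE="rstudio-server.job"

# Submit the job; sbatch replies with "Submitted batch job <jobID>".
sbatch "$JOB_FILE" || echo "sbatch failed; run this on klone next to $JOB_FILE."

# List your own jobs and their state (on klone, $USER is your UWNetID).
squeue -u "${USER:-UWNetID}" || echo "squeue is not available here."

# Read the connection instructions written by the job.
JOB_ID="12345678"   # placeholder: replace with your job ID
OUT_FILE="rstudio-server_${JOB_ID}.out"
cat "$OUT_FILE" || echo "$OUT_FILE does not exist yet; the job may still be pending."
```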
The credentials are randomly generated for each sbatch job, which adds security by giving you a new session password each time you launch RStudio this way.
#
Step 4: Start Port Forwarding
Important: This next section is done on your local computer, not on the cluster.
In a new terminal or command prompt on your laptop, copy and paste the SSH command from the Slurm output.
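To show the general shape of such a command, the sketch below assembles one from its parts and prints it. The node name n3000 and remote port 40000 are hypothetical, and klone.hyak.uw.edu is assumed to be the login host; always copy the real command from your own output file.

```shell
NODE="n3000"      # hypothetical compute node running your job
PORT="40000"      # hypothetical port chosen by the job
LOGIN_HOST="klone.hyak.uw.edu"   # assumed login host

# -N opens the tunnel without running a remote command.
# -L forwards local port 8787 to NODE:PORT through the login node.
TUNNEL_CMD="ssh -N -L 8787:${NODE}:${PORT} UWNetID@${LOGIN_HOST}"

echo "$TUNNEL_CMD"
```

Because -N runs no remote command, no prompt comes back after you authenticate, and the tunnel stays open until you interrupt it.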
The login will appear to hang, but your connection is now open. If you are disconnected and reconnect, you can resume your RStudio session.
Warning: Do not use the rstudio-server password to open the SSH tunnel. After your ssh command, your UWNetID password is required. Multiple failed login attempts will result in an IP ban.
Next, open a new browser window to http://localhost:8787 and provide the password from the output file (rstudio-server_12345678.out and 410lzxMwV9EObv7aDEjm in this example).
Once you log in, you should see the RStudio interface in your browser. Both your Home directory and gscratch folders will be mounted.
#
Step 5: End your Session
If you did not adjust the --time directive in rstudio-server.job, your session will end after 2 hours.
Preferably, you can end your session manually. Exit the RStudio session ("power" button in the top right corner of the RStudio window). Then go back to klone and use the scancel command with the specific job ID.
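For example, with the placeholder job ID used in this guide:

```shell
JOB_ID="12345678"   # placeholder: replace with your job ID

# Cancel the RStudio job so it stops holding resources.
scancel "$JOB_ID" || echo "scancel failed; the job may have already ended."

# Confirm that the job no longer appears in your queue.
squeue -u "${USER:-UWNetID}" || echo "squeue is not available here."
```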
#
Regular use of this method
Once you are satisfied with the job settings and configuration of your RStudio session, you can reuse this method every time you want to use RStudio by starting at Step 3 above.
If you have trouble with this method, please report errors in an email to help@uw.edu with Hyak in the message.
#
R via Modules
There are some versions of R still available as modules. Use these at your own risk. They may be versions with deprecated packages, and many were contributed by other users who built them to fit their personal needs, not yours. The Hyak team will not provide support for the use of these modules.