R is a popular statistical programming language for data science and analysis. To use R on HYAK, we rely on Apptainer and Docker containers to deploy R. You might find a refresher on containers and modules helpful before following these instructions.

User Environment#

If you use a non-custom R container you'll likely run install.packages() at some point. On a non-shared platform like your local setup (where you have full administrative privileges), R usually installs packages into central paths. On HYAK, R installs package libraries in your Home directory by default, which can be problematic due to the 10GB disk storage limit. If this default isn't changed, you can quickly run out of storage and inodes in your Home directory and will then need to reconfigure your R environment.

Instead of waiting for the inevitable, we will direct R to install package libraries in a directory we choose where storage isn't limited. This might be your lab group's directory under /gscratch/ or a directory you created under your UWNetID, like /gscratch/scrubbed/UWNetID. Click here to review storage on HYAK.


Remember to replace the word UWNetID in the paths below with YOUR username/UWNetID.

Specify user library paths by editing or creating a configuration file called .Renviron in your Home directory. Use nano or vim to designate the location of your R package libraries. The contents of the file should be something like the following example.

$ cat ~/.Renviron
R_LIBS="/gscratch/scrubbed/UWNetID/R"

Pro tip: directories don't exist until you create them

Remember if the directory you want to use doesn't exist yet, R will send an error message. If you want to create a directory for yourself in /gscratch/scrubbed use the following command:

mkdir /gscratch/scrubbed/UWNetID/
# remember to replace the word `UWNetID` above with YOUR username/UWNetID

And then create a directory to store your R package libraries called R:

mkdir /gscratch/scrubbed/UWNetID/R
# remember to replace the word `UWNetID` above with YOUR username/UWNetID

Now R will install packages in your designated directory instead of your Home directory, and you will avoid disk storage management issues later on.
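The directory creation and the .Renviron edit above can also be done in one step. The following is a minimal sketch, not an official HYAK tool: setup_r_libs is a hypothetical helper name, and the second argument (the config file to edit) is an assumption added so you can try it on a scratch file first. It creates the library directory, including missing parent directories, and replaces any existing R_LIBS line.

```shell
#!/usr/bin/env bash
# Sketch: create an R library directory and point R_LIBS at it.
# setup_r_libs is a hypothetical helper, not part of HYAK or R.
setup_r_libs() {
  local lib_dir="$1"                      # e.g. /gscratch/scrubbed/UWNetID/R
  local renviron="${2:-$HOME/.Renviron}"  # config file R reads at startup

  # -p creates missing parent directories and does nothing if the path exists
  mkdir -p "$lib_dir" || return 1

  # Drop any previous R_LIBS line so the file ends up with exactly one
  if [ -f "$renviron" ]; then
    grep -v '^R_LIBS=' "$renviron" > "$renviron.tmp" || true
    mv "$renviron.tmp" "$renviron"
  fi
  printf 'R_LIBS="%s"\n' "$lib_dir" >> "$renviron"
}

# Example (replace UWNetID with YOUR username/UWNetID):
# setup_r_libs /gscratch/scrubbed/UWNetID/R
```

Other lines in the config file are left untouched; only the R_LIBS line is rewritten.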


If you plan on using multiple R versions you will want to set R_LIBS appropriately with each different container (i.e., R version) used so packages compiled against one version of R don't conflict with another. Using sub-folders with names matching that version of R is sufficient.
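One way to keep versions apart is a sub-folder and a small config file per R version. This is a sketch under a few assumptions: write_versioned_renviron and the Renviron-VERSION file name are hypothetical, the directory is visible inside the container, and Apptainer passes your shell environment into the container (its default behavior). It relies on R_ENVIRON_USER, the standard R startup variable that names the user config file to read in place of ~/.Renviron.

```shell
#!/usr/bin/env bash
# Sketch: one package library and one config file per R version.
# write_versioned_renviron is a hypothetical helper name.
write_versioned_renviron() {
  local base="$1"      # e.g. /gscratch/scrubbed/UWNetID/R
  local version="$2"   # e.g. 4.4.0, matching the R version in the container
  local lib_dir="$base/$version"
  local env_file="$base/Renviron-$version"

  mkdir -p "$lib_dir" || return 1
  printf 'R_LIBS="%s"\n' "$lib_dir" > "$env_file"
  printf '%s\n' "$env_file"   # print the config file path for the caller
}

# Example: start R with the library that matches this container.
# env_file="$(write_versioned_renviron /gscratch/scrubbed/UWNetID/R 4.4.0)"
# R_ENVIRON_USER="$env_file" apptainer run r-base_latest.sif R
```

Packages installed in that session then land in the 4.4.0 sub-folder and stay separate from libraries built for other R versions.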

Containers from Rocker#

The Rocker Project on Docker hub (https://hub.docker.com/u/rocker) hosts many containers that were prepared by the developers of R, and many include various package collections. In this part of the guide, we will walk you through a few of the options and show you how to set them up for your usage on klone.

R-base Container#

Let's say we want to use the most up-to-date version of base R from the Rocker Project on Docker hub. There are many other versions of R available on Docker hub, and we encourage you to explore them to find the version that fits the needs of your research project.

First, start an interactive job on a compute node. Building containers is not a login-node approved activity. The following command will request a single CPU on the ckpt partition with 16GB of RAM for 2 hours. If your lab group owns HYAK resources, you might be able to change --partition=ckpt to --partition=compute for priority access to a node. Find out which resources you can use with the hyakalloc command.

salloc --partition=ckpt --cpus-per-task=1 --mem=16G --time=2:00:00

Pull the container from Docker hub with Apptainer.

apptainer pull docker://rocker/r-base

The command will take a minute and create the SIF file in the directory where the apptainer command was executed (the current directory). List your directory to see the .sif file. If you pulled a specific version of R-base, your image will have a different name than that shown here.

ls -alh
474M r-base_latest.sif
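By default Apptainer names a pulled image after the image name and tag, which is why the file above is called r-base_latest.sif. The sketch below predicts that name from a Docker URI so you know which file to look for after pulling a specific version. sif_name_for is a hypothetical helper, and it only covers the default naming for name:tag references.

```shell
#!/usr/bin/env bash
# Sketch: predict the default file name written by `apptainer pull`,
# which is <image name>_<tag>.sif. sif_name_for is a hypothetical helper.
sif_name_for() {
  local uri="$1"                # e.g. docker://rocker/r-base:4.4.0
  local ref="${uri#docker://}"  # strip the scheme
  local name="${ref##*/}"       # last path component, e.g. r-base:4.4.0
  local tag="latest"            # tag used when none is given

  case "$name" in
    *:*)
      tag="${name##*:}"
      name="${name%%:*}"
      ;;
  esac
  printf '%s_%s.sif\n' "$name" "$tag"
}

# Example:
# sif_name_for docker://rocker/r-base:4.4.0
```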

You can run the R binary within the container like below.

apptainer run r-base_latest.sif R
R version 4.4.0 (DATE) -- "Some Cute Name - Typical R Stuff"
Copyright (C) YEAR The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
> library(tidyverse)
Error in library(tidyverse) : there is no package called ‘tidyverse’

Note this R-base container has no packages except the R base packages. You can run install.packages() as you normally would if you were working with R locally and it will install all the files to whatever path you set R_LIBS to in the user environment instructions.

Tidyverse Container#

The most popular library for R is Tidyverse (More information here), which includes packages like ggplot2, dplyr, and others. As you can see in the previous section, it doesn't exist if we use the r-base Rocker container.

Your options are to:

  1. run install.packages("tidyverse") in the R-base container (r-base_latest.sif; as shown above) or
  2. use the Rocker tidyverse container with it pre-installed.

Option 1, while ok, uses a lot (and I mean a lot) of inodes as well as taking a long time to compile. It's much leaner on the cluster and faster to use a pre-built container if you know you'll use the Tidyverse.

The R user environment instructions above still apply: this container will also use the directory you designated in your ~/.Renviron config file. Once downloaded (the Docker to Apptainer conversion will take a few minutes), the pull will create a separate SIF file as shown below.

# remember to do this on a compute node
# start an interactive job with the following if you haven't yet
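salloc --partition=ckpt --cpus-per-task=1 --mem=16G --time=2:00:00
# Pull the latest version of Rocker tidyverse (or the version of your choice)
apptainer pull docker://rocker/tidyverse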
ls -alh
675M tidyverse_latest.sif

Now when you run this container's R binary you can successfully load the Tidyverse.

apptainer run tidyverse_latest.sif R
R version 4.4.0 (DATE) -- "Some Cute Name - Typical R Stuff"
Copyright (C) YEAR The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
> library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
✔ ggplot2 3.3.2 ✔ purrr 0.3.4
✔ tibble 3.0.1 ✔ dplyr 1.0.0
✔ tidyr 1.1.0 ✔ stringr 1.4.0
✔ readr 1.3.1 ✔ forcats 0.5.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
Warning messages:
1: replacing previous import ‘vctrs::data_frame’ by ‘tibble::data_frame’ when loading ‘dplyr’
2: package ‘purrr’ was built under R version 4.4.0

Success! Get on with making your pretty plots, you container superstar!

Rstudio Container and Graphical User Interface#

Rstudio is an integrated development environment (IDE) for R. It is historically a desktop application, but in this instance it will be delivered through your browser. Rstudio will run in an Apptainer container on a compute node, and its interface will be forwarded through the login node back to your local computer via port forwarding. In this way, you can use Rstudio on klone.

Step 1: Download Rstudio Container#

First, you need the Rocker Rstudio container.

# remember to do this on a compute node
# start an interactive job with the following if you haven't yet
salloc --partition=ckpt --cpus-per-task=1 --mem=16G --time=2:00:00
# Pull the latest version of Rocker Rstudio (or the version of your choice)
# with apptainer
apptainer pull docker://rocker/rstudio

The command above will create a .sif file called rstudio_latest.sif, but it might have another name if you pulled a different version.

Step 2: Prepare SLURM Job File#

We will launch the container as a job with the command sbatch, which submits a job to our job scheduler software, SLURM. Download our SLURM job file from this hyperlink; it was adapted for klone from the tutorial by Rocker. More information about the original tutorial can be found here. The command below will download the file to your current directory.

wget https://hyak.uw.edu/files/rstudio-server.job

Remember to replace the word UWNetID in the paths below with YOUR username/UWNetID.

You will need to modify a few environment variables in rstudio-server.job related to R. Use nano or vim to edit the contents of rstudio-server.job:

  1. The RSTUDIO_CWD path is your working directory, as if you were using the function setwd() within R. rstudio-server.job shows this as /gscratch/scrubbed/UWNetID. You must change this line for this to work. We recommend setting this to the directory where you are storing the data for your intended project. Additionally, it might simplify matters if this is the folder where the container was downloaded with the apptainer pull command above.
  2. Set your RSTUDIO_SIF variable, which is the name of the container file. In this case, rstudio_latest.sif.
  3. (Optional) Set your R_LIBS_USER path, which in rstudio-server.job is R_LIBS_USER=${RSTUDIO_CWD}/R, or /gscratch/scrubbed/UWNetID/R, because RSTUDIO_CWD="/gscratch/scrubbed/UWNetID", remember? Change these variables to fit your needs. That means for this Rstudio session my package libraries (when I use install.packages()) will be stored in /gscratch/scrubbed/UWNetID/R. In this case, I am matching this Rstudio session to my preferences set above in the user environment section. For your session, you might decide to designate a different directory for your R package libraries. Remember, directories don't exist until you make them.

Additionally, you might decide to modify the sbatch directives to adjust the resources to request for your SLURM Rstudio job. For example, fill in your specific partition if applicable (check your options with hyakalloc). Also set your job run limits, cores (i.e., ntasks), memory, etc.

Review the highlighted sections of rstudio-server.job below and edit your version to fit your needs and paths you have access to:

#SBATCH --job-name=rstudio-server
#SBATCH --partition=ckpt #update this line - use hyakalloc to find partitions you can use
#SBATCH --time=02:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --mem=20G
#SBATCH --signal=USR2
#SBATCH --output=%x_%j.out
# This script will request a single CPU with four threads with 20GB of RAM for 2 hours.
# You can adjust --time, --nodes, --ntasks, and --mem above to adjust these settings for your session.
# --output=%x_%j.out creates a output file called rstudio-server_XXXXXXXX.out
# where the %x is short hand for --job-name above and the X's are an 8-digit
# jobID assigned by SLURM when our job is submitted.
RSTUDIO_CWD="/gscratch/scrubbed/UWNetID" # UPDATE THIS LINE
RSTUDIO_SIF="rstudio_latest.sif" # update this line
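If you would rather script these edits than open nano or vim, a sed one-liner can do it. This is a sketch only: set_job_var is a hypothetical helper, and it assumes each variable is assigned at the start of a line, as in the excerpt above. It replaces the whole line, so the trailing "update this line" comment is dropped, and it keeps a backup copy with a .bak suffix.

```shell
#!/usr/bin/env bash
# Sketch: rewrite a NAME="value" assignment in a job file in place.
# set_job_var is a hypothetical helper, not part of SLURM or HYAK.
set_job_var() {
  local file="$1" name="$2" value="$3"
  # Use | as the sed delimiter because paths contain /
  # (values containing | or & would need escaping first)
  sed -i.bak "s|^${name}=.*|${name}=\"${value}\"|" "$file"
}

# Example (replace UWNetID with YOUR username/UWNetID):
# set_job_var rstudio-server.job RSTUDIO_CWD /gscratch/scrubbed/UWNetID
# set_job_var rstudio-server.job RSTUDIO_SIF rstudio_latest.sif
```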

Step 3: Start the Rstudio Server#

Next, we'll submit the job with sbatch, which will launch the Rstudio container, and then we will use port forwarding to interact with the RStudio interface in our web browser.

sbatch rstudio-server.job
Submitted batch job 12345678
# SLURM will assign a JobID when the job is submitted
# it will likely be an 8-digit number, but not 12345678

Pro Tip

Monitor the job with squeue and your UWNetID like the following example:

squeue -u UWNetID
12345678 ckpt rstudio UWNetID R 3:15 1 n3088

SLURM will save your output file called rstudio-server_12345678.out in the directory where the sbatch command was executed. The suffix matches the job number you see. Check out its contents like below for instructions on how to connect to your Rstudio session.

cat rstudio-server_12345678.out
1. SSH tunnel from your workstation using the following command:
ssh -N -L 8787:n3164:47101 UWNetID@klone.hyak.uw.edu
and point your web browser to http://localhost:8787
2. log in to RStudio Server using the following credentials:
user: UWNetID
password: 410lzxMwV9EObv7aDEjm
When done using RStudio Server, terminate the job by:
1. Exit the RStudio Session ("power" button in the top right corner of the RStudio window)
2. Issue the following command on the login node:
scancel -f 12345678

The credentials are randomly generated for each sbatch job, so every Rstudio session you launch this way gets a new password.
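The node name and port in the tunnel command change from job to job, so it helps to copy the command straight from the output file. Here is a small sketch that prints just that line; tunnel_command is a hypothetical helper, and it assumes the output file follows the format shown above.

```shell
#!/usr/bin/env bash
# Sketch: print the SSH tunnel command recorded in the job's output file.
# tunnel_command is a hypothetical helper name.
tunnel_command() {
  local out_file="$1"   # e.g. rstudio-server_12345678.out
  # Find the first line containing "ssh -N -L", strip leading whitespace,
  # print it, and stop reading the file
  sed -n '/ssh -N -L/{s/^[[:space:]]*//;p;q;}' "$out_file"
}

# Example:
# tunnel_command rstudio-server_12345678.out
```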

Step 4: Start Port Forwarding#


This next section is done on your local computer, not on the cluster.

In a new terminal or command prompt on your laptop, copy and paste the SSH command from the SLURM output. The following is an example:

ssh -N -L 8787:n3164:47101 UWNetID@klone.hyak.uw.edu
... provide UWNetID password
... Duo 2 Factor Authentication

The login will appear to hang, but your connection is now open. If you are disconnected and reconnect you can resume your Rstudio session.


Do not use the rstudio-server password to open the ssh tunnel. After your ssh command, your UWNetID password is required. Multiple failed login attempts will result in an IP ban.

Next, open a new browser window to http://localhost:8787 and provide the password from the output file (rstudio-server_12345678.out and 410lzxMwV9EObv7aDEjm in this example).

Once you log in, you should see the Rstudio interface in your browser. Both your Home directory and gscratch folders will be mounted.


Step 5: End your Session#

If you did not adjust the --time directive in rstudio-server.job, your session will end after 2 hours.

Preferably, you can end your session manually. Exit the RStudio Session ("power" button in the top right corner of the RStudio window). Then go back to klone and use the scancel command provided with the specific jobID. For example,

scancel -f 12345678

Regular use of this method#

Once you are satisfied with the job settings and configuration of your Rstudio session, you can reuse this method every time you want to use Rstudio by starting at Step 3: Start the Rstudio Server above.
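For repeat sessions, the submit-then-read-the-output routine can be wrapped in a few lines. This is a sketch under assumptions: wait_for_file is a hypothetical helper, and the example relies on sbatch --parsable printing only the job ID (on multi-cluster setups it can also append a cluster name). A job on ckpt may sit in the queue longer than the timeout you pick, and a non-empty file may still be incomplete, so treat it as a convenience rather than a guarantee.

```shell
#!/usr/bin/env bash
# Sketch: poll until a file exists and is non-empty, or give up.
# wait_for_file is a hypothetical helper name.
wait_for_file() {
  local path="$1"
  local timeout="${2:-300}"   # seconds to wait before giving up
  local waited=0

  while [ ! -s "$path" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Example, run on klone from the directory holding rstudio-server.job:
# job_id="$(sbatch --parsable rstudio-server.job)"
# wait_for_file "rstudio-server_${job_id}.out" 600 && cat "rstudio-server_${job_id}.out"
```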

If you have trouble with this method, please report errors in an email to help@uw.edu with HYAK in the message.

R via Modules#

There are some versions of R still available as modules. Use these at your own risk. They may be versions with deprecated packages, and many were contributed by other users who built them to fit their personal needs, not yours. The HYAK team will not provide support for the use of these modules.

module avail
----- /sw/modules-1.775/modulefiles -----
r_3.3.3 r_3.5.1 r_3.6.0 r_3.6.0+Rmpi-impi_2019
----- /sw/modules-1.775/modulefiles -----
contrib/r/3.4.3 contrib/r/3.5.1 contrib/r/3.6.1