This section continues right where we left off: with Jupyter. We had just finished going through our Jupyter server launch script, so now we will start a job using our container and that launcher.
As with the container-building SBATCH script, this one is available for download; copy it to your home directory, and we'll go through it.
The only part of this you shouldn't change is the job name, since we automated getting the node hostname
for a job called
klone-container. Everything else is up to you: the number of CPUs, the memory, the partition,
and the time limit.
Here are our scripts in action: start the container (with the read-write version of the container launcher),
and have that container run our Jupyter launcher. Once the job has started, watch the job's output file for
changes. It'll be named
jupyter-server-12345678.out, but with your job's ID, so we can check it out with:
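One common way to follow the log as it updates (the job ID below is a placeholder; substitute your own):

```bash
# Follow the Jupyter server's log until the startup message appears.
# Replace 12345678 with your job's ID; press Ctrl-C to stop following.
tail -f jupyter-server-12345678.out
```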
Once you see that informational message, the Jupyter server is running and we're ready for the last step: the port forwarding.
I swear, this is the last one. You can download it here, and then we'll walk through it:
We start by saving the output of a remote command that we run on the login node.
The file with the Jupyter information,
~/.jupyter-port-and-token, will look something like this:
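The exact values will differ, but the format is a port number, a space, and a token (the values below are made up for illustration):

```
34567 9a1b2c3d4e5f60718293a4b5c6d7e8f9
```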
That line is what ends up in our
JUPYTER_INFO variable, i.e.
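In script form, it looks roughly like this (a sketch, using the klone-login shortcut from our earlier SSH configuration):

```bash
# Run a command on the login node and save its output in a local variable.
JUPYTER_INFO="$(ssh klone-login cat ~/.jupyter-port-and-token)"
```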
If the variable is empty (which we check with
-z), exit after printing an error message.
Here's a bit of advanced Bash:
JUPYTER_PORT will strip a space (and everything after it) from
JUPYTER_INFO, leaving us with just the port.
JUPYTER_TOKEN is the opposite: it will strip a space (and everything before it) from
JUPYTER_INFO, leaving us with just the token.
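Concretely, the two expansions behave like this (the sample value is illustrative):

```bash
# Example value, as it would be read from ~/.jupyter-port-and-token.
JUPYTER_INFO="34567 9a1b2c3d4e5f"

# %% deletes the longest matching suffix: the first space and everything after it.
JUPYTER_PORT="${JUPYTER_INFO%% *}"

# # deletes the shortest matching prefix: everything up to and including the first space.
JUPYTER_TOKEN="${JUPYTER_INFO#* }"

echo "$JUPYTER_PORT"   # 34567
echo "$JUPYTER_TOKEN"  # 9a1b2c3d4e5f
```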
If you want to learn more about string manipulation in Bash, check out The Linux Documentation Project's Advanced Bash-Scripting Guide.
Let's break down the
ssh command at the top:
- -N: this tells SSH not to run anything remotely, since we're just forwarding a port.
- -L: this tells SSH that we're forwarding a port.
- 8888:localhost:$JUPYTER_PORT: connect local port 8888 to the Jupyter server's port (e.g. 34567) on the remote host.
- klone-node: using our SSH configurations from earlier, the remote host is the compute node running our job.
- &: put this process in the background.
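Putting those options together, the forwarding line looks something like this (a sketch, assuming JUPYTER_PORT was parsed above):

```bash
# Forward local port 8888 to the Jupyter server on the compute node,
# running in the background with no remote command.
ssh -N -L "8888:localhost:${JUPYTER_PORT}" klone-node &
```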
Before we move on, we save the process ID for SSH (the PID of the most recent background process is saved in $!),
and we make sure it didn't exit with an error (the last exit code is saved in $?).
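The mechanics of that, sketched with a stand-in background process (sleep plays the role of ssh here, so this runs anywhere):

```bash
# $! holds the PID of the most recent background process.
sleep 30 &
SSH_PID=$!

# kill -0 sends no signal; it just checks that the process exists.
if kill -0 "$SSH_PID"; then
    echo "forwarding is up (PID $SSH_PID)"
fi

# Clean up the stand-in.
kill "$SSH_PID"
```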
This part is easy: print out the web address we'll use to connect to our Jupyter server,
and print out the
kill command you can use to close the SSH forwarding when you're done.
All we have to do now is make sure the node is correct, start the forwarding, and open a browser:
Copy and paste the web address into your browser, and you should be connected to your Jupyter server.
When you're finished, you can use the kill command we generated to ensure your port forwarding is stopped:
All the difficult work is behind us. If we want to use our container interactively, we'll just use all the shortcuts we created.
First, we'll request an interactive job in the checkpoint partition, with a single CPU and 16GB of memory.
The most important part, if you're going to connect directly to the node, is that you need to name the job with
--job-name=klone-container so that our node-finding script works properly.
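A request along those lines might look like the following; the account name is a placeholder, and we're assuming ckpt is the checkpoint partition's name and a two-hour limit, so adjust all of those to your setup:

```bash
# Interactive job: 1 CPU, 16GB of memory, named so the node-finding
# script works. Account, partition, and time are placeholders.
salloc --job-name=klone-container \
       --account=mygroup \
       --partition=ckpt \
       --cpus-per-task=1 \
       --mem=16G \
       --time=2:00:00
```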
We automated this step, too. Now we're in our container, attached to a read-write overlay filesystem.
And that's all there is to it. Before we move on to non-interactive jobs, here's the background on Slurm compatibility:
What's required for Slurm?
Running Slurm in any container requires the following:
- The same version of Slurm running on the node (which we installed from the Hyak repository).
- The same user ID and group ID for the Slurm user as on the node (which we copied during the container build).
- Three bind-mounts to node filesystems, all of which are included in the compute node's default Apptainer configuration:
Running non-interactive jobs is a little more complex, since we'll need to pass a script to our container.
Let's say you've written a bit of code that uses one of the conda environments in your overlay: we'll call it
~/do-some-research.py. We'll start by writing a Bash script to get into the conda environment & run the script:
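A minimal version of that wrapper might look like this; the environment name my-env and the conda install path are placeholders for your overlay's actual setup:

```bash
#!/bin/bash
# Make the conda command available (the install path is an assumption --
# point this at wherever conda lives in your container/overlay).
source ~/miniconda3/etc/profile.d/conda.sh

# Activate the environment, then run the analysis script.
conda activate my-env
python ~/do-some-research.py
```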
Don't forget to make this script executable:
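```bash
chmod +x ~/start-research.sh
```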
Now we'll make an SBATCH script, where we pass this script to our container:
This will start a job named 'research' with 8 CPUs, 64GB of RAM, and a time limit
of 8 hours. Don't forget to change the account or partition.
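A sketch of such a job script; the account, partition, overlay image, and container image names here are placeholders, not the actual files from earlier, so substitute your own:

```bash
#!/bin/bash
#SBATCH --job-name=research
#SBATCH --account=mygroup
#SBATCH --partition=ckpt
#SBATCH --cpus-per-task=8
#SBATCH --mem=64G
#SBATCH --time=8:00:00

# Run the wrapper inside the container, with the overlay mounted read-only.
apptainer exec --overlay ~/overlay.img:ro ~/container.sif ~/start-research.sh
```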
This tells our container (with our conda overlay in read-only) to run
the ~/start-research.sh wrapper for our
~/do-some-research.py Python script.
All that's left is to submit the job with
sbatch ~/research.job and wait for the results.
This final section is broken up into two parts. The first part is trivial: using VSCode's Remote-SSH extension to connect to an interactive job & edit code. The second part isn't so trivial: we're going to use an undocumented VSCode feature to SSH directly into our container on the node so that VSCode extensions can, for instance, run Jupyter Notebooks from our conda environment.
Let's start with the easy part.
All we need to do here is add a single step to our interactive job setup.
Once again, we're requesting an interactive job with 1 CPU and 16GB of memory; we just have to wait for it to be allocated.
Once we have the job, on our local machine we'll make sure our SSH configuration has the correct node:
With our klone-node SSH target ready to go, we'll first use the Remote-SSH extension to connect to a host:
Then we'll enter klone-node:
And, after VSCode finishes installing the remote extensions, we should see that we're connected:
That's it. You can now open remote folders like your home directory, or your group's
gscratch directories, and
use VSCode as you usually do.
This is the tricky part. If we want VSCode itself to be able to run anything in our container's conda environments, we'll have to connect directly to the container. First, we need to add a couple new SSH targets.
We're going to modify our local SSH configurations again, starting with the main config at ~/.ssh/config:
These new shortcuts will allow us to connect (through klone-login) to our container, either with
ssh klone-container-rw or
ssh klone-container-ro depending on whether we need to make changes in our overlay.
As mentioned before, the
%r in the
RemoteCommand line is an SSH config abbreviation for the remote username—i.e. your UW Net ID—so no need to change it here.
What this is telling SSH is that, when we run
ssh klone-container-ro, the first thing it should do before allowing us to interact is run our read-only container launcher. Our
klone-node-config needs to apply to both
klone-container shortcuts, so don't forget to update it accordingly.
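For concreteness, these entries have roughly the following shape. The launcher paths below are hypothetical placeholders (use your own scripts' locations), and the HostName for both hosts is supplied by our klone-node-config:

```
Host klone-container-rw klone-container-ro
    ProxyJump klone-login
    RequestTTY yes

Host klone-container-rw
    # %r expands to the remote username (your UW NetID);
    # the launcher path below is a hypothetical example.
    RemoteCommand /gscratch/mygroup/%r/launch-container-rw.sh

Host klone-container-ro
    RemoteCommand /gscratch/mygroup/%r/launch-container-ro.sh
```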
Huge caveat: this VSCode feature, as far as we can tell, is undocumented. If this doesn't work, it may be too difficult to
be worth troubleshooting. With that disclaimer out of the way, let's modify our VSCode's settings.json:
Once you have it open, you need to add the following:
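As far as we can tell (this is the undocumented part), the piece that lets Remote-SSH honor our RemoteCommand lines is its remote-command support, which we believe is controlled by the remote.SSH.enableRemoteCommand setting:

```json
{
    "remote.SSH.enableRemoteCommand": true
}
```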
Assuming you still have your
klone-container job running, and your klone-node shortcut pointing at
the correct compute node for the job, we can connect directly to the container. Select the
Connect to Host... option
in the Remote-SSH extension again:
But this time, connect to klone-container-rw:
It may take a few moments to connect, but once you're in it will look quite similar to when
we connected directly to the node.
Since we're already in the container, we can interact with
conda right away:
In this demonstration, we're going to run the VSCode Jupyter extension, which means we'll have to install the extensions on klone. When you browse the 'Extensions' tab, you should see an option to "Install in SSH: klone-container-rw":
Install both the Jupyter and the Python extensions before continuing.
Once the extensions are installed, open up your home directory in VSCode and we'll try to make a new Jupyter Notebook:
The first time you run the VSCode Jupyter extension, it'll ask you where you want to run Jupyter. In this case, we're going to select "Default", because it will run Jupyter "locally" (in the container):
It will prompt you to reload the window, which you should do, and finally we'll make sure we're using the right Python:
You'll see a list, probably something similar to this, and you can select whichever conda environment's Python you want:
And with all of that configuration in place, we should be able to test it out with something simple, like this:
Making this work is advanced, requiring a large stack of interdependent pieces: we're connecting directly to our container, on a compute node, running an interactive job, with our overlay in read-write mode, and installing & running VSCode extensions by using an undocumented, hidden feature of the Remote-SSH extension.
There are quite a few things that can go wrong, and it's tough to troubleshoot when they do. For most users, we recommend using JupyterLab or Jupyter Notebook in your browser, using the interactive job from before.