# Using Conda and Slurm

## Interactive jobs

All the difficult work is behind us. If we want to use our container interactively, we'll just use all the shortcuts we created.
### 1. Start an interactive job from the login node

First, we'll request an interactive job in the checkpoint partition, with a single CPU and 16GB of memory. The most important part, if you're going to connect directly to the node, is that you need to name the job with `--job-name=klone-container` so that our node-finding script works properly.
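A request along these lines does the job; the partition and account names here are placeholders, so substitute your own:

```bash
# Request 1 CPU and 16GB of memory in the checkpoint partition.
# "ckpt" and "mylab" are example names; use your own partition and account,
# and adjust the time limit to taste.
salloc --job-name=klone-container \
       --partition=ckpt --account=mylab \
       --nodes=1 --ntasks=1 --cpus-per-task=1 \
       --mem=16G --time=2:00:00
```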
### 2. Get into our container

We automated this step, too. Now we're in our container, attached to a read-write overlay filesystem.
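Under the hood, that shortcut boils down to an `apptainer shell` call along these lines; the overlay and image filenames are stand-ins for whatever you named yours during the build:

```bash
# Open a shell in the container with the conda overlay attached read-write.
# "conda-overlay.img" and "container.sif" are example filenames.
apptainer shell --overlay ~/conda-overlay.img ~/container.sif
```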
### 3. Run Conda and Slurm commands

And that's all there is to it; a quick sanity check is sketched after the note below. Before we move on to non-interactive jobs, here's the background on Slurm compatibility:
**What's required for Slurm?**

Running Slurm in any container requires the following:

- The same version of Slurm as is running on the node (which we installed from the Hyak repository).
- The same user ID and group ID for the Slurm user as on the node (which we copied during the container build).
- Three bind mounts to node filesystems, all of which are included in the compute node's default Apptainer configuration:
  - `/var/run/munge`
  - `/var/run/slurm`
  - `/var/spool/slurmd`
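As a quick sanity check, a session inside the container might look something like this (the environment name is just an example):

```bash
# Inside the container shell:
conda activate research-env   # any environment from your overlay
python --version              # runs that environment's Python
squeue -u $USER               # talks to the node's Slurm controller
```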
## Non-interactive jobs

Running non-interactive jobs is a little more complex, since we'll need to pass a script to our container.

Let's say you've written a bit of code that uses one of the conda environments in your overlay; we'll call it `~/do-some-research.py`. We'll start by writing a Bash script to activate the conda environment and run the code:
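A minimal version of that wrapper might look like this; the environment name and the conda install path are assumptions, so match them to your overlay:

```bash
#!/bin/bash
# ~/start-research.sh
# Activate a conda environment from the overlay, then run the analysis script.
# The install path and environment name below are examples.
source /opt/miniconda3/etc/profile.d/conda.sh
conda activate research-env
python ~/do-some-research.py
```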
Don't forget to make this script executable:
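```bash
chmod +x ~/start-research.sh
```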
Now we'll make an SBATCH script, where we pass this script to our container:
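Here's a sketch of `~/research.job`; as before, the partition, account, and image filenames are placeholders for your own:

```bash
#!/bin/bash
#SBATCH --job-name=research
#SBATCH --partition=ckpt      # change to your partition
#SBATCH --account=mylab       # change to your account
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=64G
#SBATCH --time=8:00:00

# Run the wrapper inside the container, with the conda overlay mounted read-only.
apptainer exec --overlay ~/conda-overlay.img:ro ~/container.sif ~/start-research.sh
```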
This will start a job named 'research' with 8 CPUs, 64GB of RAM, and a time limit of 8 hours. Don't forget to change the account or partition. The final `apptainer` line tells our container (with our conda overlay mounted read-only) to run the `~/start-research.sh` wrapper for our `~/do-some-research.py` Python script.
All that's left is to submit the job with `sbatch ~/research.job` and wait for the results.
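While you wait, you can keep an eye on the job from the login node with the usual Slurm tools:

```bash
squeue -u $USER                                  # is the job queued or running?
sacct -j <jobid> --format=JobID,State,Elapsed    # replace <jobid> with the ID printed by sbatch
```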