Using Ollama on Hyak
Conventional LLM tools require root access for installation on Hyak. To maintain system security and stability, users do not have root or sudo access. Administrative privileges, including external program installations, are reserved for system administrators. To work around this, LLMs can be used via software containers.
What Are Ollama LLMs?
Ollama is a tool for running large language models (LLMs), which are artificial intelligence systems that process and generate human language. Models served through Ollama run locally on your hardware and do not require the constant connection to cloud-based servers that other LLM services depend on. Because installation normally requires root access, which is not available on Hyak, it is recommended that Ollama be used through NVIDIA containers. To get started with Ollama on Hyak, you will need to be familiar with Apptainer and with requesting GPU jobs.
Installing Ollama as a Container
You can install Ollama through a container definition file. This example uses the NVIDIA HPC SDK container, which includes the NVIDIA and CUDA drivers. Create the definition file with vim or nano:
nano ollama.def
Bootstrap: docker
From: nvcr.io/nvidia/nvhpc:24.9-devel-cuda_multi-rockylinux8
%post
# Ollama install
curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ollama-linux-amd64.tgz
tar -C /usr -xzf ollama-linux-amd64.tgz
In this example, the NVHPC 24.9 container with Rocky Linux 8 is used. You may use other versions of the NVHPC container by modifying the From: line in your definition file. To view other available versions of NVHPC containers, click HERE.
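For example, to build against a different release, change only the tag portion of the image name; the placeholder below stands in for whichever tag you choose from the catalog:
From: nvcr.io/nvidia/nvhpc:<version>-devel-cuda_multi-rockylinux8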
Next, you will want to pull the NVIDIA base container and install Ollama inside it by building the image. You can build the container interactively or as a job submission:
# interactively from checkpoint
salloc --partition=ckpt-all --cpus-per-task=2 --mem=50G --time=8:00:00
apptainer build ollama.sif ollama.def
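If you would rather build the image as a job submission, a minimal batch script might look like the following (the account name is a placeholder; the other resources mirror the salloc example above). Save it as, for example, build-ollama.sh and submit it with sbatch build-ollama.sh:
#!/bin/bash
#SBATCH --job-name=ollama-build
#SBATCH --partition=ckpt-all
#SBATCH --account=your-account
#SBATCH --cpus-per-task=2
#SBATCH --mem=50G
#SBATCH --time=8:00:00

# build the Ollama container image from the definition file
apptainer build ollama.sif ollama.def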
This container may take some time to build. To save time, you can copy a prebuilt ollama.sif file into your current directory with the following command:
cp /mmfs1/sw/ollama/ollama.sif .
Depending on the size of the model you wish to run, you may want to request more resources. You can request all available memory on a node with --mem=0. When requesting multiple GPUs, LLMs can run into issues distributing their memory usage across the cards, but a properly configured model should handle this without trouble. You can request up to 8 GPUs (an entire GPU server); efficiency drops when requesting more than 8 GPUs because the cards will be spread across different nodes.
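For example, an interactive request for two GPUs and all of a node's memory might look like the following (the GPU count here is illustrative, and you may need to name a specific GPU type or partition following Hyak's GPU documentation):
salloc --partition=ckpt-all --gpus=2 --cpus-per-task=2 --mem=0 --time=8:00:00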
Note that the more resources you request, the longer you may wait for them to be allocated. When anticipated wait times are long, it can be useful to convert the salloc flags above into #SBATCH directives in an executable bash (sbatch) script, along with the commands you want Ollama to execute. Additional information on requesting a GPU job can be found HERE.
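A rough sketch of such a script is shown below, reusing the flags from the interactive example above; the account name, GPU request, model, and prompt are placeholders you should adjust for your own work:
#!/bin/bash
#SBATCH --job-name=ollama-run
#SBATCH --partition=ckpt-all
#SBATCH --account=your-account
#SBATCH --gpus=1
#SBATCH --cpus-per-task=2
#SBATCH --mem=50G
#SBATCH --time=8:00:00

# run the Ollama server and a model inside the container
apptainer exec --nv --bind /gscratch ollama.sif bash -c '
  ollama serve &         # start the Ollama server in the background
  sleep 10               # give the server a moment to start
  ollama pull llama3.2   # download the model
  ollama run llama3.2 "Summarize what Ollama does in one sentence."
'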
To ensure the container was properly built, start an interactive shell session:
apptainer shell --nv --bind /gscratch/ ollama.sif
The --nv flag enables GPU support by binding the necessary NVIDIA libraries from the host system. The --bind /gscratch flag gives the container access to files on the filesystem outside the container. The Apptainer> prompt should now appear on the command line, indicating that you have successfully entered the container shell. You can change to the container's root directory with cd / to see where Ollama was installed; there should be an ollama directory under /usr. Note that /usr will also contain the other files and directories provided by the container's base image. You can now start the Ollama server in the background and run a model with the following commands:
# start the ollama server in the background
ollama serve &
# pulling an ollama model
ollama pull llama3.2
# run the pulled model
ollama run llama3.2
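To check whether a loaded model is actually running on the GPU, you can use the following commands inside the container (for example, from a second shell session in the same allocation):
ollama ps    # lists loaded models and whether they are running on GPU or CPU
nvidia-smi   # shows GPU memory usage and utilization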
By default, pulled Ollama models are saved in your home directory in a hidden directory named .ollama. Because your home directory has a 10GB limit, you may get a disk quota error when pulling larger models. Use the following commands to check the storage usage in your home directory and to list all hidden files:
cd ~ # changing to your home directory
ls -a # lists all hidden files
du -h --max-depth 1 # checks your storage
If the .ollama directory grows too large, you may run into a disk quota error. To fix this, clear out the directory by removing its models subdirectory, then create a new default storage location for Ollama.
cd .ollama
rm -rf models
It may also be useful to clear out your Apptainer cache:
apptainer cache clean
Next, create a new directory in a location with a larger storage quota (e.g., your lab's storage space under /gscratch):
cd ~
# example path
cd /gscratch/lab-name/my-directory
mkdir ollama
cd ollama
mkdir models # this will be your new storage directory
Next, go back to your home directory and set up a symbolic link to the new models directory you created:
cd ~
cd .ollama
ln -s /gscratch/lab-name/my-directory/ollama/models models
ls -l
You should see models highlighted in light blue with an arrow pointing to the path of the new models directory you created. New Ollama models will now be saved there instead of in .ollama/models, so your home directory stays under the 10GB limit.
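As an alternative to the symbolic link, Ollama also supports pointing the server at a custom model directory through the OLLAMA_MODELS environment variable; the path below reuses the example path created above:
export OLLAMA_MODELS=/gscratch/lab-name/my-directory/ollama/models
ollama serve &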