Hands-on HPC with Julia: a short introduction
Interactive sessions (Jupyter or salloc) vs sbatch
Limitations of Jupyter
Jupyter is a fantastic tool. However, it has a major downside: when you launch a Jupyter session, you are running a job on a compute node. If you want to play for 8 hours in Jupyter, you are requesting an 8-hour job. Yet most of the time you spend in Jupyter is spent typing, running bits and pieces of code, or doing nothing at all. If you ask for GPUs, many CPUs, and lots of RAM, they will sit idle most of the time. This is a suboptimal use of resources.
In addition, if you ask for lots of resources for a long time, you will have to wait for a while before they get allocated to you.
Lastly, you will use up your allocation quickly.
All of this applies equally to interactive sessions launched with salloc.
A good approach
A good strategy is to develop and test your code with small samples, few iterations, etc. in an interactive job (launched with salloc from an SSH session on the cluster), on your own computer (if appropriate), or in Jupyter. Then launch an sbatch job from an SSH session on the cluster to run the full code. This ensures that heavy-duty resources such as GPUs are only allocated to you while you actually need and use them.
This is exactly what we are doing during this workshop.
Accessing Julia in an SSH session
Log in to the training cluster
First, SSH into our training cluster UU. This takes you to the login node of the cluster.
Load necessary modules
This is done with the Lmod tool through the module command (you can find the full documentation here).
Below are some key Lmod commands:
# Get help on the module command
$ module help
# List modules that are already loaded
$ module list
# See which modules are available for Julia
$ module spider julia
# See how to load the latest version (julia/1.5.2)
$ module spider julia/1.5.2
# Load julia/1.5.2 with the required StdEnv/2020 module first
# (the order is important)
$ module load StdEnv/2020 julia/1.5.2
# You can see that we now have Julia loaded
$ module list
Install Julia packages
Install the package BenchmarkTools
- After loading the proper modules, launch Julia:
$ julia
- Within Julia, type ] to enter the package mode, then run:
add BenchmarkTools
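Equivalently, you can install the package from the normal Julia prompt with the standard Pkg API:
using Pkg
Pkg.add("BenchmarkTools")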
Running jobs on the cluster
You should never run computing tasks on the login node. To access the compute nodes of the cluster, you need to submit a job to Slurm (the job scheduler used by the Compute Canada clusters).
Interactive jobs
To launch an interactive session on a compute node, use the salloc command.
Example:
$ salloc -t 10 -c 8 --mem=2G
This will send you to a compute node for 10 minutes. On that node, you will have access to 8 CPUs and 2 GB of RAM. There you can launch julia and run some computations. At the end of the 10 minutes, or if you cancel the job with Ctrl-D, you will get back to the login node.
As mentioned earlier, interactive jobs have the same drawback as Jupyter. To limit your resource allocations to what you really need, you want to submit jobs to Slurm with sbatch.
Job scripts
To submit a batch job to Slurm, you first need a Julia script to run.
Write a Julia script
Create a directory for your project in ~/ and cd into it:
$ mkdir ~/julia_project
$ cd ~/julia_project
Write a Julia script with the text editor of your choice:
$ nano my_julia_script.jl
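For instance (a placeholder example, any Julia code would do), the script could benchmark a toy function with the BenchmarkTools package installed earlier:
# my_julia_script.jl
using BenchmarkTools

# A toy function: sum of the squares of the first n integers
function sum_of_squares(n)
    total = 0.0
    for i in 1:n
        total += i^2
    end
    return total
end

# @btime runs the expression several times and reports the minimum time
@btime sum_of_squares(10_000_000)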
Write an sbatch script
Then you need to write a shell script for sbatch:
$ nano script.sh
The script may look something like this:
#!/bin/bash
#SBATCH --job-name=<name> # job name
#SBATCH --time=<time> # max walltime
#SBATCH --nodes=<N> # number of nodes
#SBATCH --cpus-per-task=<n> # number of cores on each node
#SBATCH --mem=<mem> # max memory (default unit is megabytes)
#SBATCH --output=%j.out # file name for the output
#SBATCH --error=%j.err # file name for errors
julia my_julia_script.jl
Notes:
- --time accepts these formats: "min", "min:s", "h:min:s", "d-h", "d-h:min" and "d-h:min:s"
- %j gets replaced with the job number
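As an illustration, a filled-in script for a small test job (the values here are arbitrary and should be adapted to your needs) might look like:
#!/bin/bash
#SBATCH --job-name=julia_test     # job name
#SBATCH --time=00:10:00           # max walltime of 10 minutes
#SBATCH --nodes=1                 # number of nodes
#SBATCH --cpus-per-task=4         # number of cores on each node
#SBATCH --mem=2000                # max memory of 2000 megabytes
#SBATCH --output=%j.out           # file name for the output
#SBATCH --error=%j.err            # file name for errors
julia my_julia_script.jl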
To submit a job to the cluster:
$ cd /dir/containing/script
$ sbatch script.sh
And we can check its status with:
$ sq
- PD = pending
- R = running
- CG = completing (Slurm is handling the job's closing processes)
- No information = your job has finished running
You can cancel it with:
$ scancel <jobid>
Once your job has finished running, you can display efficiency measures with:
$ seff <jobid>
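Among other measures, seff reports the CPU and memory efficiency of the job, which helps you calibrate the resources to request in future jobs.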
Parallel computing
The whole point of running your Julia script on the cluster is to take advantage of its large computing power to reduce the script's runtime. There are hardware-independent techniques to optimize your Julia code. Beyond those, the key to improving performance is code parallelization through shared memory, distributed memory, and the use of GPUs.
Shared memory (aka multi-threading)
Launching Julia on multiple threads
Starting with Julia 1.5, you can launch Julia on n threads with:
$ julia -t n
For example, to launch Julia on 4 threads, you can run:
$ julia -t 4
For earlier versions, you need to set the JULIA_NUM_THREADS environment variable:
$ export JULIA_NUM_THREADS=n
$ julia
Or you can launch Julia with:
$ JULIA_NUM_THREADS=n julia
For example, to launch Julia on 4 threads, you can run:
$ JULIA_NUM_THREADS=4 julia
Using multiple threads
Multi-threading is supported by the Base.Threads module. Threads.nthreads() outputs the number of threads Julia is using and Threads.threadid() outputs the ID of the current thread.
The Threads.@threads macro makes it extremely easy to run for loops on multiple threads.
Example:
Threads.@threads for i = 1:10
println("i = $i on thread $(Threads.threadid())")
end
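Note that the iterations are divided among the threads, so the order of the printed lines (and which thread runs which iteration) will vary from run to run.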
The Threads.@spawn macro allows multi-threading outside the context of loops. This feature is currently experimental and only lightly documented, but an example is given in this blog post.
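As a minimal sketch (not the blog post's example), Threads.@spawn schedules a task on an available thread and returns a Task whose result can be retrieved with fetch:
# Schedule the sum on one of the available threads
t = Threads.@spawn sum(rand(10^7))
# Do other work here, then wait for the result
fetch(t)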
Distributed computing
Launching several Julia processes
Julia supports distributed computing thanks to the Distributed module.
There are two ways to launch several Julia processes (called "workers"):
Launch Julia on n workers
Julia can be started with the -p flag followed by the number of workers:
$ julia -p n
This launches n workers, available for parallel computations, in addition to the process running the interactive prompt, so there are n + 1 Julia processes in total.
Example to start 4 worker processes:
$ julia -p 4
Launching Julia with the -p flag automatically loads the Distributed module.
Start workers from within a Julia session
Alternatively, workers can be started from within a Julia session. In this case, you need to load the Distributed module explicitly:
using Distributed
To launch n workers:
addprocs(n)
Example to add 4 worker processes to a running Julia session:
addprocs(4)
Managing workers
In Julia, you can see how many workers are running with:
nworkers()
The total number of processes (n + 1) can be returned with:
nprocs()
You can list all the worker process identifiers with:
workers()
The process running the Julia prompt has id 1.
To kill a worker:
rmprocs(<pid>)
where <pid> is the process identifier of the worker you want to kill (you can kill several workers at once by providing a list of pids).
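Putting these together, a short session (assuming you started Julia without the -p flag) might look like:
using Distributed
addprocs(4)   # start 4 workers
nworkers()    # returns 4
nprocs()      # returns 5: the 4 workers plus the process running the prompt
workers()     # returns the worker ids, e.g. [2, 3, 4, 5]
rmprocs(5)    # kill the worker with pid 5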
Using workers
There are a number of convenient macros:
@everywhere
The following expression gets executed on all processes.
For instance, if your parallel code requires a module or an external package to run, you need to load that module or package with @everywhere:
@everywhere using DataFrames
If the parallel code requires a script to run:
@everywhere include("script.jl")
If it requires a function that you are defining, you need to define it on all the workers:
@everywhere function <name>(<arguments>)
<body>
end
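For example, this defines a toy function (the name double is just an illustration) on all processes:
@everywhere function double(x)
    return 2x
end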
@spawnat
Assigns a task to a particular worker.
The first argument indicates the process id, the second argument is the expression that should be evaluated:
@spawnat <pid> <expression>
@spawnat returns a Future: a placeholder for a computation of unknown status and time. The function fetch waits for a Future to complete and returns the result of the computation.
Example:
The function myid gives the id of the current process. As mentioned earlier, the process running the interactive Julia prompt has the pid 1. So myid() normally returns 1.
But we can "spawn" myid on one of the workers, for instance the first worker (so pid 2):
@spawnat 2 myid()
This returns a Future, but if we pass it through fetch, we get the result of myid run on the worker with pid 2:
fetch(@spawnat 2 myid())
If you want tasks to be assigned to any available worker automatically, you can pass the symbol :any to @spawnat instead of the worker id:
@spawnat :any myid()
And to get the result:
fetch(@spawnat :any myid())
If you run this multiple times, you will see that myid is run on any of your available workers. However, it will never return 1, except when you only have one running Julia process (in that case, the process running the prompt is considered a worker).
Data too large to fit in the memory of a single node
When your data is too large to fit in the memory of a single node, the DistributedArrays package lets you distribute large arrays across multiple nodes.
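A minimal sketch of its use, assuming the DistributedArrays package is installed and workers have been started:
using Distributed
addprocs(4)
@everywhere using DistributedArrays

# Create a 1000×1000 distributed array of zeros, split across the workers
a = dzeros(1000, 1000)
# Or distribute an existing local array across the workers
b = distribute(rand(1000))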
GPUs
Julia has GPU support through a number of packages. We will offer workshops and webinars on running Julia on GPUs in 2021.