Hands-on HPC with Julia: a short introduction
Interactive sessions (Jupyter or salloc) vs sbatch
Limitations of Jupyter
Jupyter is a fantastic tool. However, it has a major downside: when you launch a Jupyter session, you are running a job on a compute node. If you want to play for 8 hours in Jupyter, you are requesting an 8-hour job. Yet most of the time you spend in Jupyter is spent typing, running bits and pieces of code, or doing nothing at all. If you ask for GPUs, many CPUs, and lots of RAM, they will sit idle most of the time. This is a suboptimal use of resources.
In addition, if you ask for lots of resources for a long time, you will have to wait for a while before they get allocated to you.
Lastly, you will use up your allocation quickly.
All of this applies equally to interactive sessions launched with salloc.
A good approach
A good strategy is to develop and test your code with small samples, few iterations, etc. in an interactive job (launched with salloc from an SSH session on the cluster), on your own computer (if appropriate), or in Jupyter. Then launch an sbatch job from an SSH session on the cluster to run the full code. This ensures that heavy-duty resources such as GPUs are only allocated to you while you actually need and use them.
This is exactly what we are doing during this workshop.
Accessing Julia in an SSH session
Log in to the training cluster
First, SSH into our training cluster UU. This takes you to the login node of the cluster.
Load necessary modules
This is done with the Lmod tool through the module command (you can find the full documentation here).
Below are some key Lmod commands:
# Get help on the module command
$ module help
# List modules that are already loaded
$ module list
# See which modules are available for Julia
$ module spider julia
# See how to load the latest version (julia/1.5.2)
$ module spider julia/1.5.2
# Load julia/1.5.2 with the required StdEnv/2020 module first
# (the order is important)
$ module load StdEnv/2020 julia/1.5.2
# You can see that we now have Julia loaded
$ module list
Install Julia packages
Install the package BenchmarkTools
- After loading the proper modules, launch Julia:
$ julia
- Within Julia, type ] to enter the package mode, then run:
add BenchmarkTools
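Equivalently, you can install the package from the normal Julia prompt with the standard Pkg API:
using Pkg
Pkg.add("BenchmarkTools")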
Running jobs on the cluster
You should never run computing tasks on the login node. To access the compute nodes of the cluster, you need to submit a job to Slurm (the job scheduler used by the Compute Canada clusters).
Interactive jobs
To launch an interactive session on a compute node, use the salloc command.
Example:
$ salloc -t 10 -c 8 --mem=2G
This will send you to a compute node for 10 minutes. On that node, you will have access to 8 CPUs and 2 GB of RAM. There you can launch julia and run some computations. At the end of the 10 minutes, or if you cancel the job with Ctrl-D, you will get back to the login node.
As mentioned earlier, interactive jobs have the same drawback as Jupyter. To limit your resource allocations to what you really need, you want to submit jobs to Slurm with sbatch.
Job scripts
To submit a batch job to Slurm, you first need a Julia script to run.
Write a Julia script
Create a directory for your project in ~/ and cd into it:
$ mkdir ~/julia_project
$ cd ~/julia_project
Write a Julia script with the text editor of your choice:
$ nano my_julia_script.jl
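For instance (a placeholder example, any Julia code would do), the script could benchmark a toy function with the BenchmarkTools package installed earlier:
# my_julia_script.jl
using BenchmarkTools

# A toy function: sum of the squares of the first n integers
function sum_of_squares(n)
    total = 0.0
    for i in 1:n
        total += i^2
    end
    return total
end

# @btime runs the expression several times and reports the minimum time
@btime sum_of_squares(10_000_000)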
Write an sbatch script
Then you need to write a shell script for sbatch:
$ nano script.sh
The script may look something like this:
#!/bin/bash
#SBATCH --job-name=<name> # job name
#SBATCH --time=<time> # max walltime
#SBATCH --nodes=<N> # number of nodes
#SBATCH --cpus-per-task=<n> # number of cores on each node
#SBATCH --mem=<mem> # max memory (default unit is megabytes)
#SBATCH --output=%j.out # file name for the output
#SBATCH --error=%j.err # file name for errors
julia my_julia_script.jl
Notes:
- --time accepts these formats: "min", "min:s", "h:min:s", "d-h", "d-h:min" and "d-h:min:s"
- %j gets replaced with the job number
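As an illustration, a filled-in script for a small test job (the values here are arbitrary and should be adapted to your needs) might look like:
#!/bin/bash
#SBATCH --job-name=julia_test     # job name
#SBATCH --time=00:10:00           # max walltime of 10 minutes
#SBATCH --nodes=1                 # number of nodes
#SBATCH --cpus-per-task=4         # number of cores on each node
#SBATCH --mem=2000                # max memory of 2000 megabytes
#SBATCH --output=%j.out           # file name for the output
#SBATCH --error=%j.err            # file name for errors
julia my_julia_script.jl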
To submit a job to the cluster:
$ cd /dir/containing/script
$ sbatch script.sh
And we can check its status with:
$ sq
- PD = pending
- R = running
- CG = completing (Slurm is handling the job's closing processes)
- No information = your job has finished running
You can cancel it with:
$ scancel <jobid>
Once your job has finished running, you can display efficiency measures with:
$ seff <jobid>
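Among other measures, seff reports the CPU and memory efficiency of the job, which helps you calibrate the resources to request in future jobs.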
Parallel computing
The whole point of running your Julia script on the cluster is to take advantage of its large computing power to reduce the script's runtime. There are hardware-independent techniques to optimize your Julia code. Beyond those, the key to improving performance is code parallelization through shared memory, distributed memory, and the use of GPUs.
Shared memory (aka multi-threading)
Launching Julia on multiple threads
Starting with Julia 1.5, you can launch Julia on n threads with:
$ julia -t n
For example, to launch Julia on 4 threads, you can run:
$ julia -t 4
For earlier versions, you need to set the JULIA_NUM_THREADS environment variable:
$ export JULIA_NUM_THREADS=n
$ julia
Or you can launch Julia with:
$ JULIA_NUM_THREADS=n julia
For example, to launch Julia on 4 threads, you can run:
$ JULIA_NUM_THREADS=4 julia
Using multiple threads
Multi-threading is supported by the Base.Threads module. Threads.nthreads() outputs the number of threads Julia is using and Threads.threadid() outputs the ID of the current thread.
The Threads.@threads macro makes it extremely easy to run for loops on multiple threads.
Example:
Threads.@threads for i = 1:10
println("i = $i on thread $(Threads.threadid())")
end
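Note that the iterations are divided among the threads, so the order of the printed lines (and which thread runs which iteration) will vary from run to run.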
The Threads.@spawn macro allows multi-threading outside the context of loops. This feature is currently experimental and only lightly documented, but an example is given in this blog post.
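As a minimal sketch (not the blog post's example), Threads.@spawn schedules a task on an available thread and returns a Task whose result can be retrieved with fetch:
# Schedule the sum on one of the available threads
t = Threads.@spawn sum(rand(10^7))
# Do other work here, then wait for the result
fetch(t)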
Distributed computing
Launching several Julia processes
Julia supports distributed computing thanks to the Distributed module.
There are two ways to launch several Julia processes (called "workers"):
Launch Julia on n workers
Julia can be started with the -p flag followed by the number of workers:
$ julia -p n
This launches n workers, available for parallel computations, in addition to the process running the interactive prompt, so there are n + 1 Julia processes in total.
Example to start 4 worker processes:
$ julia -p 4
Launching Julia with the -p flag automatically loads the Distributed module.
Start workers from within a Julia session
Alternatively, workers can be started from within a Julia session. In this case, you need to load the Distributed module explicitly:
using Distributed
To launch n workers:
addprocs(n)
Example to add 4 worker processes to a running Julia session:
addprocs(4)
Managing workers
In Julia, you can see how many workers are running with:
nworkers()
The total number of processes (n + 1) can be returned with:
nprocs()
You can list all the worker process identifiers with:
workers()
The process running the Julia prompt has id 1.
To kill a worker:
rmprocs(<pid>)
where <pid> is the process identifier of the worker you want to kill (you can kill several workers at once by providing a list of pids).
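Putting these together, a short session (assuming you started Julia without the -p flag) might look like:
using Distributed
addprocs(4)   # start 4 workers
nworkers()    # returns 4
nprocs()      # returns 5: the 4 workers plus the process running the prompt
workers()     # returns the worker ids, e.g. [2, 3, 4, 5]
rmprocs(5)    # kill the worker with pid 5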
Using workers
There are a number of convenient macros:
@everywhere
The following expression gets executed on all processes.
For instance, if your parallel code requires a module or an external package to run, you need to load that module or package with @everywhere:
@everywhere using DataFrames
If the parallel code requires a script to run:
@everywhere include("script.jl")
If it requires a function that you are defining, you need to define it on all the workers:
@everywhere function <name>(<arguments>)
<body>
end
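For example, this defines a toy function (the name double is just an illustration) on all processes:
@everywhere function double(x)
    return 2x
end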
@spawnat
Assigns a task to a particular worker.
The first argument indicates the process id, the second argument is the expression that should be evaluated:
@spawnat <pid> <expression>
@spawnat returns a Future: a placeholder for a computation of unknown status and time. The function fetch waits for a Future to complete and returns the result of the computation.
Example:
The function myid gives the id of the current process. As mentioned earlier, the process running the interactive Julia prompt has the pid 1. So myid() normally returns 1.
But we can "spawn" myid on one of the workers, for instance the first worker (so pid 2):
@spawnat 2 myid()
This returns a Future, but if we pass it through fetch, we get the result of myid run on the worker with pid 2:
fetch(@spawnat 2 myid())
If you want tasks to be assigned to any available worker automatically, you can pass the symbol :any to @spawnat instead of the worker id:
@spawnat :any myid()
And to get the result:
fetch(@spawnat :any myid())
If you run this multiple times, you will see that myid is run on any of your available workers. However, it will never return 1, except when you only have one running Julia process (in that case, the process running the prompt is considered a worker).
Data too large to fit in the memory of a single node
When your data is too large to fit in the memory of a single node, the DistributedArrays package lets you distribute large arrays across multiple nodes.
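A minimal sketch of its use, assuming the DistributedArrays package is installed and workers have been started:
using Distributed
addprocs(4)
@everywhere using DistributedArrays

# Create a 1000×1000 distributed array of zeros, split across the workers
a = dzeros(1000, 1000)
# Or distribute an existing local array across the workers
b = distribute(rand(1000))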
GPUs
Julia has GPU support through a number of packages. We will offer workshops and webinars on running Julia on GPUs in 2021.