If you are new to supercomputing, this page will guide you through the basic steps and concepts needed to interact with a supercomputer.
Most science experiments today involve executing computer programs to produce or analyse data. For instance, computational analysis is used to determine genetic changes within tumours; another example is the analysis of the large amount of observational data produced by a radio telescope to discover signals from extraterrestrial sources. Ordinary workstations or laptops are not able to satisfy the computational demand of these analyses. For this reason governments and, increasingly, private companies build large-scale computing infrastructures - supercomputers - to meet the ever-increasing compute needs of the scientific community.
A supercomputer is a very complex hardware and software infrastructure comprising thousands of compute nodes, each comparable to a high-end desktop computer, connected together via a high-speed network, the interconnect. Compute nodes are equipped with powerful CPUs, a large amount of memory and often hardware accelerators such as Graphics Processing Units (GPUs) to carry out specialised operations. In addition, sophisticated storage infrastructures support distributed filesystems that are accessible from all compute nodes and allow large volumes of data to be read and written at a high rate. Computing performance is commonly measured in floating-point operations per second (FLOPS), and a supercomputer can achieve dozens or hundreds of petaFLOPS (one petaFLOPS is a quadrillion FLOPS). It does this by executing code in parallel across the many compute units (CPUs and accelerators) that are available. To exploit this computational capacity, a program must be written using parallel programming frameworks, which enable computational work to be distributed across CPU cores.
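To give a sense of scale, the short calculation below compares how long a fixed amount of work would take at two sustained rates. All three figures are illustrative assumptions rather than measurements: a workload of 10^18 floating-point operations, a laptop sustaining 100 gigaFLOPS, and a supercomputer sustaining 100 petaFLOPS. It also assumes the work parallelises perfectly, which real programs rarely achieve.

```shell
# Illustrative figures only (assumptions, not measurements).
workload=1000000000000000000      # 10^18 floating-point operations
laptop_flops=100000000000         # 10^11 FLOPS = 100 gigaFLOPS
super_flops=100000000000000000    # 10^17 FLOPS = 100 petaFLOPS

# Time = work / rate, using integer shell arithmetic.
laptop_seconds=$(( workload / laptop_flops ))
super_seconds=$(( workload / super_flops ))

echo "Laptop:        ${laptop_seconds} s (about $(( laptop_seconds / 86400 )) days)"
echo "Supercomputer: ${super_seconds} s"
```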
Figure 1. Schematic representation of a supercomputer.
From the user perspective, a supercomputer includes the following components:
To learn more about supercomputers, watch the recording of Introduction to Supercomputers (Using Supercomputers Part 1 and Part 2 on YouTube), or register for the next training session.
The computing resources available at Pawsey Supercomputing Research Centre are listed on the Resource Overview page.
In short, the Setonix supercomputer is based on the HPE Cray EX architecture and combines AMD CPUs and GPUs. It has more than 200,000 CPU cores and 750 GPUs, connected by the Slingshot-10 interconnect with 200 Gb/s of bandwidth per connection. The AMD Infinity Fabric interconnect provides a direct channel of communication among GPUs as well as between CPUs and GPUs.
Many users adopt existing, highly optimised software packages to run their analyses on a supercomputer. Supercomputing applications can use hundreds or thousands of CPU cores in parallel to support large-scale experiments that are impossible to execute on a desktop computer. The most popular applications are already available on Pawsey supercomputers, installed so as to make the most of the computing infrastructure; use the Modules system to find and load them. Other users write their own software, often relying on Pawsey-provided scientific libraries and parallel programming frameworks such as OpenMPI and HIP. In either case, the use of efficient programs is strongly encouraged to maximise supercomputer utilisation and, ultimately, scientific outcomes.
In this section, you are going to submit your first computation to Pawsey's latest supercomputer, Setonix. More precisely, you will compile and execute hello-mpi, one of the codes used in the Introductory Supercomputing course. The program simply starts multiple processes, each of which prints its own unique MPI rank (a process identifier).
Log in to Setonix using your Pawsey account. From the command line on your laptop, execute the following:
$ ssh <username>@setonix.pawsey.org.au
For more information about logging in, visit Connecting to a Supercomputer.
After logging in, you'll find yourself in your home directory on the /home filesystem of the supercomputer. This filesystem is used mostly to store user configurations; to work on software and data, move to the /scratch filesystem. In particular, each user has a directory located at /scratch/<projectcode>/<username>. For convenience, the environment variable MYSCRATCH, which contains that path, is already defined, so you can move there with cd $MYSCRATCH.
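The path layout described above can be sketched as a small shell function. The function name scratch_path and the sample values projcode and jsmith are hypothetical; on Setonix itself you would normally just rely on MYSCRATCH.

```shell
# Build the per-user scratch path from a project code and a username.
# scratch_path is a hypothetical helper, not a Pawsey-provided command.
scratch_path() {
    printf '/scratch/%s/%s\n' "$1" "$2"
}

# Prefer MYSCRATCH when it is defined (as it is on Setonix); otherwise
# fall back to a path built from placeholder values.
workdir="${MYSCRATCH:-$(scratch_path projcode jsmith)}"
echo "Working directory: ${workdir}"

# On Setonix you would then move there with:
#   cd "$MYSCRATCH"
```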
To read more about supercomputing filesystems, head to File Management.
The code can be found on Pawsey's GitHub profile. You can download files to the supercomputer using login nodes (for small files) or data mover nodes. Once the repository is cloned, change the current working directory to the hello-mpi folder.
The source code, hello-mpi.c, can be compiled with the Cray C compiler, readily available in the environment through the cc alias. It is strongly recommended to compile the code on compute nodes, so the compilation is done inside a job: a script submitted to the scheduler first compiles the code and then executes it. In the same folder, use an editor of your choice (for instance, nano or vim) to write such a script.
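A minimal sketch of such a script is shown below, written out with a heredoc so that it can be pasted into a terminal; typing the same lines into an editor works equally well. The file name hello-mpi.sh, the work partition and the five-minute time limit are assumptions made for illustration, and projcode is a placeholder for your project code.

```shell
# Write the batch script to hello-mpi.sh (the file name is an assumption).
# The quoted 'EOF' stops the shell from expanding anything inside.
cat > hello-mpi.sh <<'EOF'
#!/bin/bash
# Placeholder: replace projcode with your project code.
#SBATCH --account=projcode
# Assumed partition name and time limit.
#SBATCH --partition=work
#SBATCH --time=00:05:00
# Run on 2 nodes with 3 MPI processes on each.
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=3

# Compile on the compute node with the Cray compiler wrapper.
cc -o hello-mpi hello-mpi.c

# Launch the program; srun inherits the 2 x 3 task layout from the job.
srun ./hello-mpi
EOF

echo "Wrote hello-mpi.sh"
```

With 2 nodes and 3 tasks per node, the job starts 6 processes in total, so 6 distinct MPI ranks should be reported.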
In the script, replace projcode with your project code. The script will first compile the code, then launch it on 2 nodes, creating 3 processes on each. The srun command is used to execute parallel and distributed applications.
Use the sbatch command to submit the script to the scheduler for execution.
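On success, sbatch prints a line of the form "Submitted batch job <jobID>". The sketch below extracts the job ID from that line so that it can be reused in later commands. The helper name job_id_from_sbatch, the script name hello-mpi.sh and the sample job ID are hypothetical.

```shell
# Extract the job ID from sbatch's confirmation line (read from stdin).
# job_id_from_sbatch is a hypothetical helper, not part of Slurm.
job_id_from_sbatch() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Demonstration with a sample confirmation line (made-up job ID).
sample_id=$(echo "Submitted batch job 123456" | job_id_from_sbatch)
echo "Job ID: ${sample_id}"

# On Setonix the real submission would be:
#   job_id=$(sbatch hello-mpi.sh | job_id_from_sbatch)
```

Alternatively, sbatch accepts a --parsable option that prints the job ID without the surrounding text.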
Once submitted, you can use the squeue command to check the status of your job.
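squeue reports each job's state as a short code in its ST column. The sketch below translates a few common Slurm state codes into words; the function name describe_state is hypothetical and the list is not exhaustive.

```shell
# Translate a compact Slurm job state code into a readable word.
# describe_state is a hypothetical helper, not part of Slurm.
describe_state() {
    case "$1" in
        PD) echo "pending" ;;
        R)  echo "running" ;;
        CG) echo "completing" ;;
        CD) echo "completed" ;;
        F)  echo "failed" ;;
        TO) echo "timed out" ;;
        CA) echo "cancelled" ;;
        *)  echo "other ($1)" ;;
    esac
}

describe_state R
describe_state PD

# On Setonix, list only your own jobs with:
#   squeue -u "$USER"
```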
While the job is running, the output shows its state, the nodes being used and the elapsed time.
If the job is no longer displayed in the squeue output list, then it has terminated its execution. The sacct command shows the details of past jobs, hence it can be used to verify whether the job ended correctly.
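One way to check the outcome is to ask sacct for the State and ExitCode fields. The sketch below interprets one line of pipe-separated output of the kind produced by sacct -X -n -P --format=State,ExitCode; the function name job_ended_ok and the sample lines are hypothetical.

```shell
# Succeed only for a job that completed with exit code 0:0.
# Input: one "State|ExitCode" line, for example "COMPLETED|0:0".
# job_ended_ok is a hypothetical helper, not part of Slurm.
job_ended_ok() {
    [ "$1" = "COMPLETED|0:0" ]
}

# Demonstration on made-up sample lines.
for line in "COMPLETED|0:0" "FAILED|1:0" "TIMEOUT|0:0"; do
    if job_ended_ok "$line"; then
        echo "${line}: job ended correctly"
    else
        echo "${line}: job did not end correctly"
    fi
done

# On Setonix, with a real job ID in place of <jobID>:
#   job_ended_ok "$(sacct -j <jobID> -X -n -P --format=State,ExitCode)"
```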
The standard output and error streams are redirected to a file named slurm-<jobID>.out that is created in the directory the job was submitted from.
For more information about submitting jobs and interacting with the scheduler, visit Job Scheduling.
You can also download the output file using a file transfer tool like scp. For instance, if you use Linux or Mac, you can run scp from a terminal window on your own machine.
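A sketch of such a transfer follows. The data mover hostname data-mover.pawsey.org.au, the location of the hello-mpi folder directly under your scratch directory, and the sample username, project code and job ID are all assumptions; substitute your own values.

```shell
# Assumed hostname of a Pawsey data mover node.
DATA_MOVER_HOST="data-mover.pawsey.org.au"

# Build the scp source argument for a job's output file.
# remote_output is a hypothetical helper; its arguments are the
# username, the project code and the job ID, in that order.
remote_output() {
    printf '%s@%s:/scratch/%s/%s/hello-mpi/slurm-%s.out\n' \
        "$1" "$DATA_MOVER_HOST" "$2" "$1" "$3"
}

echo "Would copy: $(remote_output jsmith projcode 123456)"

# To perform the copy into the current local directory:
#   scp "$(remote_output jsmith projcode 123456)" .
```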
In this example we have used a data mover node. A data mover node is dedicated to large file transfers that would otherwise disrupt activities on login nodes.
You will have to think about how to migrate your workflows from a local machine or cluster to a supercomputing environment. In many cases, you will need to consider the following:
Our documentation covers many aspects of interacting with a supercomputer. Visit the landing page Supercomputing Documentation for an overview of the topics.
If you would like more information about running jobs on the GPU resources, see Example Slurm Batch Scripts for Setonix on GPU Compute Nodes.