If you are new to supercomputing, this page will guide you through the basic steps and concepts needed to interact with a supercomputer. |
...
A supercomputer is a complex hardware and software infrastructure comprising thousands of compute nodes, each comparable to a high-end desktop computer, connected together via a high-speed network, the interconnect. Compute nodes are equipped with powerful CPUs, a large amount of memory and often with hardware accelerators such as Graphics Processing Units (GPUs) to carry out specialised operations. In addition, sophisticated storage infrastructure, which supports the distributed filesystems accessible from all compute nodes, allows large volumes of data to be read and written at a high rate. Computing performance is measured in floating-point operations per second (FLOPS), and a supercomputer can achieve dozens or hundreds of petaFLOPS (one petaFLOPS is a quadrillion FLOPS). It does this by executing code in parallel across its many compute units, the CPUs and accelerators. To exploit the computational bandwidth of a supercomputer, a program must be written using parallel programming frameworks that distribute the computational work across CPU cores and accelerators.
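To get a feel for that scale, a quick back-of-the-envelope calculation helps. The figures below are assumed, round numbers chosen only to illustrate orders of magnitude, not measurements of any particular machine:

```shell
# How many 100-gigaFLOPS desktop CPUs would it take to match a
# 100-petaFLOPS supercomputer? (Both figures are illustrative.)
desktop_flops=100000000000          # 1e11 FLOPS = 100 gigaFLOPS
super_flops=100000000000000000      # 1e17 FLOPS = 100 petaFLOPS
echo $(( super_flops / desktop_flops ))
```

That works out to a million desktop-class processors working in parallel, which is why parallel programming frameworks are essential to using such a machine effectively.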
...
Log into Setonix using your Pawsey account. If you use the command line on your laptop, execute the following:
$ ssh <username>@setonix.pawsey.org.au
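To avoid retyping the full hostname on every login, you can define a host alias in the SSH configuration file on your local machine. This is a convenience sketch, not part of the Pawsey documentation; the alias name setonix is just an example:

```
# ~/.ssh/config on your laptop
Host setonix
    HostName setonix.pawsey.org.au
    User <username>
```

With this in place, `ssh setonix` is equivalent to the full command above.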
For more information about logging in, visit Connecting to a Supercomputer.

After logging in, you'll find yourself in your home directory on the /home filesystem of the supercomputer. This filesystem is used mostly to store user configurations; to work on software and data, move to the /scratch filesystem. In particular, each user has a directory located at /scratch/<projectcode>/<username>. For convenience, the environment variable MYSCRATCH, containing that path, is already defined. Here is an example of how to move there.

Terminal 1. Moving to the scratch folder

```bash
$ cd $MYSCRATCH
$ pwd
/scratch/projectxy/userz
```
To read more about supercomputing filesystems, head to File Management.
The code can be found on Pawsey's GitHub profile. You can download files onto the supercomputer using login nodes (for small files) or data mover nodes. Once the repository is cloned, change the current working directory to the hello-mpi folder.

Terminal 2. Cloning a git repository

```bash
$ git clone https://github.com/PawseySC/Introductory-Supercomputing.git
Cloning into 'Introductory-Supercomputing'...
remote: Enumerating objects: 46, done.
remote: Counting objects: 100% (11/11), done.
remote: Compressing objects: 100% (9/9), done.
remote: Total 46 (delta 2), reused 9 (delta 2), pack-reused 35
Receiving objects: 100% (46/46), 206.28 KiB | 1.13 MiB/s, done.
Resolving deltas: 100% (12/12), done.
$ cd Introductory-Supercomputing/hello-mpi
```
The source code, hello-mpi.c, can be compiled with the Cray C compiler, readily available in the environment through the cc compiler wrapper. It is strongly recommended to compile code on compute nodes, so we write a batch script that submits a job to the scheduler to compile and then execute the code. In the same folder, use an editor of your choice (for instance, nano or vim) to write a script that looks like the following.

Listing 1. An example of batch script

```bash
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=3
#SBATCH --partition=work
#SBATCH --account=projectxy

cc -o hello-mpi hello-mpi.c
srun ./hello-mpi
```
Replace projectxy with your project code. The script first compiles the code, then launches it on 2 nodes, creating 3 processes on each of them. The srun command is used to execute parallel and distributed applications.

Use the sbatch command to submit the script to the scheduler for execution.

Terminal 3. Submitting the job

```bash
$ sbatch script.sh
Submitted batch job 1670
```
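The resource request in Listing 1 determines how many copies of the program srun launches: the total number of MPI processes is the number of nodes multiplied by the tasks per node. A quick sketch of that arithmetic:

```shell
# Total MPI processes = nodes x tasks per node, matching the
# --nodes=2 and --ntasks-per-node=3 directives in the batch script.
nodes=2
ntasks_per_node=3
echo $(( nodes * ntasks_per_node ))   # total number of processes
```

This is why the job's output reports ranks "0 of 6" through "5 of 6".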
Once submitted, you can use the squeue command to check the status of your job.

Terminal 4. Checking the status of your job.

```bash
$ squeue --me
JOBID   PARTITION   NAME       USER       ST   TIME   NODES   NODELIST(REASON)
1670    work        script.s   cdipietr   R    0:02   2       nid[001008-001009]
```
In this case, the command shows that the job is running, what nodes are being used and the elapsed time.
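The ST column encodes the job state: R means running and PD means pending. If you want to pick that field out of the output in a script, a simple awk sketch works (the sample line below mirrors Terminal 4; in practice squeue's --format option can also print selected fields directly):

```shell
# Extract the state column (5th field) from a saved squeue output line.
line='1670 work script.s cdipietr R 0:02 2 nid[001008-001009]'
echo "$line" | awk '{print $5}'    # prints the ST field
```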
If the job is no longer displayed in the squeue output list, it has finished executing. The sacct command shows the details of past jobs, so it can be used to verify whether the job ended correctly.

Terminal 5. Demonstrating the use of sacct

```bash
$ sacct
JobID           JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
1670          script.sh       work   projcode          8  COMPLETED      0:0
1670.batch        batch              projcode          4  COMPLETED      0:0
1670.extern      extern              projcode          8  COMPLETED      0:0
1670.0        hello-mpi              projcode          8  COMPLETED      0:0
```
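The ExitCode column has the form exit:signal, so 0:0 means the job exited with status 0 and was not killed by a signal. A small sketch of splitting that value in the shell:

```shell
# Split sacct's ExitCode field "exit:signal" into its two parts.
exitcode='0:0'
status=${exitcode%%:*}    # part before the colon: exit status
signal=${exitcode##*:}    # part after the colon: terminating signal (0 = none)
echo "status=$status signal=$signal"
```

A nonzero status points at a failure in the program itself; a nonzero signal means the job was terminated, for instance by the scheduler.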
The standard output and error streams are redirected to a file named slurm-<jobID>.out that is created in the same directory the job was submitted from.

Terminal 6. The output of a slurm job is stored in a text file

```bash
$ cat slurm-1670.out
0 of 6: Hello, World!
3 of 6: Hello, World!
1 of 6: Hello, World!
2 of 6: Hello, World!
4 of 6: Hello, World!
5 of 6: Hello, World!
```
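Notice that the ranks print in arbitrary order: the processes run concurrently and there is no ordering guarantee on their writes to the shared output file. If ordered output is wanted, sorting the file numerically on the leading rank number is a simple fix. The sketch below recreates the sample output under /tmp only to be self-contained:

```shell
# MPI ranks finish in any order, so the output lines are unordered.
# Recreate the sample output, then sort numerically on the leading rank.
cat > /tmp/slurm-1670.out <<'EOF'
0 of 6: Hello, World!
3 of 6: Hello, World!
1 of 6: Hello, World!
2 of 6: Hello, World!
4 of 6: Hello, World!
5 of 6: Hello, World!
EOF
sort -n /tmp/slurm-1670.out
```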
For more information about submitting jobs and interacting with the scheduler, visit Job Scheduling.
You can also download the output file using a file transfer tool like scp. For instance, if you use Linux or macOS, you can do the following in a terminal window on your machine.

Terminal 7. Copying data from the supercomputer to a local machine

```bash
$ scp username@data-mover.pawsey.org.au:/scratch/projectxy/userz/Introductory-Supercomputing/hello-mpi/slurm-1670.out .
#############################################################################
#                                                                           #
# This computer system is operated by the Pawsey Supercomputing Centre      #
# for the use of authorised clients only.                                   #
#                                                                           #
# The "Conditions of Use" for Pawsey systems and infrastructure can be      #
# found at this publically visible URI:                                     #
#                                                                           #
# https://support.pawsey.org.au/documentation/display/US/Conditions+of+Use  #
#                                                                           #
# By continuing to use this system, you indicate your awareness of, and     #
# consent to, the terms and conditions of use defined in that document.     #
#                                                                           #
# If you do not agree to the terms and conditions of use as defined in      #
# that document, DO NOT CONTINUE TO ACCESS THIS SYSTEM.                     #
#                                                                           #
# Questions about this system, access to it, and Pawsey's "Conditions of    #
# Use" should be addressed, by email, to: help@pawsey.org.au                #
#                                                                           #
#############################################################################
slurm-1670.out                                100%  165    33.3KB/s   00:00
$
```
In this example we have used a data mover node. A data mover node is dedicated to large file transfers that would otherwise disrupt activities on login nodes.
...