
If you are new to supercomputing, this page will guide you through the basic steps and concepts needed to interact with a supercomputer.

...

A supercomputer is a very complex hardware and software infrastructure comprising thousands of compute nodes, each comparable to a high-end desktop computer, connected together via a high-speed network called the interconnect. Compute nodes are equipped with powerful CPUs, a large amount of memory and often hardware accelerators such as Graphics Processing Units (GPUs) to carry out specialised operations. In addition, sophisticated storage infrastructures, which support the distributed filesystems accessible from all compute nodes, allow large volumes of data to be read and written at a high rate. Computing performance is measured in floating-point operations per second (FLOPS), and a supercomputer can achieve dozens or hundreds of petaFLOPS (one petaFLOPS is a quadrillion FLOPS). It does this by executing code in parallel across the many compute units, CPUs and accelerators, that are available. To exploit the computational bandwidth of a supercomputer, a program must be written using parallel programming frameworks, such as MPI, that distribute computational work across CPU cores.
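
To get a feel for the numbers, here is a back-of-the-envelope estimate. The figures below are hypothetical, chosen only to illustrate how per-core performance aggregates across a machine; they are not Setonix specifications.

    # Hypothetical node: 2 CPUs x 64 cores, each core sustaining ~32 GFLOPS
    $ echo $((2 * 64 * 32))
    4096
    # ~4096 GFLOPS, i.e. about 4 TFLOPS per node; a thousand such nodes
    # approach 4 petaFLOPS, provided the code runs in parallel on all of them.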

...

  1. Log into Setonix using your Pawsey account. If you use the command line on your laptop, execute the following:

        $ ssh <username>@setonix.pawsey.org.au

    For more information about logging in, visit Connecting to a Supercomputer.
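
    If you connect often, you can optionally define a host alias in your SSH client configuration, so that a short name expands to the full hostname and username. This is a minimal sketch using standard OpenSSH options; your_username is a placeholder for your Pawsey username.

        # Append a host alias to the SSH client configuration
        $ cat >> ~/.ssh/config <<'EOF'
        Host setonix
            HostName setonix.pawsey.org.au
            User your_username
        EOF

        # Now the short name is enough
        $ ssh setonix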


  2. After logging in, you'll find yourself in your home directory on the /home filesystem of the supercomputer. This is used mostly to store user configurations; to work on software and data, move to the /scratch filesystem. In particular, each user has a directory located at /scratch/<projectcode>/<username>. For convenience, the environment variable MYSCRATCH, containing that path, is already defined. Here is how to move there.

    Terminal 1. Moving to the scratch folder

    $ cd $MYSCRATCH
    $ pwd
    /scratch/projectxy/userz
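
    It can help to keep each piece of work in its own subdirectory of $MYSCRATCH. This is optional; the hello-world name below is just an example.

        $ mkdir -p $MYSCRATCH/hello-world   # create a working subdirectory
        $ cd $MYSCRATCH/hello-world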



    To read more about supercomputing filesystems, head to File Management.

  3. The example code can be found on Pawsey's GitHub profile. You can download files onto the supercomputer using the login nodes (for small files) or the data mover nodes. Once the repository is cloned, change the current working directory to the hello-mpi folder.

    Terminal 2. Cloning a git repository

    $ git clone https://github.com/PawseySC/Introductory-Supercomputing.git
    Cloning into 'Introductory-Supercomputing'...
    remote: Enumerating objects: 46, done.
    remote: Counting objects: 100% (11/11), done.
    remote: Compressing objects: 100% (9/9), done.
    remote: Total 46 (delta 2), reused 9 (delta 2), pack-reused 35
    Receiving objects: 100% (46/46), 206.28 KiB | 1.13 MiB/s, done.
    Resolving deltas: 100% (12/12), done.
    
    $ cd Introductory-Supercomputing/hello-mpi



  4. The source code, hello-mpi.c, can be compiled with the Cray C compiler, readily available in the environment through the cc compiler wrapper. It is strongly recommended to compile code on the compute nodes, so you will write a batch script that submits a job to the scheduler; the job first compiles the code and then executes it. In the same folder, use an editor of your choice (for instance, nano or vim) to write a script that looks like the following.

    Listing 1. An example of a batch script

    #!/bin/bash
    
    #SBATCH --nodes=2
    #SBATCH --ntasks-per-node=3
    #SBATCH --partition=work
    #SBATCH --account=projectxy
    
    cc -o hello-mpi hello-mpi.c
    srun ./hello-mpi
    
    


    Replace projectxy with your project code. The script first compiles the code, then launches it on 2 nodes, creating 3 processes on each of them (6 processes in total). The srun command is used to execute parallel and distributed applications. If you prefer to compile interactively rather than inside the batch job, see the sketch below.
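
    The following sketch shows the interactive route, using salloc, the standard Slurm command for obtaining an allocation from the command line. It reuses the partition and account values of Listing 1, which you should replace with your own.

        $ salloc --nodes=1 --ntasks=1 --partition=work --account=projectxy
        # Depending on the site configuration, the resulting shell runs either
        # on the allocated compute node or on the login node; in the latter
        # case, prefix commands with srun to run them on the allocation.
        $ cc -o hello-mpi hello-mpi.c
        $ exit    # release the allocation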

  5. Use the sbatch command to submit the script (saved here as script.sh) to the scheduler for execution.

    Terminal 3. Submitting the job

    $ sbatch script.sh 
    Submitted batch job 1670
    


    Once submitted, you can use the squeue command to check the status of your job.

    Terminal 4. Checking the status of your job

    $ squeue --me
                 JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                  1670     work  script.s cdipietr  R       0:02      2 nid[001008-001009]
    
    


    In this case, the command shows that the job is running (the ST column reports R), which nodes are being used, and the elapsed time.
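
    For more detail on a pending or running job, two standard Slurm commands can help; the job ID below is the one returned in Terminal 3.

        $ scontrol show job 1670   # full record of the job: resources, times, paths
        $ squeue --me --start      # estimated start times for your pending jobs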

  6. If the job no longer appears in the squeue output, it has finished executing. The sacct command shows the details of past jobs, so it can be used to verify whether the job ended correctly.

    Terminal 5. Demonstrating the use of sacct

    $ sacct
           JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
    ------------ ---------- ---------- ---------- ---------- ---------- -------- 
    1670          script.sh      work projectxy            8  COMPLETED      0:0 
    1670.batch        batch           projectxy            4  COMPLETED      0:0 
    1670.extern      extern           projectxy            8  COMPLETED      0:0 
    1670.0        hello-mpi           projectxy            8  COMPLETED      0:0 
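
    sacct also accepts formatting options if you want specific fields; for example, to report the elapsed time of the job from Terminal 3:

        $ sacct -j 1670 --format=JobID,JobName,Elapsed,State,ExitCode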
    
    



  7. The standard output and error streams are redirected to a file named slurm-<jobID>.out that is created in the same directory the job was submitted from.

    Terminal 6. The output of a Slurm job is stored in a text file

    $ cat slurm-1670.out 
    0 of 6: Hello, World!
    3 of 6: Hello, World!
    1 of 6: Hello, World!
    2 of 6: Hello, World!
    4 of 6: Hello, World!
    5 of 6: Hello, World!
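
    Note that the six processes write to the file concurrently, so the order of the lines can differ from run to run. To list the greetings by rank, sort the file numerically:

        $ sort -n slurm-1670.out   # orders lines by the leading rank number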
    
    


    For more information about submitting jobs and interacting with the scheduler, visit Job Scheduling.

  8. You can also download the output file using a file transfer tool like scp. For instance, if you use Linux or macOS, you can run the following in a terminal window on your machine.

    Terminal 7. Copying data from the supercomputer to a local machine

    $ scp <username>@data-mover.pawsey.org.au:/scratch/projectxy/userz/Introductory-Supercomputing/hello-mpi/slurm-1670.out .
    #############################################################################
    #                                                                           #
    #  This computer system is operated by the Pawsey Supercomputing Centre     #
    #   for the use of authorised clients only.                                 #
    #                                                                           #
    #  The "Conditions of Use" for Pawsey systems and infrastructure can be     #
    #   found at this publically visible URI:                                   #
    #                                                                           #
    #  https://support.pawsey.org.au/documentation/display/US/Conditions+of+Use #
    #                                                                           #
    #  By continuing to use this system, you indicate your awareness of, and    #
    #   consent to, the terms and conditions of use defined in that document.   #
    #                                                                           #
    #  If you do not agree to the terms and conditions of use as defined in     #
    #   that document, DO NOT CONTINUE TO ACCESS THIS SYSTEM.                   #
    #                                                                           #
    #  Questions about this system, access to it, and Pawsey's "Conditions of   #
    #   Use" should be addressed, by email, to:  help@pawsey.org.au             #
    #                                                                           #
    #############################################################################
    slurm-1670.out                                100%  165    33.3KB/s   00:00    
    $


    In this example we have used a data mover node. A data mover node is dedicated to large file transfers that would otherwise disrupt activities on login nodes.
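
    If you prefer rsync, the same transfer can be expressed as follows; this sketch assumes the data mover hostname shown in Terminal 7 and that rsync is installed on both machines. An advantage of rsync is that an interrupted transfer can be resumed.

        # -a preserves file attributes, -v lists files as they are transferred
        $ rsync -av <username>@data-mover.pawsey.org.au:/scratch/projectxy/userz/Introductory-Supercomputing/hello-mpi/slurm-1670.out .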

...
