Parallel Programming Models

A variety of parallel programming models are available on Pawsey Centre's systems. This section provides introductory information about the most popular parallel programming techniques, such as MPI, OpenMP, CUDA and HIP.

This part of the documentation is organised as an introductory tutorial on using various parallel programming models. The main idea is to illustrate the basic usage of parallel programming paradigms on Pawsey systems rather than to provide a detailed description of a particular programming model. Each subsection ends with links to useful learning materials for further reading.

It starts with the description of a toy computational problem and its serial C and Fortran implementations, which are then ported to each of the models mentioned above (MPI, OpenMP, CUDA and HIP) in the corresponding subpages.

Parallel Programming Models & APIs

There are other common parallel programming models and APIs:

  • OpenCL is an open standard for cross-platform parallel programming of heterogeneous systems, targeting diverse accelerators from GPUs to FPGAs.
  • OpenACC is a programming standard for parallel computing that is designed to simplify parallel programming of heterogeneous CPU/GPU systems. As with OpenMP, parallelisation is achieved through directives such as #pragma acc (in C/C++) or !$acc (in Fortran); a minimal directive sketch follows this list.

    Transition existing OpenACC applications to OpenMP, and use OpenMP instead of OpenACC in new applications, as there is better support for the OpenMP API.

  • HPX is a programming model in C++ providing abstractions for parallel execution of code. 
  • Kokkos is a programming model in C++ providing abstractions for both parallel execution of code and data management. It currently can use CUDA, HIP, SYCL, HPX, OpenMP and C++ threads as backend programming models with several other backends in development.
  • Charm++ is a parallel programming framework in C++ backed by an adaptive runtime system, and it supports both regular and irregular applications. It can be used to express both task parallelism and data parallelism in a single application.
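
As an illustration of the directive-based approach used by OpenACC (and by OpenMP offloading), the sketch below parallelises a simple vector addition with a single OpenACC directive; the equivalent OpenMP offload directive is shown in a comment. This is only a minimal, illustrative sketch (the array size, data clauses and printed check are arbitrary choices), not a complete or recommended application.

/* Minimal sketch: vector addition offloaded with an OpenACC directive */
#include <stdio.h>

#define N 1000000

static double a[N], b[N], c[N];

int main(void) {
  // Initialise the input vectors on the host
  for (int i = 0; i < N; i++) {
    a[i] = (double)i;
    b[i] = 2.0 * (double)i;
  }

  /* Offload the loop to an accelerator when compiled with OpenACC support;
   * the data clauses describe the required host-device transfers.
   * The OpenMP offload equivalent would be:
   *   #pragma omp target teams distribute parallel for map(to: a, b) map(from: c)
   */
  #pragma acc parallel loop copyin(a, b) copyout(c)
  for (int i = 0; i < N; i++) {
    c[i] = a[i] + b[i];
  }

  printf("c[N-1] = %f\n", c[N - 1]);
  return 0;
}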

Toy computational problem

Monte Carlo estimate of pi

The value of pi can be estimated by a simple Monte Carlo algorithm: random points are generated uniformly within a square and the proportion of points that lie inside an inscribed circle is counted. The probability of a point landing in the circle equals the ratio of the areas of the circle and the square, which for a circle of radius r inscribed in a square of side 2r is (pi r^2)/(2r)^2 = pi/4. The estimate of pi is therefore 4 times the fraction of points that fall inside the circle.



Serial implementation

The following code blocks show the serial implementations of the Monte Carlo pi estimator in the C and Fortran languages. Use these as references when reading through the parallel implementations of the same algorithm in the subpages of this section.

Listing 1. Monte Carlo pi estimator, C implementation
/* Compute pi in serial */
#include <stdio.h>

// Random number generator -- and not a very good one, either!
static long MULTIPLIER = 1366;
static long ADDEND = 150889;
static long PMOD = 714025;
long random_last = 0;

// This is not a thread-safe random number generator
double lcgrandom() {
  long random_next;
  random_next = (MULTIPLIER * random_last + ADDEND)%PMOD;
  random_last = random_next;

  return ((double)random_next/(double)PMOD);
}



static long num_trials = 1000000;

int main(int argc, char **argv) {
  long i;
  long Ncirc = 0;
  double pi, x, y;
  double r = 1.0; // radius of circle
  double r2 = r*r;
  
  // for loop with most of the compute
  for (i = 0; i < num_trials; i++) {
    x = lcgrandom();
    y = lcgrandom();
    if ((x*x + y*y) <= r2)
      Ncirc++;
  }

  pi = 4.0 * ((double)Ncirc)/((double)num_trials);
  
  printf("\n \t Computing pi in serial: \n");
  printf("\t For %ld trials, pi = %f\n", num_trials, pi);
  printf("\n");

  return 0;
}
Listing 2. Monte Carlo pi estimator, Fortran implementation
! Compute pi in serial

! First, the pseudorandom number generator
        real function lcgrandom()
          integer*8, parameter :: MULTIPLIER = 1366
          integer*8, parameter :: ADDEND = 150889
          integer*8, parameter :: PMOD = 714025
          integer*8, save :: random_last = 0

          integer*8 :: random_next = 0
          random_next = mod((MULTIPLIER * random_last + ADDEND), PMOD)
          random_last = random_next
          lcgrandom = (1.0*random_next)/PMOD
          return
        end

! Now, we compute pi
        program darts
          implicit none
          integer*8 :: num_trials = 1000000, i = 0, Ncirc = 0
          real :: pi = 0.0, x = 0.0, y = 0.0, r = 1.0
          real :: r2 = 0.0
          real :: lcgrandom
          r2 = r*r

          do i = 1, num_trials
            x = lcgrandom()
            y = lcgrandom()
            if ((x*x + y*y) .le. r2) then
              Ncirc = Ncirc+1
            end if
          end do

          pi = 4.0*((1.0*Ncirc)/(1.0*num_trials))
          print*, '	'
          print*, '	Computing pi in serial:		'
          print*, ' 	For ', num_trials, ' trials, pi = ', pi
          print*, '	'

        end

The above codes can be compiled and executed on a Cray supercomputer in the following way (C example shown; the Fortran version is compiled analogously with the ftn compiler wrapper):

Terminal 1. Compile and run the toy code
$ cc pi.c -o pi.x

$ salloc -N 1 -n 1 -pdebugq -t 0:10:00 #interactive session on debug for 10 minutes

$ srun ./pi.x
 
      Computing pi in serial: 
      For 1000000 trials, pi = 3.141408

Related pages

For detailed information on how to compile software on Pawsey systems, see Compiling.