Message Passing Interface

The Message Passing Interface (MPI) is a library specification to support applications running on distributed memory systems.

A very short introduction to MPI

MPI is a library specification, proposed as a standard by the MPI Forum committee. MPI defines how processes running on different cores, CPUs or compute nodes can communicate and synchronise. The first version of the standard, published in 1994, covered the core functionality of message passing and synchronisation. The second version, published in 1997, introduced new features and extensions for parallel I/O and one-sided communication. The third version, published in 2012, extended one-sided communication further and introduced non-blocking collectives.

All MPI libraries available on Pawsey systems support the MPI-3.1 standard.
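If in doubt, the version of the standard supported by a particular library can be checked at run time with the standard MPI_Get_version routine, one of the few MPI calls that may be made before MPI_Init. A minimal sketch:

#include <mpi.h>
#include <stdio.h>

int main(void) {
  int version, subversion;

  /* MPI_Get_version may be called before MPI_Init */
  MPI_Get_version(&version, &subversion);
  printf("This library supports MPI %d.%d\n", version, subversion);

  return 0;
}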

MPI was developed to support applications on distributed memory systems. The main idea is that multiple processes are spawned on the available resources and each process:

  • Has its own memory, not directly accessible to other processes
  • Executes the same program, but independently of the other processes
  • Works on a unique portion of the whole problem, such as a subset of the data to be processed (see the sketch after this list)
  • Can send messages to and receive messages from other processes
  • Can synchronise its work with the other processes in the group
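A common way to realise the last three points is to derive each process's share of the work from its rank. The fragment below is an illustrative sketch only (the problem size N and the variable names are assumptions, not part of any MPI interface): it gives each rank a contiguous chunk of N items, with the remainder spread over the lowest ranks.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
  const int N = 10;  /* illustrative problem size */
  int rank, size;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  /* Block distribution: the first N % size ranks take one extra item */
  int chunk = N / size;
  int rest  = N % size;
  int first = rank * chunk + (rank < rest ? rank : rest);
  int count = chunk + (rank < rest ? 1 : 0);

  printf("Rank %d of %d handles items %d..%d\n",
         rank, size, first, first + count - 1);

  MPI_Finalize();
  return 0;
}

The same pattern appears in the toy problem at the end of this page, where the number of trials is divided between the ranks in exactly this way.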

The lifetime of a parallel MPI code is schematically depicted in figure 1.


Schematic lifetime of the MPI application

Figure 1. Lifetime of a parallel MPI code

This short tutorial will start by introducing six primary MPI routines:

MPI_Init

  • Initialises the MPI environment
  • Is the first MPI call in the code
  • Must be called only once by each process
  • Should be placed in the body of code after variable declarations and before any other MPI call

MPI_Finalize

  • Terminates the MPI environment
  • Is the last MPI call in the code

MPI_Comm_size

  • Used to find out the number of MPI processes (ranks) in the communicator

MPI_Comm_rank

  • Used to find out the rank (or ID) of the current process in the communicator 

MPI_Send

  • Used to send a message to another process in the communicator

MPI_Recv

  • Used to receive a message from another process in the communicator

Here, an MPI communicator is a context within which communication takes place; it can be thought of as a group of processes. MPI_COMM_WORLD is the default communicator and contains all of the processes started for the program. The detailed interfaces of the six primary routines introduced above, together with their arguments and types, are:

Listing 1. MPI C interface
int MPI_Init(int * argc, char *** argv);

int MPI_Finalize(void);

int MPI_Comm_size(MPI_Comm comm, int * size);

int MPI_Comm_rank(MPI_Comm comm, int * rank);

int MPI_Send(const void * buf, int count, MPI_Datatype type, int dest, int tag, MPI_Comm comm);

int MPI_Recv(void * buf, int count, MPI_Datatype type, int source, int tag, MPI_Comm comm, MPI_Status * status);

Listing 2. MPI Fortran interface
MPI_INIT(IERROR)
    INTEGER    IERROR

MPI_FINALIZE(IERROR)
    INTEGER    IERROR

MPI_COMM_SIZE(COMM, SIZE, IERROR)
    INTEGER    COMM, SIZE, IERROR

MPI_COMM_RANK(COMM, RANK, IERROR)
    INTEGER    COMM, RANK, IERROR

MPI_SEND(BUF, COUNT, DATATYPE, DEST, TAG, COMM, IERROR)
    <type>    BUF(*)
    INTEGER    COUNT, DATATYPE, DEST, TAG, COMM, IERROR

MPI_RECV(BUF, COUNT, DATATYPE, SOURCE, TAG, COMM, STATUS, IERROR)
    <type>    BUF(*)
    INTEGER    COUNT, DATATYPE, SOURCE, TAG, COMM
    INTEGER    STATUS(MPI_STATUS_SIZE), IERROR
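The "hello world" programs in the next section use only four of these six routines. For completeness, the following is a minimal sketch (in C) of a point-to-point exchange with MPI_Send and MPI_Recv; it assumes the job is launched with at least two processes, and the payload value 42 is arbitrary.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
  int rank, number;
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  if (rank == 0) {
    number = 42;  /* arbitrary payload */
    MPI_Send(&number, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
  } else if (rank == 1) {
    MPI_Recv(&number, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
    printf("Rank 1 received %d from rank 0\n", number);
  }

  MPI_Finalize();
  return 0;
}

Note that the receiver names the expected source and tag; a matching send and receive pair is what constitutes a message.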

"Hello world" program

Listing 3 shows a simple "hello world" program written in C that uses four of the six primary MPI routines. Listing 4 shows the same program written in Fortran. Each process will execute the same code; note how the value of the rank variable (process ID in the communicator) is different for each process.

Listing 3. MPI "hello world" program in C
#include <mpi.h>
#include <stdio.h>

int main (int argc, char* argv[]){

  int rank, size;

  MPI_Init(&argc,&argv);

  MPI_Comm_rank(MPI_COMM_WORLD,&rank);
  MPI_Comm_size(MPI_COMM_WORLD,&size);

  printf("Hello from rank %2d of %d\n",rank, size);

  MPI_Finalize();
  
  return 0;

} 
Listing 4. MPI "hello world" program in Fortran
program hello

  use mpi
  implicit none

  integer :: ifail
  integer :: rank, size

  call mpi_init(ifail)

  call mpi_comm_rank(MPI_COMM_WORLD, rank, ifail)
  call mpi_comm_size(MPI_COMM_WORLD, size, ifail)

  write (unit = *, fmt = *) "Hello from rank ", rank, " of ", size

  call mpi_finalize(ifail)

end program hello

These codes can be compiled on a Cray supercomputer with the cc and ftn compiler wrappers, which link the MPI library automatically:

Terminal 1. Compile MPI code
$ cc hello.c -o helloC     # C

$ ftn hello.f90 -o helloF  # Fortran

The codes can be executed in an interactive SLURM session or within a batch job. In both cases the srun launcher is required to spawn the processes. For example, in an interactive session:

Terminal 2. Run MPI code
$ salloc -N 1 -n 4 -p debug -t 0:01:00

$ srun -n 4 ./helloF  # Fortran
 Hello from rank  0  of  4
 Hello from rank  1  of  4
 Hello from rank  2  of  4
 Hello from rank  3  of  4
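For a batch job, the same srun line goes into a script submitted with sbatch. The sketch below mirrors the interactive request above; the partition name and time limit are examples, and any project or account options required on a particular system must be added.

#!/bin/bash -l
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --partition=debug
#SBATCH --time=0:01:00

srun -n 4 ./helloF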

Implementation of the toy problem

The parallel MPI version of the toy computational problem can be implemented using only the six MPI calls introduced above. This is illustrated in listing 5 for C and in listing 6 for Fortran.

Each process executes the same code and randomly generates its own subset of points in the square, counting in the same loop how many of them fall inside the circle (the first loop of each listing). The trials are divided as evenly as possible between the processes, with the remainder going to the lowest ranks. Each process then sends its local count to a chosen process, the manager (rank 0 here). The manager adds the partial counts received from the other processes to its own and computes and prints the final estimate pi ≈ 4 × Ncirc / num_trials (the if-else block at the end of each listing). Based on the value of the rank variable, each process takes a different action (send or receive).

Listing 5. MPI toy in C
/* Compute pi using the six basic MPI functions */
#include <mpi.h>
#include <stdio.h>
 
// Random number generator -- and not a very good one, either!
static long MULTIPLIER = 1366;
static long ADDEND = 150889;
static long PMOD = 714025;
long random_last = 0;
 
// This is not a thread-safe random number generator
double lcgrandom() {
  long random_next;
  random_next = (MULTIPLIER * random_last + ADDEND)%PMOD;
  random_last = random_next;
 
  return ((double)random_next/(double)PMOD);
}

static long num_trials = 1000000;

int main(int argc, char **argv) {
  long i;
  long Ncirc = 0;
  double pi, x, y;
  double r = 1.0; // radius of circle
  double r2 = r*r;

  int rank, size, manager = 0;
  MPI_Status status;
  long my_trials, temp;
  int j;

  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  my_trials = num_trials/size;
  if (num_trials%(long)size > (long)rank) my_trials++;
  random_last = rank;

  for (i = 0; i < my_trials; i++) {
    x = lcgrandom();
    y = lcgrandom();
    if ((x*x + y*y) <= r2)
      Ncirc++;
  }

  if (rank == manager) {
    for (j = 1; j < size; j++) {
      MPI_Recv(&temp, 1, MPI_LONG, j, j, MPI_COMM_WORLD, &status);
      Ncirc += temp;
    }
    pi = 4.0 * ((double)Ncirc)/((double)num_trials);
    printf("\n \t Computing pi using six basic MPI functions: \n");
    printf("\t For %ld trials, pi = %f\n", num_trials, pi);
    printf("\n");
  } else {
    MPI_Send(&Ncirc, 1, MPI_LONG, manager, rank, MPI_COMM_WORLD);
  }
  MPI_Finalize();
  return 0;
}
Listing 6. MPI toy in Fortran
! Compute pi using six basic MPI subroutines


! Pseudorandom number generator
! (and a bad one at that)
        module lcgenerator
          integer*8, save :: random_last = 0
          contains

          subroutine seed(s)
            integer :: s
            random_last = s
          end subroutine

          real function lcgrandom()
            integer*8, parameter :: MULTIPLIER = 1366
            integer*8, parameter :: ADDEND = 150889
            integer*8, parameter :: PMOD = 714025
  
            integer*8 :: random_next = 0
            random_next = mod((MULTIPLIER * random_last + ADDEND), PMOD)
            random_last = random_next
            lcgrandom = (1.0*random_next)/PMOD
            return
          end function
        end module lcgenerator

        program darts
          use lcgenerator
          implicit none
          include 'mpif.h'
          integer :: num_trials = 1000000, i = 0, Ncirc = 0
          real :: pi = 0.0, x = 0.0, y = 0.0, r = 1.0
          real :: r2 = 0.0

          integer :: rank, np, manager = 0
          integer :: mpistatus(MPI_STATUS_SIZE), mpierr, j
          integer :: my_trials, temp

          call MPI_Init(mpierr)
          call MPI_Comm_size(MPI_COMM_WORLD, np, mpierr)
          call MPI_Comm_rank(MPI_COMM_WORLD, rank, mpierr)
          r2 = r*r
          my_trials = num_trials/np
          if (mod(num_trials, np) .gt. rank) then
           my_trials = my_trials+1
          end if
          call seed(rank)

          do i = 1, my_trials
            x = lcgrandom()
            y = lcgrandom()
            if ((x*x + y*y) .le. r2) then
              Ncirc = Ncirc+1
            end if
          end do

          if (rank .eq. manager) then
            do j = 1, np-1
              call MPI_Recv(temp, 1, MPI_INTEGER, j, j, 
     &          MPI_COMM_WORLD, mpistatus, mpierr)
              Ncirc = Ncirc + temp
            end do
            pi = 4.0*((1.0*Ncirc)/(1.0*num_trials))
            print*, '     '
            print*, '     Computing pi using six basic MPI functions:'
            print*, '     For ', num_trials, ' trials, pi = ', pi
            print*, '     '
          else
            call MPI_Send(Ncirc, 1, MPI_INTEGER, manager, rank, 
     &        MPI_COMM_WORLD, mpierr) 
          end if
          call MPI_Finalize(mpierr)
        end program darts

Related pages

  • For detailed information on how to compile MPI software on Pawsey systems, see Compiling.

External links