Message Passing Interface
The Message Passing Interface (MPI) is a library specification to support applications running on distributed memory systems.
A very short introduction to MPI
MPI is a library specification, proposed as a standard by the MPI Forum committee. MPI defines how processes running on different cores, CPUs or compute nodes can communicate and synchronise. The first version of the standard was published in 1994 and consisted of core MPI functionality related to message passing and synchronisation. The second version of the MPI standard, published in 1998, introduced new features and extensions for parallel I/O and one-sided communication. The third version of the standard, published in 2012, provided further extensions to one-sided communication and introduced non-blocking collectives.
All MPI libraries available on Pawsey systems support the MPI-3.1 standard.
MPI was developed to support applications on distributed memory systems. The main idea is that multiple processes are spawned on available resources and each process:
- Has its own memory not directly accessible to other processes
- Executes the same program, but independently of the other processes
- Works on a unique portion of the whole problem (such as a subset of the data to be processed)
- Has the ability to send and receive messages to and from other processes
- Has the ability to synchronise its work with other processes in the group
The lifetime of a parallel MPI code is schematically depicted in figure 1.
Figure 1. Lifetime of a parallel MPI code
This short tutorial will start by introducing six primary MPI routines:
MPI_Init
- Initialises the MPI environment
- Is the first MPI call in the code
- Must be called only once by each process
- Should be placed in the body of code after variable declarations and before any other MPI call
MPI_Finalize
- Terminates the MPI environment
- Is the last MPI call in the code
MPI_Comm_size
- Used to find out the number of MPI processes (ranks) in the communicator
MPI_Comm_rank
- Used to find out the rank (or ID) of the current process in the communicator
MPI_Send
- Used to send a message to another process in the communicator
MPI_Recv
- Used to receive a message from another process in the communicator
Here, the MPI communicator is a context within which the communication takes place; it can be thought of as a group of processes. MPI_COMM_WORLD is the default communicator, comprising all of the processes of the MPI program. The detailed interfaces of the six primary routines introduced above, together with their arguments and types, are listed below for C and Fortran:
int MPI_Init(int *argc, char ***argv);
int MPI_Finalize(void);
int MPI_Comm_size(MPI_Comm comm, int *size);
int MPI_Comm_rank(MPI_Comm comm, int *rank);
int MPI_Send(void *buf, int count, MPI_Datatype type, int dest, int tag, MPI_Comm comm);
int MPI_Recv(void *buf, int count, MPI_Datatype type, int source, int tag, MPI_Comm comm, MPI_Status *status);
MPI_INIT(IERROR)
    INTEGER IERROR
MPI_FINALIZE(IERROR)
    INTEGER IERROR
MPI_COMM_SIZE(COMM, SIZE, IERROR)
    INTEGER COMM, SIZE, IERROR
MPI_COMM_RANK(COMM, RANK, IERROR)
    INTEGER COMM, RANK, IERROR
MPI_SEND(BUF, COUNT, DATATYPE, DEST, TAG, COMM, IERROR)
    <type> BUF(*)
    INTEGER COUNT, DATATYPE, DEST, TAG, COMM, IERROR
MPI_RECV(BUF, COUNT, DATATYPE, SOURCE, TAG, COMM, STATUS, IERROR)
    <type> BUF(*)
    INTEGER COUNT, DATATYPE, SOURCE, TAG, COMM
    INTEGER STATUS(MPI_STATUS_SIZE), IERROR
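As a minimal sketch of how the two point-to-point routines pair up (this example is illustrative and is not one of the numbered listings), the following C program has rank 0 send a single integer to rank 1, which prints the received value. It assumes the program is launched with at least two processes.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, value;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        /* send one int to rank 1, using message tag 0 */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* receive one int from rank 0; the tag must match the send */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("Rank 1 received %d from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}

The count, datatype, tag and communicator arguments of the receive must be consistent with those of the send, while the destination and source ranks identify the communication partners; the status argument can be inspected afterwards for the actual source and tag of the received message.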
"Hello world" program
Listing 3 shows a simple "hello world" program written in C that uses four of the six primary MPI routines. Listing 4 shows the same program written in Fortran. Each process will execute the same code; note how the value of the rank variable (process ID in the communicator) is different for each process.
#include <mpi.h>
#include <stdio.h>

int main(int argc, char* argv[]) {
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    printf("Hello from rank %2d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}
program hello
    use mpi
    implicit none

    integer :: ifail
    integer :: rank, size

    call mpi_init(ifail)
    call mpi_comm_rank(MPI_COMM_WORLD, rank, ifail)
    call mpi_comm_size(MPI_COMM_WORLD, size, ifail)

    write (unit = *, fmt = *) "Hello from rank ", rank, " of ", size

    call mpi_finalize(ifail)
end program hello
These codes can be compiled on a Cray supercomputer in the following way:
$ cc hello.c -o helloC     # C
$ ftn hello.f90 -o helloF  # Fortran
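The cc and ftn commands are the Cray compiler wrappers, which link the MPI library automatically. On systems that provide MPICH or Open MPI without such wrappers, the usual wrapper compilers are mpicc and mpif90; the following invocation is a generic example rather than a Pawsey-specific one:

$ mpicc hello.c -o helloC     # C
$ mpif90 hello.f90 -o helloF  # Fortran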
The codes can be executed in an interactive SLURM session or within a batch job. The srun launcher is required to spawn the multiple processes. For example, for an interactive session:
$ salloc -N 1 -n 4 -p debug -t 0:01:00
$ srun -n 4 ./helloF    # Fortran
Hello from rank 0 of 4
Hello from rank 1 of 4
Hello from rank 2 of 4
Hello from rank 3 of 4
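For a batch job, the same srun line can be placed in a SLURM script and submitted with sbatch. The following is a minimal sketch only; the partition, time limit, file name (hello.slurm) and any account or module settings are assumptions that should be adapted to the target system.

#!/bin/bash -l
#SBATCH --nodes=1             # one node, as in the interactive example
#SBATCH --ntasks=4            # four MPI processes
#SBATCH --partition=debug     # assumed partition
#SBATCH --time=0:01:00

srun -n 4 ./helloF

$ sbatch hello.slurm

By default the program output is written to a slurm-<jobid>.out file in the submission directory.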
Implementation of the toy problem
The parallel MPI version of the toy computational problem can be implemented using the six MPI routines introduced above. This is illustrated in listing 5 for C and in listing 6 for Fortran.
Each process executes the same code and randomly generates its own subset of points in the square; within the same loop it also counts how many of those points fall inside the circle. Each worker process then sends its local count to a chosen process, the manager. The manager computes and prints the final result based on the partial results received from the other processes. Based on the value of the rank variable, each process takes a different action (send or receive).
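In both listings the work is divided by giving each process my_trials = num_trials/size trials, with the remainder spread over the lowest-numbered ranks: a rank takes one extra trial if num_trials mod size is greater than its rank. As a worked illustration (numbers chosen here only for the example), distributing 10 trials over 4 processes gives 10/4 = 2 trials per rank plus one extra trial for ranks 0 and 1 (since 10 mod 4 = 2), that is 3, 3, 2 and 2 trials, which sum to 10. Each process also seeds the random number generator with its own rank so that the processes sample different points.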
/* Compute pi using the six basic MPI functions */
#include <mpi.h>
#include <stdio.h>

// Random number generator -- and not a very good one, either!
static long MULTIPLIER = 1366;
static long ADDEND = 150889;
static long PMOD = 714025;
long random_last = 0;

// This is not a thread-safe random number generator
double lcgrandom() {
    long random_next;
    random_next = (MULTIPLIER * random_last + ADDEND)%PMOD;
    random_last = random_next;
    return ((double)random_next/(double)PMOD);
}

static long num_trials = 1000000;

int main(int argc, char **argv) {
    long i;
    long Ncirc = 0;
    double pi, x, y;
    double r = 1.0;   // radius of circle
    double r2 = r*r;
    int rank, size, manager = 0;
    MPI_Status status;
    long my_trials, temp;
    int j;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // divide the trials among the processes, spreading the remainder
    my_trials = num_trials/size;
    if (num_trials%(long)size > (long)rank) my_trials++;
    random_last = rank;

    for (i = 0; i < my_trials; i++) {
        x = lcgrandom();
        y = lcgrandom();
        if ((x*x + y*y) <= r2)
            Ncirc++;
    }

    if (rank == manager) {
        // collect the partial counts from the other ranks
        for (j = 1; j < size; j++) {
            MPI_Recv(&temp, 1, MPI_LONG, j, j, MPI_COMM_WORLD, &status);
            Ncirc += temp;
        }
        pi = 4.0 * ((double)Ncirc)/((double)num_trials);
        printf("\n \t Computing pi using six basic MPI functions: \n");
        printf("\t For %ld trials, pi = %f\n", num_trials, pi);
        printf("\n");
    } else {
        // send the local count to the manager
        MPI_Send(&Ncirc, 1, MPI_LONG, manager, rank, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
! Compute pi using six basic MPI subroutines

! Pseudorandom number generator
! (and a bad one at that)
module lcgenerator
    integer*8, save :: random_last = 0

contains

    subroutine seed(s)
        integer :: s
        random_last = s
    end subroutine

    real function lcgrandom()
        integer*8, parameter :: MULTIPLIER = 1366
        integer*8, parameter :: ADDEND = 150889
        integer*8, parameter :: PMOD = 714025
        integer*8 :: random_next = 0

        random_next = mod((MULTIPLIER * random_last + ADDEND), PMOD)
        random_last = random_next
        lcgrandom = (1.0*random_next)/PMOD
        return
    end function
end module lcgenerator

program darts
    use lcgenerator
    implicit none
    include 'mpif.h'

    integer :: num_trials = 1000000, i = 0, Ncirc = 0
    real :: pi = 0.0, x = 0.0, y = 0.0, r = 1.0
    real :: r2 = 0.0
    integer :: rank, np, manager = 0
    integer :: mpistatus(MPI_STATUS_SIZE)  ! status array for MPI_Recv
    integer :: mpierr, j
    integer :: my_trials, temp

    call MPI_Init(mpierr)
    call MPI_Comm_size(MPI_COMM_WORLD, np, mpierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, mpierr)

    r2 = r*r
    ! divide the trials among the processes, spreading the remainder
    my_trials = num_trials/np
    if (mod(num_trials, np) .gt. rank) then
        my_trials = my_trials+1
    end if
    call seed(rank)

    do i = 1, my_trials
        x = lcgrandom()
        y = lcgrandom()
        if ((x*x + y*y) .le. r2) then
            Ncirc = Ncirc+1
        end if
    end do

    if (rank .eq. manager) then
        ! collect the partial counts from the other ranks
        do j = 1, np-1
            call MPI_Recv(temp, 1, MPI_INTEGER, j, j, &
                MPI_COMM_WORLD, mpistatus, mpierr)
            Ncirc = Ncirc + temp
        end do
        pi = 4.0*((1.0*Ncirc)/(1.0*num_trials))
        print*, ' '
        print*, ' Computing pi using six basic MPI functions:'
        print*, ' For ', num_trials, ' trials, pi = ', pi
        print*, ' '
    else
        ! send the local count to the manager
        call MPI_Send(Ncirc, 1, MPI_INTEGER, manager, rank, &
            MPI_COMM_WORLD, mpierr)
    end if

    call MPI_Finalize(mpierr)
end program darts
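Assuming the C and Fortran sources are saved as pi.c and pi.f90 (file names chosen here only for illustration), they can be compiled and launched in the same way as the hello world example, for instance:

$ cc pi.c -o piC       # C
$ ftn pi.f90 -o piF    # Fortran
$ srun -n 4 ./piC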
Related pages
For detailed information on how to compile MPI software on Pawsey systems, see Compiling.
External links
- MPI: The Complete Reference, by Marc Snir, Steve W. Otto, Steven Huss-Lederman, David W. Walker and Jack Dongarra. Cambridge, MA: MIT Press (1996)
- MPI tutorial by LLNL
- For MPICH documentation, see the MPICH homepage
- For Open MPI documentation, see the Open MPI homepage
- For the MPI 3.1 standard, see MPI: A Message-Passing Interface Standard Version 3.1 (PDF)