TACC

Immediate Send and Receive

The immediate (nonblocking) send and receive functions, MPI_Isend and MPI_Irecv, together with their auxiliary support functions, provide a mechanism for overlapping communication with computation. The overlap actually occurs only if the underlying hardware and software can perform the inter-processor memory transfer separately and concurrently while the processors compute.

A call to an immediate send or receive returns control to the calling program right away: possibly before the sender's data has been copied to a send buffer or to the receiver's data area (MPI_Isend), and before the receiver's data area has finished filling (MPI_Irecv). It is your responsibility to confirm that a send or receive operation has completed before reusing or referencing the data in question, using the ancillary functions MPI_Test (to test) and MPI_Wait (to wait). The calling syntax for the immediate functions is:

ierr= MPI_Isend(data, icount, itype, idest, itag, icomm, irequest)

where:

Parameter   Type             Description                    Status
idest       int              send to this rank (process)    [IN]
itag        int              tag value                      [IN]
icomm       MPI_Comm         send within this context       [IN]
irequest    MPI_Request *    request identifier             [OUT]


ierr= MPI_Irecv(data, icount, itype, isrc, itag, icomm, irequest)

where:

Parameter   Type             Description                               Status
isrc        int              receive from this rank (process)          [IN]
itag        int              receive if sender's tag has this value    [IN]
icomm       MPI_Comm         receive from within this context          [IN]
irequest    MPI_Request *    request identifier                        [OUT]

ierr= MPI_Test(irequest, done, status)

where:

Parameter   Type             Description                                  Status
irequest    MPI_Request *    request identifier                           [INOUT]
done        int *            flag (true if transcription completed)       [OUT]
status      MPI_Status *     status structure                             [OUT]

ierr= MPI_Wait(irequest, status)

where:

Parameter   Type             Description           Status
irequest    MPI_Request *    request identifier    [INOUT]
status      MPI_Status *     status structure      [OUT]

The immediate mode functions are sometimes referred to as transcription routines, since they allow you to transcribe data from one process to another while continuing to calculate other results. Each returns a request handle in its request argument (an MPI_Request in C, an integer in Fortran) that identifies the pending transcription (operation) in subsequent calls to the synchronization routines MPI_Wait and MPI_Test, which wait or test for completion of the operation, respectively. MPI_Test sets the done variable to true (nonzero in C) if the transcription identified by the irequest variable has completed, and to false (zero) otherwise. MPI_Wait does not return until the operation has completed, and then returns status information in the status structure.

In general, an immediate form exists for each of the blocking send modes: standard, buffered, synchronous, and ready (the last two are not discussed here). Its name is formed by inserting the letter "I" after the "MPI_" prefix. The immediate send, MPI_Isend, and the immediate receive, MPI_Irecv, are the immediate forms of the standard send and receive functions (MPI_Send and MPI_Recv), and MPI_Ibsend is the immediate mode buffered send.

The following example illustrates a basic mechanism used in the Fox matrix multiplication algorithm (AB=C). Global matrices are blocked into smaller local matrices. Each local A matrix is multiplied into the local B matrix and then moved to the process whose rank is one greater (the highest-rank process moves its local A matrix to process 0). The matrix products are summed in the local C matrices.

Explanation: Array Ai (local A matrix) has two planes, a[0][ ][ ] and a[1][ ][ ]. The rank i process constructs matrix Ai in plane 1 and then enters the multiply-sum loop. In the first loop iteration, plane 0 accepts the new matrix from one neighbor while plane 1 is sent to the other neighbor. During the transcription, the matrix in plane 1 is used in the matrix multiply. There may be memory contention (access by both the send and the multiply functions), but the data are safe because both only read plane 1. Each subsequent iteration exchanges the plane index values, so that the data received in the previous iteration is used in the matrix multiply and send functions while the other plane is receiving data. The BLAS (Basic Linear Algebra Subprograms) matrix multiply routine xgemm is used for efficiency. Waits are issued at the end of each loop iteration for synchronization.
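The bookkeeping described above (which plane receives, which plane is sent, and who the ring neighbors are) can be sketched without any MPI calls. The type and function names below are hypothetical, and matrix_owner assumes that matrices travel toward the higher rank, as the text describes.

```c
#include <assert.h>

/* Hypothetical helper type: plane indices used in one loop iteration. */
typedef struct {
    int recv_plane;   /* plane being filled by the incoming matrix */
    int send_plane;   /* plane being sent and used in the multiply */
} planes_t;

/* Planes alternate with the parity of the iteration number. */
planes_t planes_for_step(int step)
{
    assert(step >= 0);
    planes_t p = { step % 2, (step + 1) % 2 };
    return p;
}

/* Ring neighbors: matrices move to the next higher rank, wrapping to 0. */
int ring_dest(int rank, int npes) { return (rank + 1) % npes; }
int ring_src(int rank, int npes)  { return (rank - 1 + npes) % npes; }

/* Rank whose original matrix a given rank multiplies at a given step,
 * assuming each matrix has moved up by one rank per completed step. */
int matrix_owner(int rank, int step, int npes)
{
    return ((rank - step) % npes + npes) % npes;
}
```

For example, with 4 processes, rank 0 works on its own matrix at step 0, on rank 3's matrix at step 1, and so on, so that after npes steps every rank has used every Ai once.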

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#define N 10

int main(int argc, char **argv){

/*        MPI: Declare status structure, communicator, and request handles.*/
   MPI_Status status;
   MPI_Comm IWCOMM = MPI_COMM_WORLD;
   MPI_Request reqs, reqr;
   int npes, mype, ierr;
   int i,j,n,nn, itag0, itag1, ihi, ilo, ibytes;
   int isrc, idest, m0, m1;
   double one;
   double a[2][N][N], b[N][N], c[N][N];

   n=N;
   nn=n*n;

/*        MPI: Get size and rank.*/
   ierr = MPI_Init(&argc, &argv);
   ierr = MPI_Comm_size(IWCOMM, &npes);
   ierr = MPI_Comm_rank(IWCOMM, &mype);

/*         Immediate Send/Receive Example.

           C = Sum(i = 0 .. npes-1) Ai x B

           Each processor sums matrix multiplies, Ai x B.
           Matrix Ai is formed on processor i.  After first multiply,
           each Ai is moved to higher process id (idest), until each
           processor has multiplied all Ai's by B of the sum; processor
           npes-1 moves data to processor 0 (0->1,1->2,...,n
*/
   one   =(double)1;
/*
           Generate A[i] and B=1, initialize C=0
*/
      for (j=0; j < N; j++){
        for (i=0; i < N; i++){
             b[j][i]=(double)1;
             c[j][i]=(double)0;
          a[1][j][i]=(double)(i+j);
        }
      }
 
/*         Start sending a[m1] and use it in the calculation.
           Receive the neighboring a in the unused plane a[m0].
*/
      idest=(mype+1     )%npes;
      isrc =(mype-1+npes)%npes;
      printf("npes,mype,idest,isrc %d %d %d %d \n", npes,mype,idest,isrc);
      for (i=0; i < npes; i++){
        m0=(i  )%2;
        m1=(i+1)%2;
        ierr = MPI_Irecv(&a[m0][0][0],nn,MPI_DOUBLE, isrc ,i,IWCOMM, &reqr);
        ierr = MPI_Isend(&a[m1][0][0],nn,MPI_DOUBLE, idest,i,IWCOMM, &reqs);
/*        ierr = dgemm('n','n', &n,&n,&n, &one,&a[m1][0][0],&n, 
                       &b, &n, &one, &c, &n); Use BLAS3 Matrix Mult. here
                       (dgemm, since the arrays are double).*/
 
/*         Wait for communications to complete.*/
 
        ierr = MPI_Wait(&reqr, &status);
        ierr = MPI_Wait(&reqs, &status);
      }

   ierr = MPI_Finalize();
}

OUTPUT (4 PEs):
npes,mype,idest,isrc 4 0 1 3
npes,mype,idest,isrc 4 1 2 0
npes,mype,idest,isrc 4 2 3 1
npes,mype,idest,isrc 4 3 0 2
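The program above leaves the BLAS call commented out, so C is never actually computed. For checking results on small matrices, a plain triple loop can stand in for the accumulate step C = C + A x B. This is a sketch, not a replacement for an optimized xgemm; the function names matmul_accumulate and demo_entry are hypothetical, and the test data mirror the initialization used in the example (a[j][i] = i + j, b = 1, c = 0).

```c
#include <assert.h>

#define N 10

/* Hypothetical stand-in for the commented-out BLAS call: c = c + a * b,
 * written as a plain triple loop over row-major C arrays. */
void matmul_accumulate(double a[N][N], double b[N][N], double c[N][N])
{
    for (int j = 0; j < N; j++) {
        for (int i = 0; i < N; i++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += a[j][k] * b[k][i];
            c[j][i] += sum;
        }
    }
}

/* Build the same data as the example, accumulate `passes` products,
 * and return one entry of c so the result can be checked. */
double demo_entry(int row, int col, int passes)
{
    double a[N][N], b[N][N], c[N][N];

    assert(row >= 0 && row < N && col >= 0 && col < N);
    for (int j = 0; j < N; j++) {
        for (int i = 0; i < N; i++) {
            a[j][i] = (double)(i + j);
            b[j][i] = 1.0;
            c[j][i] = 0.0;
        }
    }
    for (int p = 0; p < passes; p++)
        matmul_accumulate(a, b, c);
    return c[row][col];
}
```

With these data, every entry in row j of a single product equals 10*j + 45, which gives a quick sanity check when the loop is dropped into the example in place of the BLAS call.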