ierr= MPI_Isend(data, icount, itype, idest, itag, icomm, irequest)
where:
| Parameter | Description | Status |
|---|---|---|
| idest (int) | send to this rank (process) | [IN] |
| itag (int) | tag value | [IN] |
| icomm (MPI_Comm) | send within this context | [IN] |
| irequest (MPI_Request *) | request identifier | [OUT] |
ierr= MPI_Irecv(data, icount, itype, isrc, itag, icomm, irequest)
where:
| Parameter | Description | Status |
|---|---|---|
| isrc (int) | receive from this rank (process) | [IN] |
| itag (int) | receive if sender's tag has this value | [IN] |
| icomm (MPI_Comm) | receive from within this context | [IN] |
| irequest (MPI_Request *) | request identifier | [OUT] |
ierr= MPI_Test(irequest, done, status)
where:
| Parameter | Description | Status |
|---|---|---|
| irequest (MPI_Request *) | request identifier | [IN] |
| done (int *) | flag (true if the operation has completed) | [OUT] |
| status (MPI_Status *) | status structure | [OUT] |
ierr= MPI_Wait(irequest, status)
where:
| Parameter | Description | Status |
|---|---|---|
| irequest (MPI_Request *) | request identifier | [IN] |
| status (MPI_Status *) | status structure | [OUT] |
The immediate mode functions are commonly called nonblocking routines since
they return without waiting for the transfer to finish, which lets you move data from one
process to another while continuing to calculate other results. Each call returns a handle
in its request variable (an integer in Fortran, an MPI_Request in C) that identifies the pending
operation in subsequent calls to the synchronization routines MPI_Wait and MPI_Test, which wait
or test for completion of the operation, respectively. MPI_Test sets the done
variable to TRUE if the operation identified by the irequest variable has completed
and to FALSE otherwise. MPI_Wait blocks until the operation completes
and then returns with status information in the status structure.
In general, the immediate mode can be applied to any of the non-immediate communication functions, such as the standard and buffered calls (as well as the synchronous and ready modes, not discussed here), by inserting the letter "I" after the MPI_ prefix of the function name. The immediate send, MPI_Isend, and the immediate receive, MPI_Irecv, are the immediate forms of the standard send and receive functions (MPI_Send and MPI_Recv), and MPI_Ibsend is the immediate mode buffered send.
The following example illustrates a basic mechanism used in the Fox matrix multiplication algorithm (AB=C). Global matrices are blocked into smaller local matrices. Each local A matrix is multiplied by the local B matrix and then moved to the process one rank higher (the highest-rank process moves its local A matrix to process 0). The matrix products are summed in the local C matrices.
Explanation: Array Ai (the local A matrix) has two planes, A( , , 0) and A( , , 1). The rank i process constructs matrix Ai in both planes and then enters the multiply-sum loop. In the first iteration, plane 0 receives the new matrix from one neighbor while plane 1 is sent to the other neighbor. During the transfer, the matrix in plane 1 is also used in the matrix multiply. There may be memory contention (both the send and the multiply read plane 1), but the data are safe because neither operation modifies them. Each subsequent iteration exchanges the plane index values, so that the data received in the previous iteration are used in the matrix multiply and the send while the other plane receives data. The BLAS (Basic Linear Algebra Subprograms) matrix multiply routine xgemm is used for efficiency. Waits are posted at the end of each iteration for synchronization.
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#define N 10
int main(int argc, char **argv){
/* MPI: Declare status structure.*/
MPI_Status status;
MPI_Comm IWCOMM = MPI_COMM_WORLD;
MPI_Request reqs, reqr;
int npes, mype, ierr;
int i,j,n,nn, itag0, itag1, ihi, ilo, ibytes;
int isrc, idest, m0, m1;
double one;
double a[2][N][N], b[N][N], c[N][N];
n=N;
nn=n*n;
/* MPI: Get size and rank.*/
ierr = MPI_Init(&argc, &argv);
ierr = MPI_Comm_size(IWCOMM, &npes);
ierr = MPI_Comm_rank(IWCOMM, &mype);
/* Immediate Send/Receive Example.
C = Sum over i=0..npes-1 of (Ai x B)
Each processor sums matrix multiplies, Ai x B.
Matrix Ai is formed on processor i. After first multiply,
each Ai is moved to higher process id (idest), until each
processor has multiplied all Ai's by B of the sum; processor
npes-1 moves data to processor 0 (0->1,1->2,...,n
*/
one =(double)1;
/*
Generate A[i] and B=1, initialize C=0
*/
for (j=0;j < N;j++){
for (i=0;i < N;i++){
b[j][i]=(double)1;
c[j][i]=(double)0;
a[1][j][i]=(double)(i+j);
}
}
/* Start sending plane m1 of a and use it in the calculation.
Receive the neighboring a into the unused plane m0.
*/
idest=(mype+1 )%npes;
isrc =(mype-1+npes)%npes;
printf("npes,mype,idest,isrc %d %d %d %d \n", npes,mype,idest,isrc);
for (i=0; i < npes; i++){
m0=(i )%2;
m1=(i+1)%2;
ierr = MPI_Irecv(&a[m0][0][0],nn,MPI_DOUBLE, isrc ,i,IWCOMM, &reqr);
ierr = MPI_Isend(&a[m1][0][0],nn,MPI_DOUBLE, idest,i,IWCOMM, &reqs);
/* ierr = dgemm('n','n', &n,&n,&n, &one,&a[m1][0][0],&n,
&b, &n, &one, &c, &n); Use BLAS3 Matrix Mult. here
(dgemm, since the matrices are double precision).*/
/* Wait for communications to complete.*/
ierr = MPI_Wait(&reqr, &status);
ierr = MPI_Wait(&reqs, &status);
}
ierr = MPI_Finalize();
return 0;
}
OUTPUT (4 PEs):
npes,mype,idest,isrc 4 0 1 3
npes,mype,idest,isrc 4 1 2 0
npes,mype,idest,isrc 4 2 3 1
npes,mype,idest,isrc 4 3 0 2
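The plane-swapping scheme can be checked without MPI. The sketch below simulates all ranks inside one process, with a single integer standing in for each local matrix Ai. In every iteration each simulated rank "sends" plane m1 to its higher neighbor, which "receives" it into plane m0, and adds the value in plane m1 to a running sum in place of the matrix multiply. The names simulate_ring and MAX_PES are assumptions made for this illustration.

```c
#define MAX_PES 64

/* Hypothetical single-process simulation of the ring exchange.
   Simulated rank r starts with the value r in plane 1. On return, sums[r]
   holds the sum of every value rank r used in its "multiply" step.
   Returns 0 on success, -1 if npes is out of range. */
int simulate_ring(int npes, int sums[])
{
    int buf[MAX_PES][2];
    int i, r, m0, m1, idest;

    if (npes < 1 || npes > MAX_PES)
        return -1;

    for (r = 0; r < npes; r++) {
        buf[r][0] = 0;
        buf[r][1] = r;   /* own data starts in plane 1 */
        sums[r] = 0;
    }

    for (i = 0; i < npes; i++) {
        m0 = i % 2;          /* plane that receives in this iteration   */
        m1 = (i + 1) % 2;    /* plane that is sent and used in the sum  */
        for (r = 0; r < npes; r++) {
            idest = (r + 1) % npes;
            buf[idest][m0] = buf[r][m1];   /* "send" m1, "receive" into m0 */
            sums[r] += buf[r][m1];         /* stands in for C += Ai x B    */
        }
    }
    return 0;
}
```

Because all reads in an iteration touch plane m1 and all writes touch plane m0, no value is overwritten before it is used. After npes iterations every simulated rank has seen each starting value exactly once, so for 4 ranks every entry of sums is 0+1+2+3 = 6.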



