Derived Data Types

Only the basic derived data types and some simple but quite useful applications are discussed here. Full details can be found in Chapter 3, Section 12 of "MPI: A Message-Passing Interface Standard".

Transferring an entire array of integers or reals is relatively straightforward in MPI. When multiple non-contiguous sections of an array, or a mixture of integer and real variables, must be passed from one process to another, it is often more efficient to pack all of the data into one buffer and transfer it as a single message than to send a separate message for each section. Many programs carry their own packing utilities for portability. MPI also provides its own packing and unpacking functions, MPI_Pack and MPI_Unpack. These routines let you pack a buffer with any type of data, send it as an undifferentiated byte stream (data type MPI_PACKED), receive that byte stream, and unpack it.
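To make the packing idea concrete, here is a plain-C sketch (no MPI calls) of what a pack/unpack pair does: values of different types are copied one after another into a byte buffer while a position counter advances, and the receiver copies them back out in the same order. The helper names pack_bytes and unpack_bytes are hypothetical. MPI_Pack and MPI_Unpack follow the same position-advancing pattern, but they also take a data type, a count, and a communicator.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy nbytes from src into buf at offset *pos,
   then advance *pos. Returns 0 on success, -1 if buf would overflow. */
int pack_bytes(const void *src, size_t nbytes, unsigned char *buf,
               size_t bufsize, size_t *pos)
{
    if (*pos + nbytes > bufsize)
        return -1;
    memcpy(buf + *pos, src, nbytes);
    *pos += nbytes;
    return 0;
}

/* Hypothetical helper: copy nbytes out of buf at offset *pos into dst,
   then advance *pos. Returns 0 on success, -1 if buf would be overrun. */
int unpack_bytes(const unsigned char *buf, size_t bufsize, size_t *pos,
                 void *dst, size_t nbytes)
{
    if (*pos + nbytes > bufsize)
        return -1;
    memcpy(dst, buf + *pos, nbytes);
    *pos += nbytes;
    return 0;
}
```

A typical use packs a count first (for example an int n) and then n doubles, so the receiver can unpack the count before it knows how much real data follows.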


Because MPI attempts to buffer messages, it makes sense to let MPI pack your data as it copies the data into a communications buffer or data area; this avoids explicit calls to packing routines and reduces memory requirements. The price for this convenience is that you must describe the arrangement of the data to MPI as a derived data type through a function call. This kind of data typing is quite different from the data types of languages such as Fortran 90 (F90), C, and C++: an MPI data type describes the layout of data in memory, whereas a language data type (an F90 derived type, a C struct, or a C++ class) groups disparate intrinsic data elements into a single object that can be referenced and manipulated with operators.

Whether the data of a derived data type is actually packed into a system buffer or is read directly from its original location and sent to the destination process is implementation dependent. For instance, the MPI standard allows an implementation's low-level routines to skip packing into a buffer if a matching receive has already been posted; the data can then be sent directly from its storage locations and reconstructed on the receiving process.

As an example of an MPI derived data type, consider sending columns of an NxN matrix to another process. Instead of copying each column element into a vector and sending the vector, you can define a column data type (coltype) with an MPI type-constructor function and then use it as the data type argument of a send. The only differences in the MPI_Send call are that the count is given in units of the derived data type, and the data type argument is the derived data type variable rather than an intrinsic data type constant. For example,

ierr = MPI_Send(&a[0][i], 1, coltype, idest, itag, icomm);

will send the column of elements a[0][i], a[1][i], ..., a[N-1][i].
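The definition of coltype is not shown above. For a C array a[N][N], which is stored row by row, consecutive elements of one column are N elements apart, so one natural definition (an assumption, since the text does not give it) is MPI_Type_vector(N, 1, N, MPI_DOUBLE, &coltype): N blocks of one element each, with a stride of N. The plain-C sketch below, using a hypothetical helper gather_column, copies out exactly the elements such a type would describe, which is roughly the packing work an implementation may do on the sender's behalf.

```c
#include <stddef.h>

/* Hypothetical helper: copy column col of an n x n row-major matrix into out.
   Element a[k][col] lives at offset k*n + col, so successive column elements
   are n doubles apart: the layout MPI_Type_vector(n, 1, n, MPI_DOUBLE, ...)
   describes when the send buffer starts at &a[0][col]. */
void gather_column(const double *a, size_t n, size_t col, double *out)
{
    const double *start = a + col;   /* corresponds to &a[0][col] */
    for (size_t k = 0; k < n; k++)
        out[k] = start[k * n];       /* stride of n elements */
}
```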

Intrinsic Data Types

The predefined (intrinsic) data types of MPI are defined in the mpi.h header file for C (mpif.h for Fortran). Derived data types are built on these intrinsic types, which include:

Elementary MPI Data Type    Corresponding C Data Type
MPI_CHAR                    char
MPI_SHORT                   short
MPI_INT                     int
MPI_LONG                    long
MPI_FLOAT                   float
MPI_DOUBLE                  double
MPI_LONG_DOUBLE             long double
MPI_UNSIGNED_CHAR           unsigned char
MPI_UNSIGNED_SHORT          unsigned short
MPI_UNSIGNED                unsigned int
MPI_UNSIGNED_LONG           unsigned long
MPI_BYTE                    (no C equivalent) 8-bit byte (octet), transferred without conversion
MPI_PACKED                  (no C equivalent) data packed with MPI_Pack; counts are given in bytes

The sizes of the C types are the defaults of the compiler in use.

New data types are defined by type-constructor functions named for the arrangement they describe (MPI_Type_contiguous, MPI_Type_vector, MPI_Type_indexed, and so on). Each describes the locations of intrinsic data type elements or of previously defined derived data type elements. The locations (arrangements) can be contiguous, equally spaced blocks (vector), or indexed blocks. The new data type is returned in an MPI_Datatype variable whose address, newtype, is passed in the constructor's argument list (e.g., MPI_Type_vector(..., &newtype)).

Before a new data type can be used in a message-passing call, it must be committed with the MPI_Type_commit function. A reference (pointer) to the new MPI_Datatype, newtype, is passed in the committing call:

ierr = MPI_Type_commit(newtype);

where:

Parameter   Type             Description                                  Status
newtype     MPI_Datatype *   pointer to the data type to be committed     [INOUT]

Contiguous Data Types

The simplest data type function, MPI_Type_contiguous, defines a contiguous sequence of data as a single data type element with the following syntax:

ierr = MPI_Type_contiguous(icount, oldtype, newtype);

where:

Parameter   Type             Description                                                      Status
icount      int              number of elements of the oldtype data type                      [IN]
oldtype     MPI_Datatype     an intrinsic or previously defined data type (passed by value)   [IN]
newtype     MPI_Datatype *   the new data type definition (passed as a pointer)               [OUT]

The following code defines, commits, and uses a contiguous type to send M rows of a double-precision NxN matrix a, starting at row irow, to another process:

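	MPI_Datatype contigtype;     /* one row: N contiguous doubles */
	double a[N][N];
	...
	ierr = MPI_Type_contiguous(N, MPI_DOUBLE, &contigtype);
	ierr = MPI_Type_commit(&contigtype);
	/* send M consecutive rows, beginning with row irow */
	ierr = MPI_Send(&a[irow][0], M, contigtype, idest, itag, icomm);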
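In a C row-major array, a count of M elements of a contiguous type of N doubles is simply M*N consecutive doubles, which is why the send buffer must start at &a[irow][0]. The plain-C sketch below (hypothetical helper copy_rows, no MPI calls) gathers the same data with a single memcpy to show the equivalence.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy m whole rows, starting at row irow, of an n x n
   row-major matrix into out. This is the same data described by a count of
   m elements of MPI_Type_contiguous(n, MPI_DOUBLE, ...) starting at
   &a[irow][0]: rows irow .. irow+m-1 lie back to back in memory. */
void copy_rows(const double *a, size_t n, size_t irow, size_t m, double *out)
{
    memcpy(out, a + irow * n, m * n * sizeof(double));
}
```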

Vector Data Types

The vector type function, MPI_Type_vector, is very convenient and simple to use. It describes equally spaced blocks of contiguous data. The distance from the first element of one block to the first element of the next block is the stride, measured in units of the oldtype data type. The syntax is:

ierr = MPI_Type_vector(iblks, iblklen, istride, oldtype, newtype);

where:

Parameter   Type             Description                                                           Status
iblks       int              number of blocks                                                      [IN]
iblklen     int              number of oldtype elements in each block                              [IN]
istride     int              number of oldtype elements between the starts of consecutive blocks   [IN]
oldtype     MPI_Datatype     data type of the elements in each block and of the strided-over       [IN]
                             elements; an intrinsic or previously defined type (passed by value)
newtype     MPI_Datatype *   the new data type definition (passed as a pointer)                    [OUT]

The following diagram depicts a vector data type of three 3-element blocks (e.g., a 3x3 matrix within a 4x4 matrix).

[Figure: MPI Data Types]

The following code section shows how to use a vector type to send a 3x3 submatrix, embedded in a larger NxN matrix, to another process:

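	MPI_Datatype vectype;
	double a[N][N];
	...
	/* 3 blocks (rows) of 3 doubles; block starts are N doubles apart */
	ierr = MPI_Type_vector(3, 3, N, MPI_DOUBLE, &vectype);
	ierr = MPI_Type_commit(&vectype);
	/* send the 3x3 submatrix whose upper-left corner is a[j][i] */
	ierr = MPI_Send(&a[j][i], 1, vectype, idest, itag, icomm);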
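The selection a vector type makes can be written out as two nested loops: one over blocks, one over the elements within a block. The plain-C sketch below (hypothetical helper gather_vector, no MPI calls) reproduces the diagram's case, a 3x3 submatrix inside a 4x4 matrix: 3 blocks of 3 doubles with a stride of 4.

```c
#include <stddef.h>

/* Hypothetical helper modeling the element selection of
   MPI_Type_vector(nblks, blklen, stride, MPI_DOUBLE, ...): nblks blocks of
   blklen consecutive doubles, with block starts stride doubles apart.
   Copies the selected elements, in order, into out and returns how many. */
size_t gather_vector(const double *base, size_t nblks, size_t blklen,
                     size_t stride, double *out)
{
    size_t k = 0;
    for (size_t b = 0; b < nblks; b++)          /* each block */
        for (size_t j = 0; j < blklen; j++)     /* each element in the block */
            out[k++] = base[b * stride + j];
    return k;
}
```

Called as gather_vector(&a[1][1], 3, 3, 4, out) on a 4x4 array, it collects a[1][1..3], a[2][1..3], and a[3][1..3], which is the same data a single send of vectype from &a[1][1] would transfer when N is 4.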

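The MPI_Type_hvector function is identical to MPI_Type_vector except that the stride is given in bytes rather than in units of oldtype. (MPI-2 renamed this routine MPI_Type_create_hvector; the old name was deprecated and was removed in MPI-3.)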
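A byte stride is useful when the distance between blocks is not a whole number of oldtype elements, for example when picking one double field out of an array of structures: successive fields are sizeof(struct) bytes apart, and that need not be a multiple of sizeof(double). The plain-C sketch below (hypothetical helper gather_hvector and hypothetical struct particle, no MPI calls) models that selection. In an MPI program the corresponding type would presumably be built with MPI_Type_create_hvector(3, 1, sizeof(struct particle), MPI_DOUBLE, &newtype).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper modeling an hvector type whose oldtype is double:
   nblks blocks of blklen doubles, with block starts stride_bytes bytes
   apart (a byte stride, unlike MPI_Type_vector). Returns the count copied. */
size_t gather_hvector(const void *base, size_t nblks, size_t blklen,
                      size_t stride_bytes, double *out)
{
    const unsigned char *p = base;
    size_t k = 0;
    for (size_t b = 0; b < nblks; b++) {
        const unsigned char *blk = p + b * stride_bytes;   /* byte offset */
        for (size_t j = 0; j < blklen; j++, k++)
            memcpy(&out[k], blk + j * sizeof(double), sizeof(double));
    }
    return k;
}
```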

Indexed Data Types

A more general arrangement of data can be defined by using the MPI_Type_indexed function. It uses two index arrays to define the block lengths and the block locations (displacements, in units of oldtype). You can think of MPI_Type_indexed as a generalized MPI_Type_vector function that permits variable block sizes and replaces the constant stride with an array of block locations. The syntax is:

ierr = MPI_Type_indexed(icount, iblklen, iblkloc, oldtype, newtype);

where:

Parameter   Type             Description                                                           Status
icount      int              number of blocks                                                      [IN]
iblklen     int *            length of each block, in oldtype elements (int array of icount)       [IN]
iblkloc     int *            location (displacement) of each block, in oldtype elements, from the   [IN]
                             start of the send buffer (int array of icount)
oldtype     MPI_Datatype     data type of the elements; an intrinsic or previously defined type    [IN]
newtype     MPI_Datatype *   the new data type definition (passed as a pointer)                    [OUT]
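A classic use of an indexed type is the upper triangle of a square matrix: for row r of an n x n row-major matrix, the block starts at location r*n + r and has length n - r. The plain-C sketch below (hypothetical helper gather_indexed, no MPI calls) models the selection MPI_Type_indexed(icount, iblklen, iblkloc, MPI_DOUBLE, ...) describes, and the usage note applies it to a 3x3 upper triangle.

```c
#include <stddef.h>

/* Hypothetical helper modeling MPI_Type_indexed(nblks, blklen, blkloc,
   MPI_DOUBLE, ...): block b holds blklen[b] consecutive doubles starting
   blkloc[b] elements from base. Copies the selected elements into out
   and returns how many were copied. */
size_t gather_indexed(const double *base, size_t nblks, const int *blklen,
                      const int *blkloc, double *out)
{
    size_t k = 0;
    for (size_t b = 0; b < nblks; b++)
        for (int j = 0; j < blklen[b]; j++)
            out[k++] = base[blkloc[b] + j];
    return k;
}
```

For a 3x3 matrix the upper triangle uses block lengths {3, 2, 1} at locations {0, 4, 8}, so the blocks are a[0][0..2], a[1][1..2], and a[2][2].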