
TACC

Pingpong Timer

Knowing the bandwidth and latency of the network that interconnects the processors of a distributed-memory platform is just as important as knowing the floating-point capabilities of the processors. Until you know the communication cost of your program, you cannot predict how the program will scale.

The purpose of this section is to provide a simple code that measures the performance of a network. The code not only measures point-to-point communication speeds for various payloads, but also shows how to use the MPI_Wtime function to determine communication costs. (Profiling utilities can also report communication information; check your system's compiler documentation for profiling capabilities.)

Below is a Fortran 90 program. It runs a ping-pong test several times for each of a series of payloads (array sizes). The ping-pong test sends a message from one MPI process (rank 0) to another (rank 1), which immediately sends it back. The round-trip time is measured, and the bandwidth is calculated from it. Run the program with two MPI processes.

The code can be compiled and run on 32- and 64-bit architectures. It relies on the non-standard real*8 declarations and the MPI_REAL8 datatype (which MPI lists only as optional); together they fix each payload word at 8 bytes on every platform, and that is what makes the code portable. All the MPI libraries and Fortran 90 compilers I have tried (Cray, IBM and PGI compilers and MPI libraries) accept these practices.

The code first fills an array, nsize, with the payload sizes used to measure the bandwidth. (Doing this outside the ping-pong loop keeps the code cleaner.) The payloads are Nbase*2**i bytes, where i ranges from 0 to Npower and Nbase is the smallest payload; nsize stores each payload as a count of 8-byte words, (Nbase*2**i)/8. Nbase, Npower and Ntimes are set in a parameter statement, while the nsize array and the payload arrays A (send) and B (receive) are allocated at run time. With the default settings (Nbase=1024, Npower=15) the largest payload is 32 MB, so A and B together use 64 MB of memory (here 1 MB = 1024*1024 bytes). Each measurement is repeated Ntimes (the code requires 3 to 99 repetitions), and the lowest, highest and average times (sec) and rates (MB/sec) over those Ntimes runs are reported. You can change any of these parameters for your own needs without modifying any other part of the code.
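The payload table and memory estimate can be checked on their own, before anything is run under MPI. The module below is a minimal sketch that reproduces the nsize loop outside the pingpong program; the module name payload_sizes and the functions payload_words and payload_memory_bytes are hypothetical helpers, and 8-byte (real*8) payload words are assumed, as in the main code.

```fortran
module payload_sizes
  ! Hypothetical helper module (not part of the original pingpong code):
  ! mirrors the nsize(i) = (Nbase*2**i)/8 loop so the table can be inspected.
  implicit none
  private
  public :: payload_words, payload_memory_bytes

  integer, parameter :: word_bytes = 8                 ! real*8 words (assumed)
  integer, parameter :: i8 = selected_int_kind(18)     ! 64-bit integer kind

contains

  ! Number of 8-byte words in each payload, for i = 0..npower.
  ! Like the original, this uses default integers for Nbase*2**i.
  function payload_words(nbase, npower) result(nw)
    integer, intent(in) :: nbase, npower
    integer :: nw(0:npower)
    integer :: i
    do i = 0, npower
       nw(i) = (nbase * 2**i) / word_bytes
    end do
  end function payload_words

  ! Bytes used by the send (A) and receive (B) arrays together, both
  ! allocated for the largest payload. 64-bit arithmetic avoids overflow.
  function payload_memory_bytes(nbase, npower) result(nbytes)
    integer, intent(in) :: nbase, npower
    integer(i8) :: nbytes
    nbytes = 2_i8 * word_bytes * ((int(nbase, i8) * 2_i8**npower) / word_bytes)
  end function payload_memory_bytes

end module payload_sizes
```

With the defaults, payload_words(1024, 15) runs from 128 words (1024 bytes) up to 4194304 words (32 MB), and payload_memory_bytes(1024, 15) is 67108864 bytes, the 64 MB quoted above. In the pingpong program itself Nbase*2**i is evaluated in default (typically 4-byte) integers, so it overflows once the product reaches 2**31, i.e. for i > 20 when Nbase = 1024.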

   Message-Passing Interface

The MPI_Wtime function returns the elapsed wall-clock time, in seconds, measured from some arbitrary point in the past; only the difference between two calls is meaningful. The value is a 64-bit floating-point number (DOUBLE PRECISION on 32-bit platforms such as the IBM SP, and single precision on 64-bit machines such as Crays, where the default REAL is already 64 bits). The MPI_Wtick function returns the resolution of the MPI_Wtime clock, also as a 64-bit value. Neither function takes an argument.

     DPseconds  = MPI_Wtime()
     DPprecision= MPI_Wtick()

where:

Parameter    Description                                               Status
DPseconds    Wall-clock time on processor (seconds), 64-bit precision  [OUT]
DPprecision  Resolution of the MPI_Wtime clock, 64-bit precision       [OUT]
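Because the pingpong code subtracts the cost of a single MPI_Wtime call (the variable tar) and expects microsecond resolution, it can be worth checking both values first. The following is a minimal sketch, assuming an MPI library that provides the Fortran `mpi` module (MPI-2 and later); the module name wtime_probe and the functions wtime_overhead and wtime_is_fine are hypothetical. Unlike the pingpong code, it averages the call cost over many back-to-back calls instead of using one pair.

```fortran
module wtime_probe
  ! Hypothetical helpers for checking MPI_Wtime before timing anything.
  ! Assumes the caller has already called MPI_Init.
  use mpi
  implicit none
  private
  public :: wtime_overhead, wtime_is_fine

contains

  ! Approximate cost, in seconds, of one MPI_Wtime call, averaged over
  ! nsamples back-to-back calls (steadier than a single pair of calls).
  function wtime_overhead(nsamples) result(overhead)
    integer, intent(in) :: nsamples
    double precision :: overhead, t0, t1
    integer :: i
    t0 = MPI_Wtime()
    t1 = t0
    do i = 1, nsamples
       t1 = MPI_Wtime()
    end do
    overhead = (t1 - t0) / dble(max(nsamples, 1))
  end function wtime_overhead

  ! True when MPI_Wtime ticks at least as finely as max_tick seconds.
  logical function wtime_is_fine(max_tick)
    double precision, intent(in) :: max_tick
    wtime_is_fine = (MPI_Wtick() <= max_tick)
  end function wtime_is_fine

end module wtime_probe
```

After MPI_Init, a test such as `if (.not. wtime_is_fine(1.0d-6)) print *, 'MPI_Wtime is coarser than 1 microsecond'` flags the problem that the comment in the pingpong code warns about.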

Pingpong code:

program pingpong

     ! Author    Kent Milfeld   TACC  8/19/2001
     ! All rights reserved, Univ. of Texas.

     !
     !  Runs ping-pong test Ntimes at each array size (bytes).
     !  Array sizes are Nbase*2**i for i = {0 ...Npower}
     !  E.G.  1024*2**0, 1024*2**1, 1024*2**2, .. 1024*2**15
     !        1024       2048       4096          32M
     !
     !  Change Parameter statement to run a different suite of tests.
     !
     !  Npower = number of tests, limited only by memory.
     !                           2 DP arrays of Nbase*2**Npower/8 required.
     !  Ntimes = number of runs/tests (max=99; need more?, change write format)
     !
     !  Nbase  = Initial (base) number of bytes.  Suggestion-- do not lower.
     !
     !  Make sure MPI_Wtime has microsecond resolution. (Watch the "TIME".)

   implicit none
   integer, parameter :: Ntimes=3, Npower=15, Nbase=1024
   include "mpif.h"
   integer :: istatus(MPI_STATUS_SIZE)
   integer :: ierr,npes,ipe,ICOMM

   real*8  :: sec, secmin, secmax, secavg,tar,      &
  &               ratemin,ratemax,rateavg,fmegabytes

   integer :: Nbytes,Nw
   real*8  :: tc0,tc1


   real*8, allocatable,dimension(:) :: A, B
   integer,allocatable,dimension(:) :: nsize

   integer :: n,i,k

 ! ****************************************************************
   ICOMM     = MPI_COMM_WORLD   ! Use a shorter name-- this is Fortran.

   allocate( nsize(0:Npower) )

   do i = 0,Npower              ! Make array of word sizes for send/recv arrays.
     nsize(i) = (Nbase*2**i)/8
   end do

   N = nsize(Npower)
   if( (Nbase .ne. nsize(0)*8) .or. (N .eq. 0) ) stop
   if( (Ntimes .lt. 3) .or. (Ntimes .gt. 99)   ) stop
   allocate( A(N), B(N) )

   call MPI_Init(ierr)
   call MPI_Comm_size(ICOMM,npes,ierr)
   call MPI_Comm_rank(ICOMM,ipe, ierr)

   a=real(ipe)
   b=real(ipe)

   tc0 = MPI_Wtime()
   tc1 = MPI_Wtime()
   tar = tc1-tc0          !Used to subtract calling time of MPI_Wtime().

      do k = 0,Npower     !Loop over payload sizes.

         Nw = nsize(k)
         secavg = 0.0; secmin= 1.0e32; secmax=-1.0      !Initial time info.

         do i = 1,Ntimes                                !Loop number of runs.

            tc0 = MPI_Wtime()
            if(ipe.eq.0) then
               call mpi_send(a,Nw,MPI_REAL8,1,1,ICOMM,ierr)
               call mpi_recv(b,Nw,MPI_REAL8,1,0,ICOMM,istatus,ierr)
            else if(ipe.eq.1) then           ! Ranks > 1, if any, skip the exchange.
               call mpi_recv(b,Nw,MPI_REAL8,0,1,ICOMM,istatus,ierr)
               call mpi_send(a,Nw,MPI_REAL8,0,0,ICOMM,ierr)
            endif
            tc1 = MPI_Wtime()

            sec   = (tc1-tc0) - tar

            secmin=min(sec,secmin)                ! Calc. Min, Max, & Avg.
            secmax=max(sec,secmax)
            secavg=secavg + sec

         end do

         secavg=secavg/real(Ntimes)
         Nbytes =Nw*8
                                                  ! Determine rates.
         fmegabytes =dble(Nbytes)/(1024.0d0*1024.0d0)
         ratemax = 2.0d0*fmegabytes/secmin        ! 2 for round trip
         ratemin = 2.0d0*fmegabytes/secmax        ! (stay in real*8;
         rateavg = 2.0d0*fmegabytes/secavg        !  real() drops to REAL)

                                                  ! Report results.
         if(ipe.eq.0) then

            if(k.eq.0) then
               write(*,'("RUNS  BYTES",14x,"RATE(MB/s)",25x,"TIME(sec)")')
               write(*,'(15x,"MIN   --    AVG   --    MAX", 7x,&
                     &       "MAX     --  AVG     --  MIN ")' )

            endif

            write(*,'(i2,1x,i8,                       &
  &                  F9.4," --", F9.4," --", F9.4, 3x,&
  &                  F9.6," --", F9.6," --", F9.6)' ) &
  &         Ntimes,Nbytes, ratemin,rateavg,ratemax, secmax,secavg,secmin
         endif
      enddo

   call MPI_Finalize(ierr)

end program

OUTPUT:
RUNS  BYTES              RATE(MB/s)                         TIME(sec)
               MIN   --    AVG   --    MAX       MAX     --  AVG     --  MIN
 3     1024   9.2565 --  17.5355 --  33.9213    0.000211 -- 0.000111 -- 0.000058
 3     2048  32.8337 --  45.8507 --  59.1480    0.000119 -- 0.000085 -- 0.000066
 3     4096  64.3772 --  84.2365 -- 102.5603    0.000121 -- 0.000093 -- 0.000076
...
 3 16777216 299.1284 -- 299.2985 -- 299.4608    0.106977 -- 0.106917 -- 0.106859
 3 33554432 298.5512 -- 298.9968 -- 299.2286    0.214369 -- 0.214049 -- 0.213883
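The rate columns follow directly from the time columns: each round trip moves the payload twice, so a rate in MB/s (1 MB = 1024*1024 bytes) is 2*Nbytes/(1024*1024) divided by the round-trip time. The minimum rate therefore comes from the maximum time, and the average rate is computed from the average time rather than by averaging individual rates. The sketch below isolates that arithmetic in double precision; the module name pingpong_rates and its routines rate_mbs and summarize are hypothetical and not part of the original program.

```fortran
module pingpong_rates
  ! Hypothetical module isolating the rate arithmetic of the pingpong code.
  implicit none
  private
  public :: rate_mbs, summarize

  integer,  parameter :: dp = kind(1.0d0)
  real(dp), parameter :: bytes_per_mb = 1024.0_dp * 1024.0_dp

contains

  ! Bandwidth in MB/s for one round trip that carries nbytes each way.
  pure function rate_mbs(nbytes, round_trip_sec) result(rate)
    integer,  intent(in) :: nbytes
    real(dp), intent(in) :: round_trip_sec
    real(dp) :: rate
    rate = 2.0_dp * (real(nbytes, dp) / bytes_per_mb) / round_trip_sec
  end function rate_mbs

  ! Min/avg/max times over the repeated runs and the matching rates:
  ! the fastest run (smallest time) gives the highest rate.
  subroutine summarize(nbytes, times, secmin, secavg, secmax, &
                       ratemin, rateavg, ratemax)
    integer,  intent(in)  :: nbytes
    real(dp), intent(in)  :: times(:)
    real(dp), intent(out) :: secmin, secavg, secmax
    real(dp), intent(out) :: ratemin, rateavg, ratemax
    secmin  = minval(times)
    secmax  = maxval(times)
    secavg  = sum(times) / real(size(times), dp)
    ratemax = rate_mbs(nbytes, secmin)
    ratemin = rate_mbs(nbytes, secmax)
    rateavg = rate_mbs(nbytes, secavg)
  end subroutine summarize

end module pingpong_rates
```

For the last row of the sample output, rate_mbs(33554432, 0.213883_dp) = 2*32/0.213883, about 299.23 MB/s, which matches the reported maximum rate of 299.2286 MB/s to within the rounding of the printed time.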