Compiler Support
Directory Structure
Linking with the Intel® Math Kernel Library (Intel® MKL)
Using MKL Parallelism
Memory Management
Performance
Precision and Rounding Control
Obtaining Version Information
Intel does not support the Intel® Math Kernel Library (Intel® MKL) for use with any compilers other than those identified in the release notes. However, Intel MKL has been successfully used with other compilers.
When using the cblas interface, including the header file mkl.h simplifies program development because it specifies the enumerated values as well as the prototypes of all the functions. The header determines whether the program is being compiled with a C++ compiler, and if it is, the included declarations are correct for use with C++ compilation.
Intel MKL provides separate versions of the library for IA-32 and for Intel® Itanium® and Itanium® 2 processors. The IA-32 versions are located in the lib/32 directory, and the Itanium and Itanium 2 processor versions are located in the lib/64 directory. Intel MKL consists of two parts: LAPACK, and processor-specific kernels (libmkl_ia32.a for IA-32). The LAPACK library contains LAPACK routines and drivers that are optimized without regard to processor, so it can be used effectively on processors from Intel® Pentium® through the Pentium® 4 processor. The processor-specific kernels contain BLAS, FFT, DFT, VSL, cblas, and VML routines that are optimized for each specific processor. In addition, threading software is supplied as a separate library, libguide.a, for static linking, and as a dynamic link library, libguide.so, for linking dynamically to Intel MKL.
The information below indicates the library's directory structure.
| lib/32 | Contains all libraries for 32-bit applications |
| libmkl_ia32.a | Optimized kernels for Intel® Pentium®, Pentium® III, and Pentium® 4 processors |
| libmkl_lapack.a | LAPACK routines and drivers |
| libguide.a | Threading library for static linking |
| libmkl.so | Library dispatcher for dynamic load of processor specific kernel |
| libmkl_lapack32.so | LAPACK routines and drivers, single precision data types |
| libmkl_lapack64.so | LAPACK routines and drivers, double precision data types |
| libmkl_def.so | Default kernel (Intel® Pentium®, Pentium® Pro, and Pentium® II processors) |
| libmkl_p3.so | Intel® Pentium® III processor kernel |
| libmkl_p4.so | Pentium 4 processor kernel |
| libvml.so | Library dispatcher for dynamic load of processor specific VML kernels |
| libmkl_vml_def.so | VML part of default kernel (Pentium, Pentium Pro, Pentium II processors) |
| libmkl_vml_p3.so | VML part of Pentium III processor kernel |
| libmkl_vml_p4.so | VML part of Pentium 4 processor kernel |
| libguide.so | Threading library for dynamic linking |
| lib/64 | Contains all libraries for Itanium®-based and Itanium® 2-based applications |
| libmkl_ipf.a | Processor kernels for the Intel® Itanium® and Itanium® 2 processors |
| libmkl_lapack.a | LAPACK routines and drivers |
| libguide.a | Threading library for static linking |
| libmkl_lapack32.so | LAPACK routines and drivers, single precision data types |
| libmkl_lapack64.so | LAPACK routines and drivers, double precision data types |
| libmkl_itp.so | Itanium processor kernel |
| libmkl_vml_itp.so | VML part of Itanium processor kernel |
| libguide.so | Threading library for dynamic linking |
| libmkl.so | Library dispatcher for dynamic load of processor specific kernel |
| libmkl_i2p.so | Itanium 2 processor kernel |
| libmkl_vml_i2p.so | Itanium 2 processor VML kernel |
| libvml.so | Library dispatcher for dynamic load of processor specific VML kernel |
To use the LAPACK and BLAS software, you must link the following libraries: LAPACK, the processor-optimized kernels, the threading library, and the system library for threading support. If you want to use the FFTs or DFTs, you may also add -lm to your link options. Some possible variants:
ld myprog.o $MKLPATH/libmkl_lapack.a $MKLPATH/libmkl_ia32.a -L$MKLPATH -lguide -lpthread
ld myprog.o -L$MKLPATH -lmkl_lapack64 -lmkl -lguide -lpthread
ld myprog.o $MKLPATH/libmkl_lapack.a $MKLPATH/libmkl_ipf.a -L$MKLPATH -lguide -lpthread
ld myprog.o -L$MKLPATH -lmkl_lapack64 -lmkl -lguide -lpthread
$MKLPATH in these examples is the path to the Intel MKL.
Intel MKL is threaded in a number of places: LAPACK (*GETRF, *POTRF, *GBTRF, DGEQRF routines), all Level 3 BLAS, all DFTs (except 1D transformation), and all FFTs. Intel MKL uses OpenMP* threading software.
Conflicts in the execution environment can make the use of threads in Intel MKL problematic. We list these situations here with recommendations for dealing with them. First, a brief discussion of why the problem exists is appropriate.
If the user threads the program using OpenMP directives and uses the Intel® compilers to compile the program, Intel MKL and the user program will both use the same threading library. Intel MKL tries to determine if it is in a parallel region in the program, and if it is, it does not spread its operations over multiple threads. But Intel MKL can be aware that it is in a parallel region only if the threaded program and Intel MKL are using the same threading library. If the user program is threaded by some other means, Intel MKL may operate in multithreaded mode and the computations may be corrupted. Here are several cases and our recommendations for the user:
Set OMP_NUM_THREADS=1 in the environment. This is the default with Intel MKL 6.0.
OMP_NUM_THREADS in the environment affects both the compiler's threading library and the threading library with Intel MKL. At this time, the safe approach is to set MKL_SERIAL=YES (or MKL_SERIAL=yes), which forces Intel MKL to serial mode regardless of the OMP_NUM_THREADS value.
OMP_NUM_THREADS should be set to 1.
Setting the number of threads: The OpenMP* software responds to the environment variable OMP_NUM_THREADS. The number of threads can be set in the shell in which the program will run. To change the number of threads, enter the following in that shell:
export OMP_NUM_THREADS=<number of threads to use>
To force Intel MKL to serial mode, set the environment variable MKL_SERIAL to YES. This works regardless of the OMP_NUM_THREADS value. MKL_SERIAL is not set by default.
If the variable OMP_NUM_THREADS is not set, Intel MKL software will run on one thread. We recommend always setting OMP_NUM_THREADS to the number of processors you wish to use in your application.
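The rules above for MKL_SERIAL and OMP_NUM_THREADS can be summarized in a short C sketch. The function names below are hypothetical helpers, not part of Intel MKL, and the code only models the behavior described in this section; it does not query the library. The handling of an unparsable or non-positive OMP_NUM_THREADS value is an assumption of the sketch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not an Intel MKL function): returns the number of
   threads this section says the library would use, given the values of
   the two environment variables. Pass NULL for a variable that is unset. */
int expected_mkl_threads(const char *mkl_serial, const char *omp_num_threads)
{
    /* MKL_SERIAL=YES (or yes) forces serial mode regardless of
       the OMP_NUM_THREADS value. */
    if (mkl_serial != NULL &&
        (strcmp(mkl_serial, "YES") == 0 || strcmp(mkl_serial, "yes") == 0))
        return 1;

    /* With OMP_NUM_THREADS unset, the library runs on one thread. */
    if (omp_num_threads == NULL)
        return 1;

    /* Otherwise use the requested count. Treating an unparsable or
       non-positive value as 1 is an assumption of this sketch. */
    long n = strtol(omp_num_threads, NULL, 10);
    return n > 0 ? (int)n : 1;
}

/* Convenience wrapper that reads the current process environment. */
int expected_mkl_threads_from_env(void)
{
    return expected_mkl_threads(getenv("MKL_SERIAL"),
                                getenv("OMP_NUM_THREADS"));
}
```

A program could call expected_mkl_threads_from_env() at startup and log the result, which makes an unintended serial run (for example, a forgotten export OMP_NUM_THREADS) easier to spot.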
MKL_FreeBuffers(). If another call is made to an Intel MKL function that
needs a memory buffer, then the memory manager will again allocate the
buffers and they will again remain allocated until either the end of the program or the program deallocates the memory.
This memory management software is turned on by default. To disable
it, set the environment variable MKL_DISABLE_FAST_MM to any value,
which will cause memory to be allocated and freed from call to call.
Disabling this feature will negatively impact performance of routines
such as the level 3 BLAS, especially for small problem sizes.
To obtain the best performance with Intel MKL, make sure the following conditions are fulfilled: arrays must be aligned on a 16-byte boundary, and the leading dimension values (n*element_size) of two-dimensional arrays must be divisible by 16. There are additional conditions for the FFT functions: the addresses of the first elements of arrays and the leading dimension values (n*element_size) of two-dimensional arrays should be divisible by the cache line size (32 bytes for the Pentium III processor and 64 bytes for the Pentium 4 processor). Furthermore, for the C-style FFTs on the Pentium 4 processor, the distance L between the arrays that represent the real and imaginary parts should not satisfy the following inequality:
k*2**16 <= L < k*2**16+64
These conditions are needed due to the use of Streaming SIMD Extensions (SSE).
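As an illustration of the IA-32 conditions above, the following C sketch pads a leading dimension so that n*element_size is divisible by a given boundary, aligns a buffer by hand on top of malloc, and tests the FFT distance inequality. All names are hypothetical helpers rather than Intel MKL functions. The sketch assumes that L is measured in bytes and that k is a non-negative integer, neither of which the text states explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Smallest leading dimension (in elements) >= n whose size in bytes,
   ld * elem_size, is divisible by `boundary`. */
size_t padded_leading_dim(size_t n, size_t elem_size, size_t boundary)
{
    assert(elem_size > 0 && boundary > 0);
    size_t ld = n;
    while ((ld * elem_size) % boundary != 0)
        ++ld;
    return ld;
}

/* Nonzero if p lies on a `boundary`-byte boundary. */
int is_aligned(const void *p, size_t boundary)
{
    return ((uintptr_t)p % boundary) == 0;
}

typedef struct {
    void   *raw;   /* pointer returned by malloc; pass this one to free() */
    double *data;  /* first element, aligned on the requested boundary */
} aligned_buf;

/* Allocate room for `count` doubles and round the start address up to
   the next multiple of `boundary` (16 in general, or the cache line
   size for the FFT functions). On failure both fields are NULL. */
aligned_buf alloc_aligned_doubles(size_t count, size_t boundary)
{
    aligned_buf b = { NULL, NULL };
    assert(boundary > 0);
    b.raw = malloc(count * sizeof(double) + boundary);
    if (b.raw != NULL) {
        uintptr_t addr = (uintptr_t)b.raw;
        uintptr_t rem = addr % boundary;
        b.data = (double *)(addr + (rem ? boundary - rem : 0));
    }
    return b;
}

/* Nonzero if k*2**16 <= L < k*2**16 + 64 holds for some integer k >= 0,
   which is the case the text says to avoid for C-style FFTs on the
   Pentium 4 processor. L in bytes is an assumption of this sketch. */
int fft_distance_is_unfavorable(size_t L)
{
    return (L % 65536u) < 64u;
}
```

For example, a matrix of doubles with 1001 columns could be stored with a leading dimension of padded_leading_dim(1001, sizeof(double), 16), which is 1002, in a buffer obtained from alloc_aligned_doubles and later released with free(buf.raw).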
To obtain the best performance with Intel MKL in Itanium-based applications, the following conditions are desirable.
For the C-style FFT, a sufficient condition is that the distance L between the arrays that represent the real and imaginary parts is not divisible by 64. The best case is L=k*64 + 16.
For DGEMM, it is desirable that the leading dimension values (n*element_size) of two-dimensional arrays are divisible by 16, but not divisible by 32.
For DFTs, it is desirable that the leading dimension values (n*element_size) of two-dimensional arrays are not powers of two.
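The Itanium conditions above reduce to simple arithmetic checks, sketched below in C. The function names are hypothetical and not part of Intel MKL, and the sketch assumes that L and the leading dimension values are measured in bytes.

```c
#include <assert.h>
#include <stddef.h>

/* C-style FFT: it is sufficient that L is not divisible by 64. */
int itanium_fft_distance_ok(size_t L)
{
    return L % 64 != 0;
}

/* C-style FFT: the best case is L = k*64 + 16. */
int itanium_fft_distance_best(size_t L)
{
    return L % 64 == 16;
}

/* DGEMM: the leading dimension in bytes should be divisible by 16
   but not by 32. */
int itanium_dgemm_ld_ok(size_t ld_elems, size_t elem_size)
{
    assert(elem_size > 0);
    size_t bytes = ld_elems * elem_size;
    return bytes % 16 == 0 && bytes % 32 != 0;
}

/* Smallest leading dimension >= n (in doubles) that meets the
   DGEMM condition. */
size_t itanium_dgemm_padded_ld(size_t n)
{
    size_t ld = n;
    while (!itanium_dgemm_ld_ok(ld, sizeof(double)))
        ++ld;
    return ld;
}

/* Nonzero if x is a power of two. */
int is_power_of_two(size_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* DFTs: the leading dimension in bytes should not be a power of two. */
int itanium_dft_ld_ok(size_t ld_elems, size_t elem_size)
{
    return !is_power_of_two(ld_elems * elem_size);
}
```

For double precision data the DGEMM condition means the leading dimension leaves remainder 2 when divided by 4, so a 1024-column matrix would be padded to 1026 columns, which also avoids the power-of-two case for DFTs.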
On entry to Intel MKL, precision is set to 80-bit for x87 instructions and rounding is set to round-to-nearest. On exit, the user's settings are restored.
Intel MKL provides a facility by which you can obtain information about the library (for example, the version number). Two methods are provided for extracting this information: you can extract a version string using the function MKLGetVersionString, or you can use the MKLGetVersion function to obtain an MKLVersion structure that contains the version information. Example programs for extracting this information are provided in the mkl60/examples/versionquery directory. A makefile is also provided to automatically build the examples and output summary files containing the version information for the current library. An example summary file can be found in the readme.htm file in the same directory.
Celeron, Dialogic, i386, i486, iCOMP, Intel, Intel logo, Intel386,
Intel486, Intel740, IntelDX2, IntelDX4, IntelSX2, Intel Inside, Intel Inside logo, Intel NetBurst,
Intel NetStructure, Intel Xeon, Intel XScale, Itanium, MMX, MMX logo, Pentium, Pentium II Xeon,
Pentium III Xeon, and VTune are trademarks or registered trademarks
of Intel Corporation or its subsidiaries in the United States and other countries.
* Other names and brands may be claimed as the property of others.
Copyright © 2000-2003 Intel Corporation.