Vectorization is a process by which mathematical operations found in tight loops in scientific code are executed in parallel on special vector hardware found in CPUs and coprocessors. A "vector" is a contiguous set of data, usually floating point numbers, often called a vector array. Each number in the array is called an element. Operations such as addition, multiplication, and division are performed on small, fixed-size vector arrays of numerical values. Each vector element is processed simultaneously through parallel floating point hardware. The net effect of vectorization is a speedup in floating point computations proportional to the length of the vector array.
Many compilers vectorize code automatically as part of their code optimization process. This process, however, is not perfect. Certain code constructs
can make it difficult or impossible for the compiler to properly vectorize floating-point intensive loops. Inefficient use of cache
and memory can negate
any performance increase obtained by vectorization. As vector lengths have increased in modern CPUs and the new Xeon Phi (MIC) coprocessor, there is more performance to be gained from vectorizing code, and a greater penalty for failing to vectorize.
This module describes the vectorization process as it relates to computing hardware, compilers, and coding practices. Knowing where in code vectorization ought to occur,
how vectorization will increase performance, and whether the compiler is vectorizing a piece of code as it should are critical to getting the full potential from
the CPUs and coprocessors of modern HPC
systems such as Stampede.
Cornell Center for Advanced Computing
With contributions from:
Texas Advanced Computing Center