Introduction

Vectorization is a process by which mathematical operations found in tight loops in scientific code are executed in parallel on special vector hardware found in CPUs and coprocessors. A "vector" is a contiguous set of data, usually floating-point numbers, often called a vector array. Each number in the array is called an element. Operations such as addition, multiplication, and division are performed on small, fixed-size vector arrays of numerical values, with every element processed simultaneously by parallel floating-point hardware. In the ideal case, the net effect of vectorization is a speedup in floating-point computations proportional to the length of the vector array, that is, to the number of elements processed per instruction.
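As a rough illustration of the kind of tight loop described above, the following minimal C sketch adds two arrays element by element. The function name `add_arrays` and its signature are hypothetical, chosen only for this example. The `restrict` qualifiers are an assumption that the three arrays do not overlap in memory. Whether a given compiler actually vectorizes the loop depends on the compiler, its optimization flags, and the target hardware.

```c
#include <stddef.h>

/* Element-wise addition (hypothetical example function).
 * Each iteration reads a[i] and b[i] and writes c[i]; no iteration
 * depends on the result of another, so a vectorizing compiler may
 * process several consecutive elements with one vector instruction.
 * 'restrict' asserts that a, b, and c do not overlap (an assumption
 * the caller must honor). */
void add_arrays(const float *restrict a, const float *restrict b,
                float *restrict c, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}
```

On hardware with 256-bit vector registers, for instance, eight single-precision elements fit in one register, so up to eight of these additions could in principle be carried out by a single instruction.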

Many compilers vectorize code automatically as part of their code optimization process. This process, however, is not perfect. Certain code constructs can make it difficult or impossible for the compiler to properly vectorize floating-point intensive loops, and inefficient use of cache and memory can negate any performance increase obtained by vectorization. As vector lengths have increased in modern CPUs and in the new Xeon Phi (MIC) coprocessor, there is more performance to gain from vectorizing code, and a greater penalty for failing to vectorize.
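One common example of a construct that hinders automatic vectorization is a loop-carried dependence, where an iteration needs a value produced by the previous iteration. The sketch below contrasts such a loop with one whose iterations are independent. Both function names are hypothetical and serve only as illustration; the actual behavior of any particular compiler on these loops is not guaranteed and should be checked with that compiler's vectorization report.

```c
#include <stddef.h>

/* Independent iterations (hypothetical example function):
 * x[i] depends only on its own previous value, so elements can be
 * updated side by side in vector hardware. */
void scale_in_place(float *x, float k, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        x[i] = k * x[i];
    }
}

/* Loop-carried dependence (hypothetical example function):
 * x[i] needs the value of x[i - 1] written by the previous iteration,
 * so the iterations cannot simply be executed simultaneously.
 * A compiler will typically leave a loop like this in scalar form. */
void running_sum(float *x, size_t n)
{
    for (size_t i = 1; i < n; ++i) {
        x[i] = x[i] + x[i - 1];
    }
}
```

The two loops look almost identical in source form, which is why it is worth confirming what the compiler did rather than assuming.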

This module describes the vectorization process as it relates to computing hardware, compilers, and coding practices. Knowing where in the code vectorization ought to occur, how vectorization will increase performance, and whether the compiler is vectorizing a piece of code as it should are all critical to getting the full potential from the CPUs and coprocessors of modern HPC systems such as Stampede.

Aaron Birkland
Cornell Center for Advanced Computing

With contributions from:
Texas Advanced Computing Center

October 2013