The Xeon Phi coprocessor is a system on a PCIe card designed to deliver high floating point performance for highly parallel HPC applications. Its architecture, known as Many Integrated Core (MIC), features a processor containing a large number of simplified x64 cores with wide vector units, optimized for aggregate floating point throughput at the expense of single-thread performance.
The MIC architecture is source-compatible, but not binary-compatible, with code written for traditional multi-core CPUs. As a result, it supports many familiar HPC programming models and tools, such as MPI and OpenMP. Code does not need to be written specifically for the MIC, nor altered to run on it: existing code can usually be recompiled for the MIC architecture without modification and be expected to run. To run well on the MIC, however, code must be highly parallel and floating point intensive.
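As a rough illustration of the kind of code that suits the MIC, the C sketch below contains two highly parallel, floating point intensive loops written with standard OpenMP pragmas and nothing MIC-specific. The function names `saxpy` and `dot` are illustrative and not taken from this module. Assuming the Intel compilers used on Stampede at the time, the same unmodified source could be built for the host with `icc -openmp kernels.c` and for native execution on the coprocessor with `icc -mmic -openmp kernels.c`; the file name is hypothetical, and newer Intel compilers spell the OpenMP option `-qopenmp`.

```c
#include <assert.h>
#include <stddef.h>

/* Built without OpenMP enabled, the pragmas below are ignored and both
 * loops simply run serially. */

/* y = a*x + y (SAXPY). Every iteration is independent and performs a
 * multiply and an add, so the loop can be spread across many cores by
 * OpenMP and across vector lanes by the compiler's auto-vectorizer. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);
    #pragma omp parallel for
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}

/* Dot product of x and y. The reduction clause gives each thread a
 * private partial sum and combines the partial sums at the end. */
double dot(size_t n, const double *x, const double *y)
{
    assert(x != NULL && y != NULL);
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (size_t i = 0; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

A host or coprocessor program would call these kernels identically, for example `saxpy(n, 2.0f, x, y)` to update `y` in place; only the compiler target changes, which reflects the source compatibility described above.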
The Stampede supercomputer comprises more than 6,400 compute nodes and offers nearly 10 petaflops (PF) of aggregate floating point throughput. Roughly 2 PF come from traditional multi-core CPUs, in the form of the dual Intel Xeon E5 processors present in each node. The remaining 8 PF, a large majority of Stampede's overall floating point throughput, are provided by the Xeon Phi coprocessors installed in the compute nodes.
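Using only the rounded figures above (2 PF from CPUs, 8 PF from coprocessors, about 6,400 nodes), the coprocessors' share of the machine's throughput, and their contribution averaged over all nodes, work out roughly as follows. These are back-of-the-envelope averages, not measured per-node figures:

```latex
\[
\frac{8\ \text{PF}}{2\ \text{PF} + 8\ \text{PF}} = 0.8 = 80\%,
\qquad
\frac{8\ \text{PF}}{2\ \text{PF}} = 4,
\qquad
\frac{8 \times 10^{15}\ \text{flop/s}}{6.4 \times 10^{3}\ \text{nodes}}
  = 1.25 \times 10^{12}\ \text{flop/s per node}
\]
```

In other words, code that runs only on the host CPUs can reach at most about one fifth of Stampede's aggregate floating point capability.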
This module describes the MIC architecture behind the Xeon Phi and its performance characteristics, and explains how and when to run code on Stampede's coprocessors to make the best use of the available resources.
Cornell Center for Advanced Computing
With contributions from:
Texas Advanced Computing Center