Teodoro George, Kurc Tahsin, Andrade Guilherme, Kong Jun, Ferreira Renato, Saltz Joel
Department of Computer Science, University of Brasília, Brasília, DF, Brazil.
Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY, USA; Scientific Data Group, Oak Ridge National Laboratory, Oak Ridge, TN, USA.
Int J High Perform Comput Appl. 2017 Jan;31(1):32-51. doi: 10.1177/1094342015594519. Epub 2015 Jul 27.
We carry out a comparative performance study of multi-core CPUs, GPUs and the Intel Xeon Phi (Many Integrated Core, MIC) using a microscopy image analysis application. We experimentally evaluate the performance of each computing device on the core operations of the application. We correlate the observed performance with the characteristics of the devices and with the data access patterns, computational complexities, and parallelization forms of the operations. The results show significant variability in the performance of operations depending on the device used. For operations with regular data access, performance on a MIC is comparable to, and sometimes better than, performance on a GPU. GPUs are more efficient than MICs for operations that access data irregularly, because the MIC has lower bandwidth for random data accesses. We propose new performance-aware scheduling strategies that take the variability in operation speedups into account. In hybrid configurations, our scheduling strategies significantly improve application performance compared to classic strategies.
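The abstract does not spell out the scheduling strategies, so the following is only a minimal sketch of one way a scheduler could use per-device speedup estimates. Everything in it is an assumption made for illustration: the `Task` type, the `pick_task` function, the device names, and the rule itself (an idle accelerator takes the queued task with the largest speedup relative to the best alternative device, and an idle CPU core takes the task that gains least from any accelerator).

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical task record; the fields are illustrative assumptions."""
    name: str
    speedup: dict  # estimated speedup over one CPU core, e.g. {"gpu": 8.0, "mic": 2.0}


def pick_task(queue, device):
    """Remove and return the queued task best suited to the idle `device`.

    Assumed rule, not taken from the paper:
    - "cpu": choose the task with the smallest best-accelerator speedup.
    - accelerator: choose the task whose speedup on this device is largest
      relative to the best alternative (another accelerator, or the CPU at 1.0).
    """
    if not queue:
        return None
    if device == "cpu":
        chosen = min(queue, key=lambda t: max(t.speedup.values()))
    else:
        def advantage(t):
            others = [s for d, s in t.speedup.items() if d != device]
            return t.speedup[device] / max(others + [1.0])
        chosen = max(queue, key=advantage)
    queue.remove(chosen)
    return chosen
```

With made-up speedup estimates, an operation with irregular data access that runs much faster on the GPU than on the MIC would be routed to the GPU first, a regular operation would go to the MIC, and an operation with little acceleration would stay on the CPU. A first-come, first-served policy would ignore these differences.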