Mei Gang, Xu Liangliang, Xu Nengxiong
School of Engineering and Technology, China University of Geosciences, Beijing, People's Republic of China.
R Soc Open Sci. 2017 Sep 20;4(9):170436. doi: 10.1098/rsos.170436. eCollection 2017 Sep.
This paper presents the design and implementation of parallel adaptive inverse distance weighting (AIDW) interpolation algorithms on the graphics processing unit (GPU). AIDW is an improved version of standard inverse distance weighting (IDW) that adaptively determines the power parameter according to the spatial distribution pattern of the data points, and thereby achieves more accurate predictions than IDW. We first present two versions of the GPU-accelerated AIDW: a naive version that does not use shared memory and a tiled version that does. We implement both versions with two data layouts, structure of arrays and array of aligned structures, in both single and double precision. We then evaluate the performance of the parallel AIDW by comparing it with the corresponding serial algorithm on three machines equipped with the GT730M, M5000 and K40c GPUs. The experimental results indicate that: (i) there is no significant difference in computational efficiency between the two data layouts; (ii) the tiled version is always slightly faster than the naive version; and (iii) in single precision the speed-up reaches up to 763 (on the M5000), while in double precision the highest speed-up is 197 (on the K40c). To benefit the community, all source code and testing data related to the presented parallel AIDW algorithm are publicly available.
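To make the idea of an adaptively chosen power parameter concrete, the following is a minimal serial sketch in Python, not the authors' implementation. `idw_predict` is the standard IDW weighted average. `adaptive_power` is a hypothetical, simplified stand-in: it assumes that the power is chosen by comparing the observed mean distance to the `k` nearest data points with the distance expected under a random point pattern, and that this ratio is mapped linearly onto a range `[alpha_min, alpha_max]`. The function names, the parameters `k`, `alpha_min`, `alpha_max` and the linear mapping are all assumptions made for illustration; the abstract does not specify them.

```python
import math


def idw_predict(points, values, query, alpha):
    """Standard IDW: weighted average with weights d**(-alpha)."""
    num = 0.0
    den = 0.0
    for (x, y), z in zip(points, values):
        d = math.hypot(x - query[0], y - query[1])
        if d == 0.0:
            return z  # query coincides with a data point
        w = d ** -alpha
        num += w * z
        den += w
    return num / den


def adaptive_power(points, query, area, k=3, alpha_min=1.0, alpha_max=5.0):
    """Hypothetical rule: sparser neighbourhoods get a larger power.

    Assumption for illustration only; the linear mapping below is a
    simplified stand-in, not the rule used in the paper.
    """
    n = len(points)
    dists = sorted(math.hypot(x - query[0], y - query[1]) for x, y in points)
    # Observed mean distance to the k nearest data points.
    r_obs = sum(dists[:k]) / min(k, n)
    # Expected nearest-neighbour distance for a random pattern of n points.
    r_exp = 1.0 / (2.0 * math.sqrt(n / area))
    ratio = r_obs / r_exp
    # Clamp ratio / 2 to [0, 1] and map it linearly onto the power range.
    t = min(max(ratio / 2.0, 0.0), 1.0)
    return alpha_min + t * (alpha_max - alpha_min)


def aidw_predict(points, values, query, area):
    """Choose the power per query point, then apply standard IDW."""
    alpha = adaptive_power(points, query, area)
    return idw_predict(points, values, query, alpha)
```

In a GPU version, each thread would typically evaluate one query point independently, which is why the per-point loop over all data points is the natural candidate for the naive and tiled kernels described in the abstract.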