Bajaj Chandrajit, Ihm Insung, Min Jungki, Oh Jinsang
Department of Computer Science, University of Texas at Austin, Texas, USA.
Comput Graph Forum. 2004 Dec 1;23(4):697-714. doi: 10.1111/j.1467-8659.2004.00803.x.
The increased programmability of graphics hardware allows efficient graphics processing unit (GPU) implementations of a wide range of general computations on commodity PCs. An important factor in such implementations is how fully they exploit the SIMD computing capacity of modern graphics processors. Linear expressions of the form ȳ = Ax̄ + b̄, where A is a matrix and x̄, ȳ, and b̄ are vectors, are among the most basic operations in many scientific computations. In this paper, we propose a SIMD code optimization technique that generates efficient shader code for evaluating linear expressions. We show that performance improves considerably when the operations in a linear expression are reordered so that the arithmetic packs efficiently into four-wide SIMD instructions. We demonstrate that the technique can be used effectively for programming both vertex and pixel shaders in a variety of mathematical applications, including integrating differential equations and solving sparse linear systems of equations with iterative methods.
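To make the packing idea concrete, the following is a minimal sketch in plain Python. It is not the paper's optimization technique. It only illustrates, under simplifying assumptions, why reordering matters: a four-row expression ȳ = Ax̄ + b̄ evaluated column by column needs one four-wide multiply-add per column, whereas a scalar evaluation needs one multiply-add per matrix entry. The names `mad4`, `eval_scalar`, and `eval_packed` are hypothetical, and `mad4` merely emulates the kind of four-wide multiply-add instruction that shader hardware provides.

```python
# Illustrative sketch only: emulates four-wide SIMD multiply-add in plain Python.
# All function names are hypothetical and not taken from the paper.

def mad4(a, b, c):
    """Emulate one four-wide multiply-add: a * b + c, component-wise."""
    return [a[i] * b[i] + c[i] for i in range(4)]


def eval_scalar(A, x, b):
    """Evaluate y = A x + b with one scalar multiply-add per matrix entry."""
    y = list(b)
    ops = 0
    for i in range(len(A)):
        for j in range(len(x)):
            y[i] = A[i][j] * x[j] + y[i]
            ops += 1
    return y, ops


def eval_packed(A, x, b):
    """Evaluate y = A x + b for a 4-row A with one four-wide MAD per column."""
    assert len(A) == 4 and len(b) == 4
    y = list(b)
    ops = 0
    for j in range(len(x)):
        column = [A[i][j] for i in range(4)]  # the four coefficients of x[j]
        y = mad4(column, [x[j]] * 4, y)       # x[j] broadcast to all four lanes
        ops += 1
    return y, ops


A = [[1, 2, 0, 0],
     [0, 1, 2, 0],
     [0, 0, 1, 2],
     [2, 0, 0, 1]]
x = [1, 2, 3, 4]
b = [1, 1, 1, 1]

y_scalar, scalar_ops = eval_scalar(A, x, b)
y_packed, packed_ops = eval_packed(A, x, b)
print(y_scalar, scalar_ops)
print(y_packed, packed_ops)
```

Both evaluations produce the same vector, but the column-wise ordering issues 4 emulated instructions where the scalar ordering issues 16. Integer inputs are used so the two orderings agree exactly; with floating-point data the results could differ in the last bits because the additions occur in a different order.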