IEEE Trans Vis Comput Graph. 2011 Apr;17(4):466-74. doi: 10.1109/TVCG.2010.88. Epub 2010 Dec 3.
Recent GPU algorithms for constructing spatial hierarchies have achieved promising performance for moderately complex models by using the breadth-first search (BFS) construction order. While being able to exploit the massive parallelism on the GPU, the BFS order also consumes excessive GPU memory, which becomes a serious issue for interactive applications involving very complex models with more than a few million triangles. In this paper, we propose to use the partial breadth-first search (PBFS) construction order to control memory consumption while maximizing performance. We apply the PBFS order to two hierarchy construction algorithms. The first algorithm is for kd-trees that automatically balances between the level of parallelism and intermediate memory usage. With PBFS, peak memory consumption during construction can be efficiently controlled without costly CPU-GPU data transfer. We also develop memory allocation strategies to effectively limit memory fragmentation. The resulting algorithm scales well with GPU memory and constructs kd-trees of models with millions of triangles at interactive rates on GPUs with 1 GB memory. Compared with existing algorithms, our algorithm is an order of magnitude more scalable for a given GPU memory bound. The second algorithm is for out-of-core bounding volume hierarchy (BVH) construction for very large scenes based on the PBFS construction order. At each iteration, all constructed nodes are dumped to the CPU memory, and the GPU memory is freed for the next iteration's use. In this way, the algorithm is able to build trees that are too large to be stored in the GPU memory. Experiments show that our algorithm can construct BVHs for scenes with up to 20 M triangles, several times larger than previous GPU algorithms.
最近的 GPU 算法通过使用广度优先搜索(BFS)构建顺序,为中等复杂模型实现了有希望的性能。虽然能够利用 GPU 的大规模并行性,但 BFS 顺序也消耗了过多的 GPU 内存,这对于涉及具有数百万个三角形以上的非常复杂模型的交互式应用程序来说是一个严重的问题。在本文中,我们提出使用部分广度优先搜索(PBFS)构建顺序来控制内存消耗,同时最大化性能。我们将 PBFS 顺序应用于两种层次结构构建算法。第一种算法是用于 kd-树的,它自动在并行度和中间内存使用之间进行平衡。使用 PBFS,可以有效地控制构建过程中的峰值内存消耗,而无需昂贵的 CPU-GPU 数据传输。我们还开发了内存分配策略来有效地限制内存碎片。生成的算法与 GPU 内存很好地扩展,可以在具有 1GB 内存的 GPU 上以交互速率构建具有数百万个三角形的模型的 kd-树。与现有算法相比,我们的算法在给定 GPU 内存限制下的可扩展性提高了一个数量级。第二种算法是基于 PBFS 构建顺序的非常大场景的核外包围体层次结构(BVH)构建。在每一次迭代中,所有构建的节点都转储到 CPU 内存中,GPU 内存被释放以供下一次迭代使用。通过这种方式,该算法能够构建太大而无法存储在 GPU 内存中的树。实验表明,我们的算法可以构建多达 2000 万个三角形的场景的 BVH,比以前的 GPU 算法大几倍。