ENSIGMAG-Antenne de Montbonnot, Montbonnot Saint Martin, France.
IEEE Trans Vis Comput Graph. 2010 Sep-Oct;16(5):815-28. doi: 10.1109/TVCG.2010.19.
One important bottleneck when visualizing large data sets is the data transfer between processor and memory. Cache-aware (CA) and cache-oblivious (CO) algorithms take into consideration the memory hierarchy to design cache efficient algorithms. CO approaches have the advantage to adapt to unknown and varying memory hierarchies. Recent CA and CO algorithms developed for 3D mesh layouts significantly improve performance of previous approaches, but they lack of theoretical performance guarantees. We present in this paper a {\schmi O}(N\log N) algorithm to compute a CO layout for unstructured but well shaped meshes. We prove that a coherent traversal of a N-size mesh in dimension d induces less than N/B+{\schmi O}(N/M;{1/d}) cache-misses where B and M are the block size and the cache size, respectively. Experiments show that our layout computation is faster and significantly less memory consuming than the best known CO algorithm. Performance is comparable to this algorithm for classical visualization algorithm access patterns, or better when the BSP tree produced while computing the layout is used as an acceleration data structure adjusted to the layout. We also show that cache oblivious approaches lead to significant performance increases on recent GPU architectures.
在可视化大型数据集时,一个重要的瓶颈是处理器和内存之间的数据传输。缓存感知 (CA) 和缓存不感知 (CO) 算法考虑了内存层次结构,以设计缓存高效的算法。CO 方法的优点是可以适应未知和变化的内存层次结构。最近为 3D 网格布局开发的 CA 和 CO 算法显著提高了以前方法的性能,但它们缺乏理论性能保证。我们在本文中提出了一种用于非结构化但形状良好的网格的 CO 布局的 {\schmi O}(N\log N) 算法。我们证明,在维度 d 中对 N 大小的网格进行一致遍历会导致少于 N/B+{\schmi O}(N/M;{1/d}) 的缓存缺失,其中 B 和 M 分别是块大小和缓存大小。实验表明,我们的布局计算比已知的最佳 CO 算法更快,并且内存消耗明显更低。对于经典的可视化算法访问模式,其性能与该算法相当,或者当在计算布局时生成的 BSP 树被用作根据布局调整的加速数据结构时,其性能更好。我们还表明,缓存不感知方法在最近的 GPU 架构上导致了显著的性能提升。