Kesavan Suraj P, Bhatia Harsh, Bhatele Abhinav, Brink Stephanie, Pearce Olga, Gamblin Todd, Bremer Peer-Timo, Ma Kwan-Liu
IEEE Trans Vis Comput Graph. 2023 Mar;29(3):1691-1704. doi: 10.1109/TVCG.2021.3129414. Epub 2023 Jan 30.
Optimizing the performance of large-scale parallel codes is critical for efficient utilization of computing resources. Code developers often explore various execution parameters, such as hardware configurations, system software choices, and application parameters, and are interested in detecting and understanding bottlenecks in different executions. They often collect hierarchical performance profiles represented as call graphs, which combine performance metrics with their execution contexts. The crucial task of exploring multiple call graphs together is tedious and challenging because of the many structural differences in the execution contexts and significant variability in the collected performance metrics (e.g., execution runtime). In this paper, we present Ensemble CallFlow to support the exploration of ensembles of call graphs using new types of visualizations, analysis, graph operations, and features. We introduce ensemble-Sankey, a new visual design that combines the strengths of resource-flow (Sankey) and box-plot visualization techniques. Whereas the resource-flow visualization can easily and intuitively describe the graphical nature of the call graph, the box plots overlaid on the nodes of Sankey convey the performance variability within the ensemble. Our interactive visual interface provides linked views to help explore ensembles of call graphs, e.g., by facilitating the analysis of structural differences, and identifying similar or distinct call graphs. We demonstrate the effectiveness and usefulness of our design through case studies on large-scale parallel codes.
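As a rough illustration of the idea behind overlaying box plots on call-graph nodes, the sketch below groups one metric (runtime) by node across an ensemble of runs, computes a five-number summary per node, and records which runs lack a node. It is a minimal sketch under stated assumptions, not the Ensemble CallFlow implementation: the data layout, the run and node names, and the functions `five_number_summary` and `summarize_ensemble` are all hypothetical, and the quartiles use a simple median-of-halves rule.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, q1, median, q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]              # values below the median position
    upper = data[half + (n % 2):]    # values above the median position
    q1 = median(lower) if lower else data[0]
    q3 = median(upper) if upper else data[-1]
    return (data[0], q1, median(data), q3, data[-1])


def summarize_ensemble(ensemble):
    """Summarize a metric per call-graph node across all runs.

    ensemble: dict mapping run name -> dict mapping node name -> runtime.
    Returns a dict mapping node name -> {"summary", "missing_in"}.
    """
    # Union of nodes over all runs, since runs may differ structurally.
    nodes = sorted({node for run in ensemble.values() for node in run})
    result = {}
    for node in nodes:
        values = [run[node] for run in ensemble.values() if node in run]
        missing = sorted(name for name, run in ensemble.items() if node not in run)
        result[node] = {
            "summary": five_number_summary(values),
            "missing_in": missing,
        }
    return result


# Hypothetical ensemble of four runs; "io" is absent from run_b.
ensemble = {
    "run_a": {"main": 10.0, "solve": 6.0, "io": 2.0},
    "run_b": {"main": 12.0, "solve": 9.0},
    "run_c": {"main": 11.0, "solve": 7.0, "io": 3.0},
    "run_d": {"main": 15.0, "solve": 8.0, "io": 2.5},
}
stats = summarize_ensemble(ensemble)
```

Each entry in `stats` holds the numbers a box plot for that node would need, and `missing_in` gives a crude view of structural differences between runs.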