Stitz H, Luger S, Streit M, Gehlenborg N
Johannes Kepler University Linz, Austria.
Harvard Medical School, United States of America.
Comput Graph Forum. 2016 Jun;35(3):481-490. doi: 10.1111/cgf.12924. Epub 2016 Jul 4.
A major challenge in data-driven biomedical research lies in the collection and representation of data provenance information to ensure that findings are reproducible. To communicate and reproduce multi-step analysis workflows executed on datasets that contain data for dozens or hundreds of samples, it is crucial to be able to visualize the provenance graph at different levels of aggregation. Most existing approaches are based on node-link diagrams, which do not scale to the complexity of typical data provenance graphs. In our proposed approach, we reduce the complexity of the graph using hierarchical and motif-based aggregation. Based on user actions and graph attributes, a modular degree-of-interest (DoI) function is applied to expand the parts of the graph that are relevant to the user. This interest-driven adaptive approach to provenance visualization allows users to review and communicate complex multi-step analyses, which can be based on hundreds of files that are processed by numerous workflows. We have integrated our approach into an analysis platform that captures extensive data provenance information, and we demonstrate its effectiveness by means of a biomedical usage scenario.
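To make the idea of a modular DoI function concrete, the following is a minimal sketch and not the implementation described in the paper. It assumes that each node carries component scores in [0, 1], that the DoI is a weighted average of those components, and that an aggregate node is expanded when any node beneath it reaches a threshold. The names `Node`, `doi`, `max_doi`, and `visible`, the component names `selected` and `recency`, and the weights and threshold are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in a hierarchically aggregated provenance graph (hypothetical model)."""
    name: str
    scores: dict = field(default_factory=dict)    # component name -> value in [0, 1]
    children: list = field(default_factory=list)  # empty for leaf nodes


def doi(node, weights):
    """Weighted average of the node's component scores (assumed DoI form)."""
    total = sum(weights.values())
    return sum(w * node.scores.get(k, 0.0) for k, w in weights.items()) / total


def max_doi(node, weights):
    """Highest DoI found in the node itself or anywhere below it."""
    return max([doi(node, weights)] + [max_doi(c, weights) for c in node.children])


def visible(node, weights, threshold):
    """Names of the nodes to draw: expand aggregates that contain interesting nodes."""
    if node.children and max_doi(node, weights) >= threshold:
        shown = []
        for child in node.children:
            shown.extend(visible(child, weights, threshold))
        return shown
    return [node.name]  # leaf, or aggregate kept collapsed


# Illustrative data: two workflow aggregates under one analysis.
alignment = Node("alignment", children=[
    Node("fastq_1", {"selected": 1.0, "recency": 0.5}),
    Node("bam_1"),
])
qc = Node("qc", children=[Node("report", {"recency": 0.2})])
analysis = Node("analysis", children=[alignment, qc])

weights = {"selected": 0.7, "recency": 0.3}  # modular: add or reweight components
shown = visible(analysis, weights, threshold=0.5)
print(shown)
```

In this sketch, the `alignment` aggregate is expanded because it contains a selected file, while `qc` stays collapsed. Adding another interest component would only require a new key in `scores` and `weights`, which is one way to read "modular" here.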