Chelaru Florin, Corrada Bravo Héctor
BMC Bioinformatics. 2015;16 Suppl 11(Suppl 11):S4. doi: 10.1186/1471-2105-16-S11-S4. Epub 2015 Aug 13.
Computational and visual data analysis for genomics has traditionally involved a combination of tools and resources. The most ubiquitous are genome browsers, focused mainly on integrative visualization of large numbers of big datasets, and computational environments, focused on data modeling of a small number of moderately sized datasets. Workflows that involve the integration and exploration of multiple heterogeneous data sources, small and large, public and user-specific, have been poorly addressed by these tools. In our previous work, we introduced Epiviz, which bridges the gap between the two types of tools, simplifying these workflows.
In this paper we expand on the design decisions behind Epiviz and introduce a series of new advanced features that further support the type of interactive exploratory workflow we have targeted. We discuss three ways in which Epiviz advances the field of genomic data analysis: 1) it brings code to interactive visualizations at several levels; 2) it takes the first steps toward collaborative data analysis by incorporating user plugins from source control providers and by allowing analysis states to be shared among the scientific community; 3) it combines established analysis features that have never before been available simultaneously in a genome browser. In our discussion section, we present the security implications of the current design, as well as a series of limitations and future research steps.
Since many of the design choices of Epiviz are novel in genomics data analysis, this paper serves both as a record of our own approaches and lessons learned and as a starting point for future efforts in the same direction for the genomics community.