Uncovering Effective Explanations for Interactive Genomic Data Analysis.

Authors

Huang Silu, Blatti Charles, Sinha Saurabh, Parameswaran Aditya

Affiliations

Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA.

Institute of Genomic Biology, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA.

Publication information

Patterns (N Y). 2020 Sep 11;1(6):100093. doi: 10.1016/j.patter.2020.100093.

DOI:10.1016/j.patter.2020.100093

PMID:33205133

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7660438/

Abstract

Better tools are needed to enable researchers to quickly identify and explore effective and interpretable feature-based explanations for discriminating multi-class genomic datasets, e.g., healthy versus diseased samples. We develop an interactive exploration tool, GENVISAGE, which rapidly discovers the most discriminative feature pairs that separate two classes of genomic objects and then displays the corresponding visualizations. Since quickly finding top feature pairs is computationally challenging, especially for large numbers of objects and features, we propose a suite of optimizations to make GENVISAGE responsive at scale and demonstrate that our optimizations lead to a 400× speedup over competitive baselines for multiple biological datasets. We apply our rapid and interpretable tool to identify literature-supported pairs of genes whose transcriptomic responses significantly discriminate several chemotherapy drug treatments. With its generalizable optimizations and framework, GENVISAGE opens up real-time feature-based explanation generation to data from massive sequencing efforts, as well as many other scientific domains.
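To make the core task concrete, the following is a minimal sketch of a naive exhaustive search for discriminative feature pairs. It is not GENVISAGE's implementation: the abstract does not state the separability metric or the optimizations, so the score used here (accuracy of a nearest-centroid rule in the two-dimensional plane of each pair) is a stand-in assumption, and the names `pair_score` and `top_feature_pairs` are hypothetical. The all-pairs loop corresponds to the kind of unoptimized baseline whose cost grows quadratically with the number of features.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Row = Sequence[float]


def centroid(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Mean of a list of 2D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def pair_score(pos: Sequence[Row], neg: Sequence[Row], i: int, j: int) -> float:
    """Assumed stand-in metric: fraction of objects that a nearest-centroid
    rule classifies correctly when only features i and j are used."""
    pos2d = [(row[i], row[j]) for row in pos]
    neg2d = [(row[i], row[j]) for row in neg]
    cp, cn = centroid(pos2d), centroid(neg2d)

    def closer_to_pos(p: Tuple[float, float]) -> bool:
        dp = (p[0] - cp[0]) ** 2 + (p[1] - cp[1]) ** 2
        dn = (p[0] - cn[0]) ** 2 + (p[1] - cn[1]) ** 2
        return dp < dn

    correct = sum(closer_to_pos(p) for p in pos2d)
    correct += sum(not closer_to_pos(p) for p in neg2d)
    return correct / (len(pos2d) + len(neg2d))


def top_feature_pairs(
    pos: Sequence[Row], neg: Sequence[Row], k: int = 3
) -> List[Tuple[float, Tuple[int, int]]]:
    """Score every feature pair and return the k best as (score, (i, j))."""
    n_features = len(pos[0])
    scored = [
        (pair_score(pos, neg, i, j), (i, j))
        for i, j in combinations(range(n_features), 2)
    ]
    # Highest score first; ties broken by feature indices for determinism.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return scored[:k]


# Toy data (invented for illustration): rows are objects, columns are features.
# Features 0 and 1 overlap between classes individually but separate them jointly.
pos = [[1.0, 0.0, 5.0], [2.0, 1.0, 1.0], [3.0, 2.0, 4.0]]
neg = [[0.0, 1.0, 5.0], [1.0, 2.0, 1.0], [2.0, 3.0, 4.0]]

best = top_feature_pairs(pos, neg, k=1)
print(best)
```

On this toy input the pair of features 0 and 1 ranks first, even though neither feature separates the classes on its own, which is the motivation for searching over pairs rather than single features.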
