Matrix (factorization) reloaded: flexible methods for imputing genetic interactions with cross-species and side information.

Affiliations

Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742.

Program in Computational Biology, Bioinformatics, and Genomics, University of Maryland, College Park, MD 20742, USA.

Publication information

Bioinformatics. 2020 Dec 30;36(Suppl_2):i866-i874. doi: 10.1093/bioinformatics/btaa818.

Abstract

MOTIVATION

Mapping genetic interactions (GIs) can reveal important insights into cellular function and has potential translational applications. There has been great progress in developing high-throughput experimental systems for measuring GIs (e.g. with double knockouts) as well as in defining computational methods for inferring (imputing) unknown interactions. However, existing computational methods for imputation have largely been developed for and applied in baker's yeast, even as experimental systems have begun to allow measurements in other contexts. Importantly, existing methods face a number of limitations in requiring specific side information and with respect to computational cost. Further, few have addressed how GIs can be imputed when data are scarce.

RESULTS

In this article, we address these limitations by presenting a new imputation framework, called Extensible Matrix Factorization (EMF). EMF is a framework of composable models that flexibly exploit cross-species information in the form of GI data across multiple species, and arbitrary side information in the form of kernels (e.g. from protein-protein interaction networks). We perform a rigorous set of experiments on these models in matched GI datasets from baker's and fission yeast. These include the first such experiments on genome-scale GI datasets in multiple species in the same study. We find that EMF models that exploit side and cross-species information improve imputation, especially in data-scarce settings. Further, we show that EMF outperforms the state-of-the-art deep learning method, even when using strictly less data, and incurs orders of magnitude less computational cost.
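As a rough illustration of the imputation setting described above, the following Python sketch fits a low-rank factorization to the observed entries of a GI score matrix and uses it to fill in the unobserved ones. It is a generic masked matrix factorization, not the EMF models from the paper: the function name `impute_mf`, its parameters, and the optional graph-Laplacian penalty (used here as a simple stand-in for kernel-based side information) are all assumptions made for this example.

```python
import numpy as np

def impute_mf(Y, mask, rank=2, L=None, lam=0.01, gamma=0.0,
              lr=0.005, iters=3000, seed=0):
    """Hypothetical helper: impute a GI matrix with masked matrix factorization.

    Y     : (n, m) array of GI scores; unobserved entries may hold any value.
    mask  : (n, m) boolean array, True where Y was measured.
    L     : optional (n, n) graph Laplacian over row genes (assumed side info).
    Returns the dense reconstruction U @ V.T.
    """
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    M = mask.astype(float)
    Y0 = np.where(mask, Y, 0.0)            # neutralize unobserved entries (e.g. NaN)
    U = 0.1 * rng.standard_normal((n, rank))
    V = 0.1 * rng.standard_normal((m, rank))
    for _ in range(iters):
        R = M * (U @ V.T - Y0)             # residual on observed entries only
        grad_U = 2 * R @ V + 2 * lam * U   # squared error + L2 penalty
        grad_V = 2 * R.T @ U + 2 * lam * V
        if L is not None:
            grad_U += 2 * gamma * (L @ U)  # pulls factors of linked genes together
        U -= lr * grad_U
        V -= lr * grad_V
    return U @ V.T

# Toy usage on synthetic low-rank data (not real GI measurements).
rng = np.random.default_rng(1)
Y_true = rng.standard_normal((20, 2)) @ rng.standard_normal((20, 2)).T
observed = rng.random((20, 20)) < 0.7      # roughly 70% of pairs measured
Y_hat = impute_mf(Y_true, observed, rank=2)
```

In this sketch the held-out entries `Y_hat[~observed]` play the role of imputed interactions; plain gradient descent is used only for brevity, and the step size and iteration count would need tuning on real data.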

AVAILABILITY

Implementations of models and experiments are available at: https://github.com/lrgr/EMF.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
