Matrix (factorization) reloaded: flexible methods for imputing genetic interactions with cross-species and side information.

Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742.

Program in Computational Biology, Bioinformatics, and Genomics, University of Maryland, College Park, MD 20742, USA.

Bioinformatics. 2020 Dec 30;36(Suppl_2):i866-i874. doi: 10.1093/bioinformatics/btaa818.

MOTIVATION

Mapping genetic interactions (GIs) can reveal important insights into cellular function and has potential translational applications. There has been great progress in developing high-throughput experimental systems for measuring GIs (e.g. with double knockouts) as well as in defining computational methods for inferring (imputing) unknown interactions. However, existing computational methods for imputation have largely been developed for and applied in baker's yeast, even as experimental systems have begun to allow measurements in other contexts. Importantly, existing methods face a number of limitations in requiring specific side information and with respect to computational cost. Further, few have addressed how GIs can be imputed when data are scarce.

RESULTS

In this article, we address these limitations by presenting a new imputation framework, called Extensible Matrix Factorization (EMF). EMF is a framework of composable models that flexibly exploit cross-species information in the form of GI data across multiple species, and arbitrary side information in the form of kernels (e.g. from protein-protein interaction networks). We perform a rigorous set of experiments on these models in matched GI datasets from baker's and fission yeast. These include the first such experiments on genome-scale GI datasets in multiple species in the same study. We find that EMF models that exploit side and cross-species information improve imputation, especially in data-scarce settings. Further, we show that EMF outperforms the state-of-the-art deep learning method, even when using strictly less data, and incurs orders of magnitude less computational cost.
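As a rough illustration of the kind of model described above, the following is a minimal sketch of masked matrix factorization with a kernel-style prior on the row factors. It is not the EMF implementation from the authors' repository, and it omits the cross-species component entirely. The function name `impute_gi`, its parameters, the toy ring graph standing in for a protein-protein interaction network, and the choice of a regularized Laplacian kernel K = (I + gamma L)^{-1} are all assumptions made for the example.

```python
import numpy as np


def impute_gi(Y, mask, A_row, rank=2, lam=0.1, gamma=1.0, lr=0.01,
              n_iter=2000, seed=0):
    """Fill in a partially observed GI matrix Y (n x m) as U @ V.T.

    Hypothetical sketch: minimizes
        0.5 * ||mask * (U V^T - Y)||^2
        + 0.5 * lam * tr(U^T K^{-1} U) + 0.5 * lam * ||V||^2
    by plain gradient descent, where K^{-1} = I + gamma * L and L is the
    graph Laplacian of the side-information network A_row (n x n).
    """
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    Y0 = np.where(mask, Y, 0.0)            # unobserved entries never contribute
    U = 0.1 * rng.standard_normal((n, rank))
    V = 0.1 * rng.standard_normal((m, rank))

    # Inverse of the regularized Laplacian kernel, available in closed form.
    L = np.diag(A_row.sum(axis=1)) - A_row
    K_inv = np.eye(n) + gamma * L

    for _ in range(n_iter):
        R = mask * (U @ V.T - Y0)          # residual on observed entries only
        grad_U = R @ V + lam * (K_inv @ U)  # kernel prior smooths rows of U
        grad_V = R.T @ U + lam * V          # plain L2 penalty on V
        U -= lr * grad_U
        V -= lr * grad_V
    return U @ V.T


# Toy data: a rank-2 "GI" matrix with roughly 30% of entries hidden.
rng = np.random.default_rng(1)
n, m, k = 8, 6, 2
Y_true = rng.standard_normal((n, k)) @ rng.standard_normal((m, k)).T
mask = rng.random((n, m)) < 0.7

# Toy side information: a ring graph over the n row genes (an assumption).
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0

Y_obs = np.where(mask, Y_true, np.nan)
Y_hat = impute_gi(Y_obs, mask, A)

print("MSE on observed entries:", np.mean((Y_hat[mask] - Y_true[mask]) ** 2))
print("MSE on held-out entries:", np.mean((Y_hat[~mask] - Y_true[~mask]) ** 2))
```

The Laplacian term pulls the latent vectors of genes that are neighbors in the side-information network toward each other, which is one common way for side information to help when few interactions are observed for a gene.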

AVAILABILITY

Implementations of models and experiments are available at: https://github.com/lrgr/EMF.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

Matrix (factorization) reloaded: flexible methods for imputing genetic interactions with cross-species and side information.

Bioinformatics. 2020 Dec 30;36(Suppl_2):i866-i874. doi: 10.1093/bioinformatics/btaa818.

MUNDO: protein function prediction embedded in a multispecies world.

Bioinform Adv. 2021 Sep 29;2(1):vbab025. doi: 10.1093/bioadv/vbab025. eCollection 2022.

Missing value imputation for epistatic MAPs.

BMC Bioinformatics. 2010 Apr 20;11:197. doi: 10.1186/1471-2105-11-197.

Predicting and explaining the impact of genetic disruptions and interactions on organismal viability.

Bioinformatics. 2022 Sep 2;38(17):4088-4099. doi: 10.1093/bioinformatics/btac519.

A graph regularized generalized matrix factorization model for predicting links in biomedical bipartite networks.

Bioinformatics. 2020 Jun 1;36(11):3474-3481. doi: 10.1093/bioinformatics/btaa157.

A framework for modeling epistatic interaction.

Bioinformatics. 2021 Jul 19;37(12):1708-1716. doi: 10.1093/bioinformatics/btaa990.

MatrixEpistasis: ultrafast, exhaustive epistasis scan for quantitative traits with covariate adjustment.

Bioinformatics. 2018 Jul 15;34(14):2341-2348. doi: 10.1093/bioinformatics/bty094.

A pairwise strategy for imputing predictive features when combining multiple datasets.

Bioinformatics. 2023 Jan 1;39(1). doi: 10.1093/bioinformatics/btac839.

Large-Scale Information Retrieval and Correction of Noisy Pharmacogenomic Datasets through Residual Thresholded Deep Matrix Factorization.

bioRxiv. 2023 Dec 8:2023.12.07.570723. doi: 10.1101/2023.12.07.570723.

Missing value estimation methods for DNA methylation data.

Bioinformatics. 2019 Oct 1;35(19):3786-3793. doi: 10.1093/bioinformatics/btz134.

Cited by

Knowledge graph-aided Bayesian active learning for top-K genetic interaction discovery.

Sci Rep. 2025 Aug 25;15(1):31196. doi: 10.1038/s41598-025-13972-7.
