A novel method for data fusion over entity-relation graphs and its application to protein-protein interaction prediction.

Author information

Raimondi Daniele, Simm Jaak, Arany Adam, Moreau Yves

Affiliation

ESAT-STADIUS, KU Leuven, 3001 Leuven, Belgium.

Publication information

Bioinformatics. 2021 Aug 25;37(16):2275-2281. doi: 10.1093/bioinformatics/btab092.

DOI: 10.1093/bioinformatics/btab092

PMID: 33560405

Abstract

MOTIVATION

Modern bioinformatics faces increasingly complex problems, and we are rapidly approaching an era in which the ability to seamlessly integrate heterogeneous sources of information will be crucial for scientific progress. Here, we present a novel non-linear data fusion framework that generalizes the conventional matrix factorization paradigm, allowing inference over arbitrary entity-relation graphs, and we apply it to the prediction of protein-protein interactions (PPIs). Improving our knowledge of PPI networks at the proteome scale is crucial for understanding protein function, physiological and disease states, and cell life in general.

RESULTS

We devised three data fusion-based models for the proteome-level prediction of PPIs, and we show that our method outperforms state-of-the-art approaches on common benchmarks. Moreover, we investigate its predictions on newly published PPIs and show that these new data exhibit a clear shift in their underlying distributions; we therefore train and test our models on this extended dataset.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
