Raimondi Daniele, Simm Jaak, Arany Adam, Moreau Yves
ESAT-STADIUS, KU Leuven, 3001 Leuven, Belgium.
Bioinformatics. 2021 Aug 25;37(16):2275-2281. doi: 10.1093/bioinformatics/btab092.
Modern bioinformatics faces increasingly complex problems, and we are rapidly approaching an era in which the ability to seamlessly integrate heterogeneous sources of information will be crucial for scientific progress. Here, we present a novel non-linear data fusion framework that generalizes the conventional matrix factorization paradigm, allowing inference over arbitrary entity-relation graphs, and we apply it to the prediction of protein-protein interactions (PPIs). Improving our knowledge of PPI networks at the proteome scale is crucial to understanding protein function, physiological and disease states, and cell life in general.
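For orientation, the conventional matrix factorization paradigm that the framework generalizes can be sketched as follows. This is not the authors' model: it is a minimal single-relation baseline, assuming a symmetric binary protein-protein adjacency matrix, one embedding vector per protein, and a logistic link fitted by plain gradient descent. The function name `factorize_ppi`, its parameters, and the six-protein toy network are all hypothetical and chosen only for illustration.

```python
import numpy as np


def sigmoid(x):
    """Logistic link mapping raw scores to interaction probabilities."""
    return 1.0 / (1.0 + np.exp(-x))


def factorize_ppi(adjacency, rank=4, lr=0.05, epochs=300, seed=0):
    """Fit one embedding per protein so that sigmoid(Z @ Z.T) approximates
    the observed adjacency matrix (hypothetical baseline, not the paper's model)."""
    rng = np.random.default_rng(seed)
    n = adjacency.shape[0]
    Z = 0.1 * rng.standard_normal((n, rank))  # small random initial embeddings
    mask = 1.0 - np.eye(n)                    # ignore self-interactions
    eps = 1e-9                                # avoids log(0)
    losses = []
    for _ in range(epochs):
        P = sigmoid(Z @ Z.T)
        # Mean binary cross-entropy over all off-diagonal protein pairs.
        bce = adjacency * np.log(P + eps) + (1 - adjacency) * np.log(1 - P + eps)
        losses.append(-np.sum(mask * bce) / mask.sum())
        # Gradient of the summed loss with respect to the scores Z @ Z.T ...
        G = mask * (P - adjacency)
        # ... and, by the chain rule, with respect to the embeddings Z.
        Z -= lr * (G + G.T) @ Z
    return Z, losses


# Invented toy network: proteins 0-2 interact with each other, as do proteins 3-5.
toy = np.zeros((6, 6))
for group in ([0, 1, 2], [3, 4, 5]):
    for i in group:
        for j in group:
            if i != j:
                toy[i, j] = 1.0

Z, losses = factorize_ppi(toy)
scores = sigmoid(Z @ Z.T)  # predicted interaction probability for every pair
```

In this baseline a single matrix relates one entity type to itself. A data fusion setting would relate several entity types through several such matrices with shared embeddings, and the abstract's framework further replaces the fixed inner product with a non-linear model over an arbitrary entity-relation graph.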
We devised three data fusion-based models for the proteome-level prediction of PPIs, and we show that our method outperforms state-of-the-art approaches on common benchmarks. Moreover, we investigate its predictions on newly published PPIs, showing that these new data exhibit a clear shift in their underlying distributions; we therefore train and test our models on this extended dataset.
Supplementary data are available at Bioinformatics online.