Makrodimitris Stavros, Reinders Marcel, van Ham Roeland
Delft Bioinformatics Lab, Delft University of Technology, Delft, the Netherlands.
Keygene N.V., Wageningen, the Netherlands.
PLoS One. 2020 Nov 25;15(11):e0242723. doi: 10.1371/journal.pone.0242723. eCollection 2020.
Physical interaction between two proteins is strong evidence that the proteins are involved in the same biological process, making Protein-Protein Interaction (PPI) networks a valuable data resource for predicting the cellular functions of proteins. However, PPI networks are largely incomplete for non-model species. Here, we tested to what extent these incomplete networks are still useful for genome-wide function prediction. We used two network-based classifiers to predict Biological Process Gene Ontology terms from protein interaction data in four species: Saccharomyces cerevisiae, Escherichia coli, Arabidopsis thaliana and Solanum lycopersicum (tomato). The classifiers had reasonable performance in the well-studied yeast, but performed poorly in the other species. We showed that this poor performance can be considerably improved by adding edges predicted from various data sources, such as text mining, and that associations from the STRING database are more useful than interactions predicted by a neural network from sequence-based features.
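The abstract does not name the two network-based classifiers, so the following is only a minimal sketch of the general idea, assuming a simple neighbor-voting (guilt-by-association) scheme. The function name `neighbor_vote`, the protein identifiers, and the term labels `GO:A` and `GO:B` are hypothetical and chosen purely for illustration. The toy data shows how a protein with no experimentally supported interactions gets no predictions until predicted edges are added.

```python
from collections import defaultdict


def neighbor_vote(edges, annotations, protein):
    """Score each term for `protein` as the fraction of its network
    neighbors annotated with that term (illustrative sketch only)."""
    # Build an undirected adjacency map from the edge list.
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    nbrs = neighbors.get(protein, set())
    if not nbrs:
        return {}  # isolated protein: the network offers no evidence

    # Count how many neighbors carry each term.
    counts = defaultdict(int)
    for n in nbrs:
        for term in annotations.get(n, ()):
            counts[term] += 1
    return {term: c / len(nbrs) for term, c in counts.items()}


# Toy data (hypothetical): P4 has no experimentally supported interactions.
experimental_edges = [("P1", "P2")]
predicted_edges = [("P4", "P2"), ("P4", "P3")]  # e.g. from text mining
annotations = {"P2": {"GO:A"}, "P3": {"GO:A", "GO:B"}}

scores_sparse = neighbor_vote(experimental_edges, annotations, "P4")
scores_augmented = neighbor_vote(
    experimental_edges + predicted_edges, annotations, "P4"
)
```

With only the experimental edges, `P4` is isolated and receives no scores. After the predicted edges are added, both of its neighbors vote for `GO:A` and one of them votes for `GO:B`. This mirrors, in miniature, why augmenting a sparse network can help function prediction in less-studied species.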