Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA.
Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA.
Gigascience. 2020 Aug 1;9(8). doi: 10.1093/gigascience/giaa081.
Identifying protein functions is important for many biological applications. Because experimental functional characterization of proteins is time-consuming and costly, accurate and efficient computational methods for predicting protein functions are in great demand for generating testable hypotheses to guide large-scale experiments.
Here, we propose Graph2GO, a multi-modal, graph-based representation learning model that integrates heterogeneous information, including multiple types of interaction networks (a sequence similarity network and a protein-protein interaction network) and protein features (amino acid sequence, subcellular location, and protein domains), to predict protein functions as Gene Ontology terms. We compared Graph2GO with BLAST as a baseline and with two popular protein function prediction methods (Mashup and deepNF), and demonstrated that our model achieves state-of-the-art performance. Testing on multiple species shows that the model is robust. We also provide a web server that supports on-the-fly function queries and downstream analysis.
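To make the idea of combining interaction networks with protein features concrete, the following is a minimal sketch of one graph-convolution-style propagation step applied to two toy networks over the same proteins. It is an illustration under stated assumptions, not the Graph2GO implementation: the function names, the 4-protein networks, the random features, the untrained random weights, and the choice to concatenate the two embeddings are all hypothetical.

```python
import math
import random


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, the symmetric normalization used by graph convolutions."""
    n = len(adj)
    # Add self-loops so every protein keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product, sufficient for a toy example."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def embed(adj, features, weights):
    """One propagation step: mix each protein's features with its neighbours', then project."""
    hidden = matmul(matmul(normalize_adjacency(adj), features), weights)
    return [[math.tanh(v) for v in row] for row in hidden]


random.seed(0)
n_proteins, n_features, dim = 4, 5, 3

# Hypothetical toy networks over the same four proteins (symmetric, unweighted).
ppi = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]  # protein-protein interactions
ssn = [[0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]]  # sequence similarity

# Placeholder protein features and untrained weights (assumptions for illustration only).
features = [[random.random() for _ in range(n_features)] for _ in range(n_proteins)]
w_ppi = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_features)]
w_ssn = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_features)]

# Concatenate the per-network embeddings into one representation per protein.
embedding = [a + b for a, b in zip(embed(ppi, features, w_ppi), embed(ssn, features, w_ssn))]
print(len(embedding), len(embedding[0]))
```

In a real setting the weights would be learned rather than sampled, and the resulting per-protein vectors would be passed to a classifier that predicts Gene Ontology terms.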
Graph2GO is the first model to use attributed network representation learning to model both interaction networks and protein features for protein function prediction, and it achieves promising performance. The model can easily be extended with additional protein features to further improve performance. Graph2GO is also applicable to other scenarios involving biological networks, and the learned latent representations can serve as input features for machine learning tasks in various downstream analyses.