Zhang Xiaoshuai, Wang Lixin, Liu Hucheng, Zhang Xiaofeng, Liu Bo, Wang Yadong, Li Junyi
IEEE/ACM Trans Comput Biol Bioinform. 2023 Sep-Oct;20(5):2772-2780. doi: 10.1109/TCBB.2021.3139841. Epub 2023 Oct 9.
Proteins are the main material basis of living organisms and play a crucial role in life activities. Understanding protein function is of great significance for new drug discovery, disease treatment and vaccine development. In recent years, with the widespread application of deep learning in bioinformatics, researchers have proposed many deep learning models to predict protein function. However, existing deep learning methods usually consider only protein sequences, and thus cannot effectively integrate multi-source data to annotate protein function. In this article, we propose the Prot2GO model, which integrates protein sequence and protein-protein interaction (PPI) network data to predict protein function. We use an improved biased random walk algorithm to extract features from the PPI network. For sequence data, we use a convolutional neural network to obtain local features of the sequence and a recurrent neural network to capture long-range associations between amino acid residues in the protein sequence. Moreover, Prot2GO adopts an attention mechanism to identify protein motifs and structural domains. Experiments show that Prot2GO achieves state-of-the-art performance on multiple metrics.
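To make the network-feature step more concrete, the following is a minimal sketch of a biased random walk over a toy PPI graph. The abstract does not specify how the authors' "improved" algorithm works, so this uses the standard node2vec-style second-order bias as a stand-in. The function name `biased_walk`, the parameters `p` and `q`, and the protein identifiers `P1` to `P5` are all illustrative assumptions, not taken from the paper.

```python
import random


def biased_walk(graph, start, length, p=1.0, q=1.0, rng=None):
    """Generate one node2vec-style second-order random walk.

    graph: dict mapping each node to the set of its neighbours (undirected).
    p: return parameter; a larger p makes stepping back to the previous node less likely.
    q: in-out parameter; q > 1 keeps the walk local, q < 1 pushes it outward.
    NOTE: this is the generic node2vec bias, assumed here for illustration only.
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbours = sorted(graph[cur])  # sorted for reproducible sampling
        if not neighbours:
            break  # isolated node: the walk cannot continue
        if len(walk) == 1:
            # First step has no previous node, so sample uniformly.
            walk.append(rng.choice(neighbours))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbours:
            if nxt == prev:
                weights.append(1.0 / p)  # return to the previous node
            elif nxt in graph[prev]:
                weights.append(1.0)      # stay at distance 1 from the previous node
            else:
                weights.append(1.0 / q)  # move further away from the previous node
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


# Toy PPI network with hypothetical protein identifiers.
ppi = {
    "P1": {"P2", "P3"},
    "P2": {"P1", "P3", "P4"},
    "P3": {"P1", "P2"},
    "P4": {"P2", "P5"},
    "P5": {"P4"},
}

rng = random.Random(0)
walks = [biased_walk(ppi, node, length=6, p=0.5, q=2.0, rng=rng) for node in sorted(ppi)]
```

In node2vec-style pipelines, walks like these are usually passed to a skip-gram model to learn one embedding vector per protein. Whether Prot2GO uses the walks in this way is not stated in the abstract.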