School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, Hebei, China.
Int J Mol Sci. 2019 Jun 11;20(11):2845. doi: 10.3390/ijms20112845.
Over the past decade, the number of proteins in the PDB database has grown steadily, and traditional methods struggle to characterize the function of newly discovered enzymes in chemical reactions. Computational models and protein feature representations for predicting enzymatic function have therefore become increasingly important. Most existing methods for predicting enzymatic function use either protein geometric structure or protein sequence alone. In this paper, enzyme function is predicted from multi-sided biological information that includes both sequence and structure information. First, we extract mutation information from the amino acid sequence using the position-specific scoring matrix (PSSM) and represent structure information by amino acid distances and angles. Next, we use histograms to represent the extracted sequence and structural features separately. We then build a network model of three parallel Deep Convolutional Neural Networks (DCNN) that learn the three enzyme features simultaneously for function prediction, and we fuse their outputs through two different architectures. Finally, the proposed model was evaluated on a large dataset of 43,843 enzymes from the PDB and achieved 92.34% classification accuracy when sequence information is considered, an improvement over the previous result.
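To make the histogram-based structural feature concrete, the following is a minimal sketch of how pairwise amino acid distances could be summarized as a fixed-length histogram. It is an illustration under stated assumptions, not the authors' implementation: the function name `pairwise_distance_histogram`, the use of one 3D coordinate per residue, and the values of `n_bins` and `max_dist` are all hypothetical choices that do not come from the paper.

```python
import numpy as np


def pairwise_distance_histogram(coords, n_bins=8, max_dist=40.0):
    """Normalized histogram of pairwise distances between residue coordinates.

    coords: (N, 3) array with one 3D position per residue, in angstroms
            (assumed input, e.g. one representative atom per residue).
    n_bins, max_dist: illustrative settings, not values from the paper.
    Distances larger than max_dist fall outside the range and are ignored.
    """
    coords = np.asarray(coords, dtype=float)
    # All pairwise difference vectors via broadcasting: shape (N, N, 3).
    diffs = coords[:, None, :] - coords[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Keep each unordered residue pair once (strict upper triangle).
    upper = np.triu_indices(len(coords), k=1)
    pair_dists = dists[upper]
    counts, _ = np.histogram(pair_dists, bins=n_bins, range=(0.0, max_dist))
    total = counts.sum()
    # Normalize so proteins of different lengths give comparable features.
    return counts / total if total > 0 else counts.astype(float)
```

A fixed-length vector of this kind is independent of protein length, which is what allows structures of different sizes to be fed to a convolutional network with a fixed input shape. An analogous histogram could be built for angles by changing the binned quantity and the range.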