Le Nguyen Quoc Khanh, Yapp Edward Kien Yee, Nagasundaram N, Chua Matthew Chin Heng, Yeh Hui-Yuan
Medical Humanities Research Cluster, School of Humanities, Nanyang Technological University, 48 Nanyang Ave, 639818, Singapore.
Professional Master Program in Artificial Intelligence in Medicine, Taipei Medical University, Taipei 106, Taiwan.
Comput Struct Biotechnol J. 2019 Oct 25;17:1245-1254. doi: 10.1016/j.csbj.2019.09.005. eCollection 2019.
Protein function prediction is one of the most thoroughly studied topics in computational biology and has attracted many researchers. However, implementing deep neural networks that improve protein function prediction remains a major challenge. In this study, we propose a new strategy that combines gated recurrent units (GRUs) with position-specific scoring matrix (PSSM) profiles to predict vesicular transport proteins, which perform a biological function of great importance. Although these proteins are difficult to identify, our model achieves accuracies of 82.3% in cross-validation and 85.8% on the independent dataset. We also address the class imbalance in the dataset by tuning the class weights of the deep learning model. The model achieved a sensitivity of 79.2%, a specificity of 82.9%, a Matthews correlation coefficient (MCC) of 0.52, and an area under the ROC curve (AUC) of 0.861. On the same dataset, our strategy outperforms all other state-of-the-art algorithms. This study therefore offers a technique for discovering more proteins, particularly those involved in vesicular transport. In addition, our results could encourage the use of GRU architectures in protein function prediction.
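The class-weighting step and the evaluation measures named in the abstract can be illustrated with a small, standard-library-only Python sketch. It assumes binary labels (1 = vesicular transport protein, 0 = other protein). For the weights, it uses the common "balanced" heuristic, weight = n_samples / (n_classes × n_c). This heuristic is an assumption: the abstract says only that class weights were tuned, not how. All function names are hypothetical. The AUC is computed with the rank (Mann-Whitney) formulation, which is equivalent to the area under the ROC curve, rather than by tracing the curve itself.

```python
import math


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: n_samples / (n_classes * n_c).

    Assumption: the 'balanced' heuristic; the paper only states that class
    weights were tuned, so its actual values may differ.
    """
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, accuracy and MCC for labels in {0, 1} (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sens = tp / (tp + fn) if tp + fn else 0.0   # true positive rate
    spec = tn / (tn + fp) if tn + fp else 0.0   # true negative rate
    acc = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sens, "specificity": spec, "accuracy": acc, "mcc": mcc}


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy, made-up data for illustration only (not from the paper).
    print(balanced_class_weights([1, 0, 0, 0]))  # minority class 1 gets weight 2.0
    m = binary_metrics([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 1, 0])
    print(m["mcc"])  # → 0.25
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # → 0.75
```

The abstract does not name a framework. If the model were built in Keras, a dictionary like the one returned by `balanced_class_weights` has the form expected by the `class_weight` argument of `Model.fit`, which scales each sample's loss by the weight of its class.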