Zhou Haiwei, Tan Wenxi, Shi Shaoping
Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China.
School of Mathematical Sciences, Fudan University, Shanghai 200433, China.
Brief Bioinform. 2023 Mar 19;24(2). doi: 10.1093/bib/bbad018.
Protein arginine methylation is an important posttranslational modification (PTM) associated with protein functional diversity and with pathological conditions including cancer. Identifying methylation binding sites facilitates a better understanding of the molecular function of proteins. Recent advances in deep neural networks have led to a proliferation of deep learning-based methods for methylation site identification, owing to their fast and accurate predictions. In this paper, we propose DeepGpgs, a deep learning model that incorporates a Gaussian prior and a gated attention mechanism. We introduce a residual network channel to extract the evolutionary information of proteins. We then combine adaptive embedding with bidirectional long short-term memory networks to form a context-shared encoder layer. A gated multi-head attention mechanism follows to capture global information about the sequence, and a Gaussian prior is injected into the sequence to assist in predicting PTMs. We also propose a weighted joint loss function to alleviate the false negative problem. We show empirically that DeepGpgs improves the Matthews correlation coefficient by 6.3% on the arginine methylation independent test set compared with existing state-of-the-art methylation site prediction methods. Furthermore, DeepGpgs performs robustly in predicting SARS-CoV-2 phosphorylation sites, which suggests good transferability and the potential to extend it to the prediction of other modification sites. The open-source code and data for DeepGpgs are available at https://github.com/saizhou1/DeepGpgs.
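The Matthews correlation coefficient used as the headline metric can be computed from the four confusion-matrix counts. The following is a minimal sketch of the standard definition, not code from the DeepGpgs repository; the function names `confusion_counts` and `mcc` and the toy labels are assumptions made for illustration.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = modified site, 0 = not)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 by convention if a margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical toy labels, not data from the paper.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
score = mcc(*confusion_counts(y_true, y_pred))
print(round(score, 4))
```

Because MCC uses all four counts, a missed site (false negative) lowers the score even when accuracy stays high on an imbalanced test set, which is presumably why it suits the false negative concern raised in the abstract.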