Li Zutan, Fang Jingya, Wang Shining, Zhang Liangyun, Chen Yuanyuan, Pian Cong
College of Agriculture, Nanjing Agricultural University, Nanjing, Jiangsu, China.
Department of Mathematics, College of Science, Nanjing Agricultural University, China.
Brief Bioinform. 2022 Mar 10;23(2). doi: 10.1093/bib/bbac037.
Protein lysine crotonylation (Kcr) is an important type of posttranslational modification that is associated with a wide range of biological processes. Identifying Kcr sites is critical to better understanding their functional mechanisms. However, the existing experimental techniques for detecting Kcr sites are not cost-effective, creating a great need for new computational methods to address this problem. Here we describe Adapt-Kcr, a deep learning model that uses adaptive embedding and is based on a convolutional neural network combined with a bidirectional long short-term memory network and an attention architecture. On the independent testing set, Adapt-Kcr outperformed the current state-of-the-art Kcr prediction model, with an improvement of 3.2% in accuracy and 1.9% in the area under the receiver operating characteristic curve. Compared to other Kcr models, Adapt-Kcr was also more robust in distinguishing crotonylation from other lysine modifications. Another model (Adapt-ST) was trained to predict phosphorylation sites in SARS-CoV-2, and it outperformed the equivalent state-of-the-art phosphorylation site prediction model. These results indicate that self-adaptive embedding features perform better than handcrafted features in capturing discriminative information; when combined with an attention architecture, they could be an effective way of identifying protein Kcr sites. Together, our Adapt framework (including learned embedding features and an attention architecture) has strong potential for predicting other protein posttranslational modification sites.
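As an illustration of the input that an adaptive (learned) embedding would consume in place of handcrafted features, the following minimal sketch extracts a fixed-length window around each lysine and maps residues to integer tokens. It is not the authors' implementation: the function names, the padding symbol `X`, the token numbering, and the default flank of 15 residues (a 31-residue window) are all assumptions made for this example and are not stated in the abstract.

```python
from typing import List, Tuple

# The 20 standard amino acids. The ordering is an arbitrary choice for this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"  # hypothetical padding symbol for positions beyond the sequence ends

# Index 0 is reserved for padding and unknown residues; standard residues map to 1..20.
TOKEN_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def lysine_windows(sequence: str, flank: int = 15) -> List[Tuple[int, str]]:
    """Return (1-based position, window) for every lysine (K) in the sequence.

    Each window has length 2 * flank + 1 with the lysine at its center.
    The flank size is an assumption, not a value taken from the abstract.
    """
    seq = sequence.upper()
    padded = PAD * flank + seq + PAD * flank
    windows = []
    for i, residue in enumerate(seq):
        if residue == "K":
            # Residue i of the original sequence sits at index i + flank in the
            # padded string, so the centered window starts at index i.
            windows.append((i + 1, padded[i : i + 2 * flank + 1]))
    return windows


def encode(window: str) -> List[int]:
    """Map each residue to its integer token (0 for padding or unknown)."""
    return [TOKEN_INDEX.get(aa, 0) for aa in window]


if __name__ == "__main__":
    for position, window in lysine_windows("MKTAYK", flank=2):
        print(position, window, encode(window))
```

In a model of the kind described above, these integer tokens would be passed to a trainable embedding layer, so the residue representation is learned jointly with the convolutional, recurrent, and attention components.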