Ghosh Souvik, Nafi Md Muhaiminul Islam, Rahman M Saifur
Department of CSE, BUET, Dhaka 1000, Bangladesh.
Department of CSE, BRAC University, Dhaka 1212, Bangladesh.
Bioinform Adv. 2025 Aug 22;5(1):vbaf198. doi: 10.1093/bioadv/vbaf198. eCollection 2025.
Lysine (K) succinylation is a crucial post-translational modification involved in cellular homeostasis and metabolism, and recent research has linked it to several diseases. Despite its emerging importance, current computational methods for predicting succinylation sites show limited performance.
We propose ResLysEmbed, a novel ResNet-based architecture that combines traditional word embeddings with per-residue embeddings from protein language models for succinylation site prediction. We also compared multiple protein language models to identify the most effective one for this task. Additionally, we experimented with several deep learning architectures to find the most suitable one for processing word embedding features and developed three hybrid architectures: ConvLysEmbed, InceptLysEmbed, and ResLysEmbed. Among these, ResLysEmbed performed best, achieving accuracy, MCC, and F1 scores of 0.81, 0.39, and 0.40 on the first independent test set and 0.72, 0.44, and 0.67 on the second, outperforming existing methods. Furthermore, we applied SHapley Additive exPlanations (SHAP) analysis to interpret the influence of each residue within the 33-residue window around the target site on the model's predictions. This analysis helps explain how the sequential position and structural distance of residues from the target site affect their contribution to succinylation prediction.
The implementation details and code are available at https://github.com/Sheldor7701/ResLysEmbed.
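As an illustration of the 33-residue window mentioned in the abstract, the following minimal sketch extracts a fixed-length window centred on each lysine of a protein sequence. It is not taken from the ResLysEmbed repository: the function name lysine_windows, the padding symbol "X" for positions that fall off the ends of the sequence, and the demo sequence are all assumptions made for this example, and the authors' actual preprocessing may differ.

```python
from typing import List, Tuple

WINDOW = 33  # window length stated in the abstract
PAD = "X"    # assumed padding symbol; the authors' choice may differ


def lysine_windows(sequence: str, window: int = WINDOW,
                   pad: str = PAD) -> List[Tuple[int, str]]:
    """Return (1-based position, window) for every lysine (K) in the sequence.

    Hypothetical helper for illustration only.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so the lysine sits at the centre")
    half = window // 2
    # Pad both ends so lysines near the termini still get a full window.
    padded = pad * half + sequence + pad * half
    results = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # Residue i of the original sequence sits at index i + half in
            # the padded string, so the slice [i, i + window) is centred on it.
            results.append((i + 1, padded[i:i + window]))
    return results


if __name__ == "__main__":
    demo = "MKTAYIAKQRQISFVKSHFSRQ"  # made-up sequence, not real data
    for position, win in lysine_windows(demo):
        print(position, win)
```

With a window of 33, each candidate site has 16 residues on either side of the central lysine. These are the positions whose contributions a per-residue attribution method such as SHAP would be reported for.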