Tran Thi-Xuan, Khanh Le Nguyen Quoc, Nguyen Van-Nui
Thai Nguyen University of Economics and Business Administration, Thai Nguyen City, Viet Nam.
In-Service Master Program in Artificial Intelligence in Medicine, Taipei Medical University, Taiwan; AIBioMed Research Group, Taipei Medical University, Taiwan.
Comput Biol Med. 2025 Mar;186:109664. doi: 10.1016/j.compbiomed.2025.109664. Epub 2025 Jan 10.
Protein succinylation, a post-translational modification in which a succinyl group (-CO-CH₂-CH₂-CO-) attaches to lysine residues, plays a critical regulatory role in cellular processes. Dysregulated succinylation has been implicated in the onset and progression of various diseases, including liver, cardiac, pulmonary, and neurological disorders. However, identifying succinylation sites experimentally is often labor-intensive, costly, and technically challenging. To address this, we introduce CBiLSuccSite, an approach that integrates Convolutional Neural Networks (CNN) with Bidirectional Long Short-Term Memory (Bi-LSTM) networks to predict protein succinylation sites. Our approach uses a word embedding layer to encode protein sequences, so that patterns and dependencies in the sequence are learned automatically without manual feature extraction. In 10-fold cross-validation, CBiLSuccSite achieved an Area Under the Curve (AUC) of 0.826 and a Matthews Correlation Coefficient (MCC) of 0.502. Independent testing supported its robustness, yielding an AUC of 0.818 and an MCC of 0.53. The integration of CNN and Bi-LSTM leverages the strengths of both architectures, establishing CBiLSuccSite as an effective tool for protein language processing and succinylation site prediction. Our model and code are publicly accessible at: https://github.com/nuinvtnu/CBiLSuccSite.
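As an illustration of what feeding protein sequences to a word embedding layer can look like, the following minimal sketch extracts a fixed-length window around each lysine and maps residues to integer ids. It is not taken from the CBiLSuccSite repository: the function names, the flank size, the padding symbol `X`, and the id assignment are all assumptions made for this example.

```python
# Hypothetical vocabulary: the 20 standard amino acids plus a padding symbol.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"
TOKEN_TO_ID = {PAD: 0, **{aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}}


def lysine_windows(sequence, flank=15):
    """Yield (position, window) for every lysine (K) in the sequence.

    Each window has length 2 * flank + 1 with the lysine at its center.
    Positions beyond either end of the protein are filled with PAD.
    The flank size is an assumed value, not the one used in the paper.
    """
    sequence = sequence.upper()
    padded = PAD * flank + sequence + PAD * flank
    for i, residue in enumerate(sequence):
        if residue == "K":
            # Residue i sits at index i + flank in the padded string.
            yield i, padded[i:i + 2 * flank + 1]


def encode(window):
    """Map residues to integer ids; unknown symbols fall back to the PAD id."""
    return [TOKEN_TO_ID.get(residue, TOKEN_TO_ID[PAD]) for residue in window]


# Toy sequence, chosen only to show the shape of the output.
for position, window in lysine_windows("MKTAYIAKQR", flank=3):
    print(position, window, encode(window))
```

Each integer list would then be passed to an embedding layer, which learns a dense vector per residue id. In this sketch, the candidate site is always at the center index of the window.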