Integrating CNN and Bi-LSTM for protein succinylation sites prediction based on Natural Language Processing technique.

Authors

Tran Thi-Xuan, Khanh Le Nguyen Quoc, Nguyen Van-Nui

Affiliations

Thai Nguyen University of Economics and Business Administration, Thai Nguyen City, Viet Nam.

In-Service Master Program in Artificial Intelligence in Medicine, Taipei Medical University, Taiwan; AIBioMed Research Group, Taipei Medical University, Taiwan.

Publication information

Comput Biol Med. 2025 Mar;186:109664. doi: 10.1016/j.compbiomed.2025.109664. Epub 2025 Jan 10.

DOI:10.1016/j.compbiomed.2025.109664

PMID:39798505

Abstract

Protein succinylation, a post-translational modification wherein a succinyl group (-CO-CH₂-CH₂-CO-) attaches to lysine residues, plays a critical regulatory role in cellular processes. Dysregulated succinylation has been implicated in the onset and progression of various diseases, including liver, cardiac, pulmonary, and neurological disorders. However, identifying succinylation sites through experimental methods is often labor-intensive, costly, and technically challenging. To address this, we introduce an approach called CBiLSuccSite that integrates Convolutional Neural Networks (CNN) with Bidirectional Long Short-Term Memory (Bi-LSTM) networks for the accurate prediction of protein succinylation sites. Our approach employs a word embedding layer to encode protein sequences, enabling the automatic learning of intricate patterns and dependencies without manual feature extraction. In 10-fold cross-validation, CBiLSuccSite achieved superior predictive performance, with an Area Under the Curve (AUC) of 0.826 and a Matthews Correlation Coefficient (MCC) of 0.502. Independent testing further validated its robustness, yielding an AUC of 0.818 and an MCC of 0.53. The integration of CNN and Bi-LSTM leverages the strengths of both architectures, establishing CBiLSuccSite as an effective tool for protein language processing and succinylation site prediction. Our model and code are publicly accessible at: https://github.com/nuinvtnu/CBiLSuccSite.
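
The abstract does not give implementation details, so the following is only a minimal sketch of two ideas it mentions: turning lysine-centered sequence fragments into integer tokens that a word embedding layer could consume, and computing the MCC from confusion-matrix counts. The flank size, the padding symbol `X`, the token numbering, and all function names here are illustrative assumptions, not the authors' actual settings or code.

```python
import math

# Hypothetical vocabulary: 20 standard amino acids plus a padding symbol.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"
TOKEN_ID = {PAD: 0, **{aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}}


def lysine_windows(sequence, flank=15):
    """Yield (position, window) for each lysine, padding past the ends with PAD."""
    padded = PAD * flank + sequence + PAD * flank
    for i, residue in enumerate(sequence):
        if residue == "K":
            # Residue i sits at index i + flank in `padded`, so this slice
            # is centered on the lysine with `flank` residues on each side.
            yield i, padded[i : i + 2 * flank + 1]


def encode(window):
    """Map each residue to an integer ID; unknown residues fall back to padding."""
    return [TOKEN_ID.get(residue, 0) for residue in window]


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy sequence (assumed for illustration) with a small flank for readability.
windows = list(lysine_windows("MKTAYIAKQR", flank=3))
encoded = [encode(window) for _, window in windows]
```

Each encoded window is a fixed-length list of integers, which is the kind of input an embedding layer followed by convolutional and recurrent layers would typically accept. The MCC helper returns 0.0 when the denominator vanishes, a common convention rather than something stated in the source.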

Similar articles

LSTMCNNsucc: A Bidirectional LSTM and CNN-Based Deep Learning Method for Predicting Lysine Succinylation Sites.

Biomed Res Int. 2021 May 28;2021:9923112. doi: 10.1155/2021/9923112. eCollection 2021.

A protein succinylation sites prediction method based on the hybrid architecture of LSTM network and CNN.

J Bioinform Comput Biol. 2022 Apr;20(2):2250003. doi: 10.1142/S0219720022500032. Epub 2022 Feb 21.

A systematic identification of species-specific protein succinylation sites using joint element features information.

Int J Nanomedicine. 2017 Aug 28;12:6303-6315. doi: 10.2147/IJN.S140875. eCollection 2017.

Enhancing Arabidopsis thaliana ubiquitination site prediction through knowledge distillation and natural language processing.

Methods. 2024 Dec;232:65-71. doi: 10.1016/j.ymeth.2024.10.006. Epub 2024 Oct 22.

Characterization and Identification of Lysine Succinylation Sites based on Deep Learning Method.

Sci Rep. 2019 Nov 7;9(1):16175. doi: 10.1038/s41598-019-52552-4.

MDCAN-Lys: A Model for Predicting Succinylation Sites Based on Multilane Dense Convolutional Attention Network.

Biomolecules. 2021 Jun 11;11(6):872. doi: 10.3390/biom11060872.

Succinylation Site Prediction Based on Protein Sequences Using the IFS-LightGBM (BO) Model.

Comput Math Methods Med. 2020 Nov 10;2020:8858489. doi: 10.1155/2020/8858489. eCollection 2020.

SuccinSite: a computational tool for the prediction of protein succinylation sites by exploiting the amino acid patterns and properties.

Mol Biosyst. 2016 Mar;12(3):786-95. doi: 10.1039/c5mb00853k. Epub 2016 Jan 7.

pSuc-EDBAM: Predicting lysine succinylation sites in proteins based on ensemble dense blocks and an attention module.

BMC Bioinformatics. 2022 Oct 31;23(1):450. doi: 10.1186/s12859-022-05001-5.
