Integrating CNN and Bi-LSTM for protein succinylation sites prediction based on Natural Language Processing technique.

Authors

Tran Thi-Xuan, Khanh Le Nguyen Quoc, Nguyen Van-Nui

Affiliations

Thai Nguyen University of Economics and Business Administration, Thai Nguyen City, Viet Nam.

In-Service Master Program in Artificial Intelligence in Medicine, Taipei Medical University, Taiwan; AIBioMed Research Group, Taipei Medical University, Taiwan.

Publication information

Comput Biol Med. 2025 Mar;186:109664. doi: 10.1016/j.compbiomed.2025.109664. Epub 2025 Jan 10.

Abstract

Protein succinylation, a post-translational modification in which a succinyl group (-CO-CH₂-CH₂-CO-) attaches to lysine residues, plays a critical regulatory role in cellular processes. Dysregulated succinylation has been implicated in the onset and progression of various diseases, including liver, cardiac, pulmonary, and neurological disorders. However, identifying succinylation sites through experimental methods is often labor-intensive, costly, and technically challenging. To address this, we introduce CBiLSuccSite, an approach that integrates Convolutional Neural Networks (CNN) with Bidirectional Long Short-Term Memory (Bi-LSTM) networks for the accurate prediction of protein succinylation sites. Our approach employs a word embedding layer to encode protein sequences, enabling the automatic learning of intricate patterns and dependencies without manual feature extraction. In 10-fold cross-validation, CBiLSuccSite achieved superior predictive performance, with an Area Under the Curve (AUC) of 0.826 and a Matthews Correlation Coefficient (MCC) of 0.502. Independent testing further validated its robustness, yielding an AUC of 0.818 and an MCC of 0.53. The integration of CNN and Bi-LSTM leverages the strengths of both architectures, establishing CBiLSuccSite as an effective tool for protein language processing and succinylation site prediction. Our model and code are publicly accessible at: https://github.com/nuinvtnu/CBiLSuccSite.
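To make the "word embedding" input and the MCC metric more concrete, here is a minimal Python sketch. It is not taken from the CBiLSuccSite repository. The window size (`flank`), the padding symbol `X`, the token mapping, and all function names are assumptions chosen for illustration. The sketch treats each lysine-centered window as a "sentence" of amino-acid "words", maps each residue to an integer index that an embedding layer could consume, and computes MCC from confusion-matrix counts using the standard formula.

```python
import math

# 20 standard amino acids; index 0 is reserved for padding/unknown residues.
# This vocabulary and ordering are illustrative assumptions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TOKEN = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}
PAD = "X"


def lysine_windows(sequence, flank=15):
    """Yield (1-based position, window) for every lysine (K) in the sequence.

    Each window has length 2 * flank + 1 with the lysine at its center;
    the sequence ends are padded with PAD so all windows share one length.
    """
    padded = PAD * flank + sequence + PAD * flank
    for i, residue in enumerate(sequence):
        if residue == "K":
            # sequence[i] sits at padded[i + flank], the center of this slice
            yield i + 1, padded[i : i + 2 * flank + 1]


def encode(window):
    """Map each residue to an integer token (0 for padding/unknown)."""
    return [TOKEN.get(aa, 0) for aa in window]


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Conventionally report 0 when any marginal is empty (denominator is 0).
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    for position, window in lysine_windows("MKAK", flank=2):
        print(position, window, encode(window))
```

In a full model, the integer lists produced by `encode` would typically feed an embedding layer, followed by convolutional and bidirectional recurrent layers; those layers are omitted here because the abstract does not specify their configuration.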
