Integrating CNN and Bi-LSTM for protein succinylation sites prediction based on Natural Language Processing technique.

Authors

Tran Thi-Xuan, Khanh Le Nguyen Quoc, Nguyen Van-Nui

Affiliations

Thai Nguyen University of Economics and Business Administration, Thai Nguyen City, Viet Nam.

In-Service Master Program in Artificial Intelligence in Medicine, Taipei Medical University, Taiwan; AIBioMed Research Group, Taipei Medical University, Taiwan.

Publication information

Comput Biol Med. 2025 Mar;186:109664. doi: 10.1016/j.compbiomed.2025.109664. Epub 2025 Jan 10.

Abstract

Protein succinylation, a post-translational modification in which a succinyl group (-CO-CH₂-CH₂-CO-) attaches to lysine residues, plays a critical regulatory role in cellular processes. Dysregulated succinylation has been implicated in the onset and progression of various diseases, including liver, cardiac, pulmonary, and neurological disorders. However, identifying succinylation sites experimentally is often labor-intensive, costly, and technically challenging. To address this, we introduce CBiLSuccSite, an approach that integrates Convolutional Neural Networks (CNN) with Bidirectional Long Short-Term Memory (Bi-LSTM) networks to predict protein succinylation sites. Our approach uses a word embedding layer to encode protein sequences, so the model learns intricate patterns and dependencies automatically, without manual feature extraction. In 10-fold cross-validation, CBiLSuccSite achieved superior predictive performance, with an Area Under the Curve (AUC) of 0.826 and a Matthews Correlation Coefficient (MCC) of 0.502. Independent testing further validated its robustness, yielding an AUC of 0.818 and an MCC of 0.53. The integration of CNN and Bi-LSTM leverages the strengths of both architectures, establishing CBiLSuccSite as an effective tool for protein language processing and succinylation site prediction. Our model and code are publicly accessible at: https://github.com/nuinvtnu/CBiLSuccSite.
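As a rough illustration of two ideas mentioned in the abstract, the sketch below shows how lysine-centred sequence windows might be turned into integer tokens suitable for a word embedding layer, and how the MCC is computed from a confusion matrix. It is not taken from the CBiLSuccSite repository: the function names, the `X` padding symbol, the token numbering, and the default flank of 15 residues are all assumptions made for this example, and the abstract does not state the window size the authors used.

```python
from math import sqrt

# 20 standard amino acids; index 0 is reserved for padding/unknown residues.
# The ordering and numbering here are illustrative assumptions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TOKEN_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}
PAD = "X"  # hypothetical padding symbol


def lysine_windows(sequence, flank=15):
    """Yield (position, window) for every lysine (K) in the sequence.

    Each window has length 2 * flank + 1 and is centred on the lysine.
    Positions beyond either end of the protein are filled with PAD.
    The default flank of 15 is an assumption, not the paper's setting.
    """
    seq = sequence.upper()
    padded = PAD * flank + seq + PAD * flank
    for pos, residue in enumerate(seq):
        if residue == "K":
            yield pos, padded[pos:pos + 2 * flank + 1]


def encode(window):
    """Map each residue to an integer token; unknown residues map to 0."""
    return [TOKEN_INDEX.get(residue, 0) for residue in window]


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    for pos, window in lysine_windows("MKTAYK", flank=2):
        print(pos, window, encode(window))
```

The integer lists produced by `encode` are the kind of input an embedding layer typically expects; the CNN and Bi-LSTM layers described in the abstract would then operate on the learned embeddings.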
