PLM-ATG: Identification of Autophagy Proteins by Integrating Protein Language Model Embeddings with PSSM-Based Features.

Authors

Wang Yangying, Wang Chunhua

Affiliation

College of Information Technology, Shanghai Ocean University, Shanghai 201306, China.

Publication

Molecules. 2025 Apr 10;30(8):1704. doi: 10.3390/molecules30081704.

DOI: 10.3390/molecules30081704
PMID: 40333592
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12029579/

Abstract

Autophagy critically regulates cellular development while maintaining pathophysiological homeostasis. Since the autophagic process is tightly regulated by the coordination of autophagy-related proteins (ATGs), precise identification of these proteins is essential. Although current computational approaches avoid the cost and time of experimental identification, they leave room for improvement because handcrafted features inadequately capture the intricate patterns and relationships hidden in sequences. In this study, we propose PLM-ATG, a novel computational model that integrates support vector machines with the fusion of protein language model (PLM) embeddings and position-specific scoring matrix (PSSM)-based features for ATG identification. First, we extracted sequence-based features and PSSM-based features as the inputs of six classifiers to establish baseline models. Among these, the combination of the SVM classifier and the AADP-PSSM feature set achieved the best prediction accuracy. Second, two popular PLM embeddings, ESM-2 and ProtT5, were fused with the AADP-PSSM features to further improve the prediction of ATGs. Third, we selected the optimal feature subset from the combination of the ESM-2 embeddings and AADP-PSSM features to train the final SVM model. The proposed PLM-ATG achieved an accuracy of 99.5% and an MCC of 0.990, which are nearly 5% and 0.1 higher, respectively, than those of the state-of-the-art model EnsembleDL-ATG.
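
The three steps in the abstract (PSSM-based features, fusion with PLM embeddings, feature selection followed by an SVM) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the commonly used 420-dimensional form of AADP-PSSM (20 column means plus 400 averaged products of adjacent rows), and it replaces real PSSMs and ESM-2 embeddings with synthetic random data. The embedding size, the univariate selection method (`SelectKBest` with `f_classif`), the number of retained features, and the RBF kernel are all placeholders chosen for illustration; the abstract does not specify them.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def aadp_pssm(pssm: np.ndarray) -> np.ndarray:
    """Map an (L, 20) PSSM to a 420-dim vector.

    Assumed form: 20 column means, then 400 products of adjacent rows
    averaged over the L - 1 neighbouring residue pairs.
    """
    aac = pssm.mean(axis=0)                             # shape (20,)
    dpc = pssm[:-1].T @ pssm[1:] / (pssm.shape[0] - 1)  # shape (20, 20)
    return np.concatenate([aac, dpc.ravel()])


rng = np.random.default_rng(0)
n_proteins, emb_dim = 200, 64  # emb_dim is a placeholder, not a real PLM size
labels = rng.integers(0, 2, size=n_proteins)

rows = []
for y in labels:
    length = int(rng.integers(50, 120))
    pssm = rng.normal(loc=0.5 * y, size=(length, 20))  # synthetic PSSM
    embedding = rng.normal(loc=0.5 * y, size=emb_dim)  # stand-in for a pooled PLM embedding
    # Fusion here is plain concatenation of the two feature vectors.
    rows.append(np.concatenate([embedding, aadp_pssm(pssm)]))
X = np.vstack(rows)

X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=0, stratify=labels
)

# Scale, keep the 100 highest-scoring features, then fit an SVM.
model = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif, k=100),
    SVC(kernel="rbf"),
)
model.fit(X_train, y_train)
pred = model.predict(X_test)

acc = accuracy_score(y_test, pred)
mcc = matthews_corrcoef(y_test, pred)
print(f"accuracy={acc:.3f}  MCC={mcc:.3f}")
```

Placing the scaler and the selector inside the pipeline means both are fitted on the training split only, so the held-out accuracy and MCC are not inflated by information from the test proteins.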

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd1f/12029579/b23a3685fce0/molecules-30-01704-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd1f/12029579/100cb191250e/molecules-30-01704-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd1f/12029579/cd4d9a294226/molecules-30-01704-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd1f/12029579/c0f7526ac068/molecules-30-01704-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd1f/12029579/504ac8b71d3d/molecules-30-01704-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd1f/12029579/4462c0acdcd4/molecules-30-01704-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd1f/12029579/450368464914/molecules-30-01704-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd1f/12029579/e9a6c3202ae2/molecules-30-01704-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd1f/12029579/5986b06a848b/molecules-30-01704-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd1f/12029579/199d3eeceba9/molecules-30-01704-g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd1f/12029579/85a4ba0d3e4d/molecules-30-01704-g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd1f/12029579/6cf73280bcec/molecules-30-01704-g012.jpg

References cited in this article

1
PDNAPred: Interpretable prediction of protein-DNA binding sites based on pre-trained protein language models.
Int J Biol Macromol. 2024 Nov;281(Pt 2):136147. doi: 10.1016/j.ijbiomac.2024.136147. Epub 2024 Oct 1.
2
PepNet: an interpretable neural network for anti-inflammatory and antimicrobial peptides prediction using a pre-trained protein language model.
Commun Biol. 2024 Sep 28;7(1):1198. doi: 10.1038/s42003-024-06911-1.
3
Protein Language Models and Machine Learning Facilitate the Identification of Antimicrobial Peptides.
Int J Mol Sci. 2024 Aug 14;25(16):8851. doi: 10.3390/ijms25168851.
4
PreDBP-PLMs: Prediction of DNA-binding proteins based on pre-trained protein language models and convolutional neural networks.
Anal Biochem. 2024 Nov;694:115603. doi: 10.1016/j.ab.2024.115603. Epub 2024 Jul 8.
5
EnsembleDL-ATG: Identifying autophagy proteins by integrating their sequence and evolutionary information using an ensemble deep learning framework.
Comput Struct Biotechnol J. 2023 Sep 29;21:4836-4848. doi: 10.1016/j.csbj.2023.09.036. eCollection 2023.
6
pLM4ACE: A protein language model based predictor for antihypertensive peptide screening.
Food Chem. 2024 Jan 15;431:137162. doi: 10.1016/j.foodchem.2023.137162. Epub 2023 Aug 14.
7
PLPMpro: Enhancing promoter sequence prediction with prompt-learning based pre-trained language model.
Comput Biol Med. 2023 Sep;164:107260. doi: 10.1016/j.compbiomed.2023.107260. Epub 2023 Jul 21.
8
A Comprehensive Survey of Deep Learning Techniques in Protein Function Prediction.
IEEE/ACM Trans Comput Biol Bioinform. 2023 May-Jun;20(3):2291-2301. doi: 10.1109/TCBB.2023.3247634. Epub 2023 Jun 5.
9
Survey of Protein Sequence Embedding Models.
Int J Mol Sci. 2023 Feb 14;24(4):3775. doi: 10.3390/ijms24043775.
10
The applications of deep learning algorithms on in silico druggable proteins identification.
J Adv Res. 2022 Nov;41:219-231. doi: 10.1016/j.jare.2022.01.009. Epub 2022 Jan 22.