PLM-ATG: Identification of Autophagy Proteins by Integrating Protein Language Model Embeddings with PSSM-Based Features.

Authors

Wang Yangying, Wang Chunhua

Affiliation

College of Information Technology, Shanghai Ocean University, Shanghai 201306, China.

Publication

Molecules. 2025 Apr 10;30(8):1704. doi: 10.3390/molecules30081704.

Abstract

Autophagy critically regulates cellular development while maintaining pathophysiological homeostasis. Because the autophagic process is tightly regulated by the coordinated action of autophagy-related proteins (ATGs), precise identification of these proteins is essential. Although current computational approaches avoid the cost and delay of experimental identification, they leave room for improvement because handcrafted features inadequately capture the intricate patterns and relationships hidden in sequences. In this study, we propose PLM-ATG, a novel computational model for ATG identification that trains a support vector machine (SVM) on a fusion of protein language model (PLM) embeddings and position-specific scoring matrix (PSSM)-based features. First, we extracted sequence-based and PSSM-based features as inputs to six classifiers to establish baseline models. Among these, the combination of the SVM classifier and the AADP-PSSM feature set achieved the best prediction accuracy. Second, two popular PLM embeddings, ESM-2 and ProtT5, were fused with the AADP-PSSM features to further improve ATG prediction. Third, we selected the optimal feature subset from the combination of the ESM-2 embeddings and AADP-PSSM features to train the final SVM model. The proposed PLM-ATG achieved an accuracy of 99.5% and a Matthews correlation coefficient (MCC) of 0.990, which are nearly 5% and 0.1 higher, respectively, than those of the state-of-the-art model EnsembleDL-ATG.
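
The two reported metrics, accuracy and MCC, are both computed from a binary confusion matrix. The sketch below uses the standard definitions to show how they relate. The counts in it are hypothetical and chosen only for illustration; they are not taken from the paper, and the function names are likewise assumptions.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when any row or column sum is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


# HYPOTHETICAL counts (not from the paper): a balanced set of 200 ATGs and
# 200 non-ATGs with one error in each class.
tp, tn, fp, fn = 199, 199, 1, 1

acc = accuracy(tp, tn, fp, fn)
score = mcc(tp, tn, fp, fn)
print(f"accuracy = {acc:.3f}, MCC = {score:.3f}")
```

For a balanced set with the same number of errors in each class, MCC reduces to 2 × accuracy − 1. Under that assumption, an accuracy of 0.995 corresponds to an MCC of 0.990. Whether the paper's evaluation set satisfies this assumption is not stated in the abstract.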

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd1f/12029579/b23a3685fce0/molecules-30-01704-g001.jpg
