Suppr 超能文献

Improving protein succinylation sites prediction using embeddings from protein language model.

Affiliations

Department of Computer Science, Michigan Technological University, Houghton, MI, USA.

Department of Informatics, Bioinformatics and Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany.

Publication

Sci Rep. 2022 Oct 8;12(1):16933. doi: 10.1038/s41598-022-21366-2.

DOI: 10.1038/s41598-022-21366-2
PMID: 36209286
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9547369/
Abstract

Protein succinylation is an important post-translational modification (PTM) responsible for many vital metabolic activities in cells, including cellular respiration, regulation, and repair. Here, we present a novel approach that combines features from supervised word embedding with embedding from a protein language model called ProtT5-XL-UniRef50 (hereafter termed, ProtT5) in a deep learning framework to predict protein succinylation sites. To our knowledge, this is one of the first attempts to employ embedding from a pre-trained protein language model to predict protein succinylation sites. The proposed model, dubbed LMSuccSite, achieves state-of-the-art results compared to existing methods, with performance scores of 0.36, 0.79, 0.79 for MCC, sensitivity, and specificity, respectively. LMSuccSite is likely to serve as a valuable resource for exploration of succinylation and its role in cellular physiology and disease.
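As background for the setup described above: succinylation occurs on lysine residues, so site predictors of this kind typically classify a fixed-length sequence window centered on each lysine, padding positions that fall outside the protein. A minimal sketch of that candidate-extraction step (the window size and pad token here are illustrative assumptions, not the paper's exact values):

```python
def lysine_windows(seq, half_window=15, pad="X"):
    """Extract a fixed-length window around every lysine (K) in `seq`.

    Each K becomes one candidate succinylation site; positions beyond the
    sequence ends are filled with `pad` so all windows share one length.
    Returns a list of (position, window) pairs.
    """
    windows = []
    for i, aa in enumerate(seq):
        if aa != "K":
            continue
        left = seq[max(0, i - half_window):i]
        right = seq[i + 1:i + 1 + half_window]
        left = pad * (half_window - len(left)) + left
        right = right + pad * (half_window - len(right))
        windows.append((i, left + aa + right))
    return windows
```

In the approach described in the abstract, each such window would then be encoded twice, once through the supervised word-embedding branch and once through per-residue ProtT5 embeddings, before classification.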

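The reported scores (MCC 0.36, sensitivity 0.79, specificity 0.79) follow the standard confusion-matrix definitions, written out here for reference:

```python
import math

def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = succinylation site)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def sensitivity(tp, tn, fp, fn):
    # Fraction of true sites recovered (recall on the positive class).
    return tp / (tp + fn)

def specificity(tp, tn, fp, fn):
    # Fraction of non-sites correctly rejected.
    return tn / (tn + fp)

def mcc(tp, tn, fp, fn):
    # Matthews correlation coefficient; 0.0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

MCC is commonly the headline metric for PTM-site benchmarks because the datasets are typically imbalanced toward negative (non-modified) lysines, and MCC accounts for all four confusion-matrix cells rather than only the positive class.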

Figures (PMC):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4251/9547916/eb0850ed5045/41598_2022_21366_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4251/9547916/bf000ccb283c/41598_2022_21366_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4251/9547916/b0ce03cf7345/41598_2022_21366_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4251/9547916/17b646ad6b56/41598_2022_21366_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4251/9547916/cbad372cd796/41598_2022_21366_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4251/9547916/1f3e8df2e7f3/41598_2022_21366_Fig6_HTML.jpg

Similar Articles

1. Improving protein succinylation sites prediction using embeddings from protein language model.
Sci Rep. 2022 Oct 8;12(1):16933. doi: 10.1038/s41598-022-21366-2.
2. DeepSuccinylSite: a deep learning based approach for protein succinylation site prediction.
BMC Bioinformatics. 2020 Apr 23;21(Suppl 3):63. doi: 10.1186/s12859-020-3342-z.
3. pLMSNOSite: an ensemble-based approach for predicting protein S-nitrosylation sites by integrating supervised word embedding and embedding from pre-trained protein language model.
BMC Bioinformatics. 2023 Feb 8;24(1):41. doi: 10.1186/s12859-023-05164-9.
4. LMCrot: an enhanced protein crotonylation site predictor by leveraging an interpretable window-level embedding from a transformer-based protein language model.
Bioinformatics. 2024 May 2;40(5). doi: 10.1093/bioinformatics/btae290.
5. A systematic identification of species-specific protein succinylation sites using joint element features information.
Int J Nanomedicine. 2017 Aug 28;12:6303-6315. doi: 10.2147/IJN.S140875.
6. Characterization and Identification of Lysine Succinylation Sites based on Deep Learning Method.
Sci Rep. 2019 Nov 7;9(1):16175. doi: 10.1038/s41598-019-52552-4.
7. SuccinSite: a computational tool for the prediction of protein succinylation sites by exploiting the amino acid patterns and properties.
Mol Biosyst. 2016 Mar;12(3):786-95. doi: 10.1039/c5mb00853k.
8. A Comprehensive Comparative Review of Protein Sequence-Based Computational Prediction Models of Lysine Succinylation Sites.
Curr Protein Pept Sci. 2022;23(11):744-756. doi: 10.2174/1389203723666220628121817.
9. Prediction of Protein Lysine Acylation by Integrating Primary Sequence Information with Multiple Functional Features.
J Proteome Res. 2016 Dec 2;15(12):4234-4244. doi: 10.1021/acs.jproteome.6b00240.
10. LSTMCNNsucc: A Bidirectional LSTM and CNN-Based Deep Learning Method for Predicting Lysine Succinylation Sites.
Biomed Res Int. 2021 May 28;2021:9923112. doi: 10.1155/2021/9923112.

Cited By

1. ResLysEmbed: a ResNet-based framework for succinylated lysine residue prediction using sequence and language model embeddings.
Bioinform Adv. 2025 Aug 22;5(1):vbaf198. doi: 10.1093/bioadv/vbaf198.
2. Bag-of-words is competitive with sum-of-embeddings language-inspired representations on protein inference.
PLoS One. 2025 Aug 6;20(8):e0325531. doi: 10.1371/journal.pone.0325531.
3. Hybrid protein-ligand binding residue prediction with protein language models: does the structure matter?
Bioinformatics. 2025 Aug 2;41(8). doi: 10.1093/bioinformatics/btaf431.
4. TCR-epiDiff: solving dual challenges of TCR generation and binding prediction.
Bioinformatics. 2025 Jul 1;41(Supplement_1):i125-i132. doi: 10.1093/bioinformatics/btaf202.
5. Large Language Model (LLM)-Based Advances in Prediction of Post-translational Modification Sites in Proteins.
Methods Mol Biol. 2025;2941:313-355. doi: 10.1007/978-1-0716-4623-6_19.
6. Large Context, Deeper Insights: Harnessing Large Language Models for Advancing Protein-Protein Interaction Analysis.
Methods Mol Biol. 2025;2941:243-267. doi: 10.1007/978-1-0716-4623-6_15.
7. A Survey of Biological Function Prediction Methods with Focus on Natural Language Processing (NLP) and Large Language Models (LLM).
Methods Mol Biol. 2025;2941:201-225. doi: 10.1007/978-1-0716-4623-6_13.
8. CNN-Meth: A Tool to Accurately Predict Lysine Methylation Sites Using Evolutionary Information-Based Protein Modeling.
Methods Mol Biol. 2025;2941:177-187. doi: 10.1007/978-1-0716-4623-6_11.
9. A Survey of Pretrained Protein Language Models.
Methods Mol Biol. 2025;2941:1-29. doi: 10.1007/978-1-0716-4623-6_1.
10. KD_MultiSucc: incorporating multi-teacher knowledge distillation and word embeddings for cross-species prediction of protein succinylation sites.
Biol Methods Protoc. 2025 May 28;10(1):bpaf041. doi: 10.1093/biomethods/bpaf041.

References

1. The global succinylation of SARS-CoV-2-infected host cells reveals drug targets.
Proc Natl Acad Sci U S A. 2022 Jul 26;119(30):e2123065119. doi: 10.1073/pnas.2123065119.
2. Contrastive learning on protein embeddings enlightens midnight zone.
NAR Genom Bioinform. 2022 Jun 11;4(2):lqac043. doi: 10.1093/nargab/lqac043.
3. Deep Learning-Based Advances in Protein Posttranslational Modification Site and Protein Cleavage Prediction.
Methods Mol Biol. 2022;2499:285-322. doi: 10.1007/978-1-0716-2317-6_15.
4. Protein language-model embeddings for fast, accurate, and alignment-free protein structure prediction.
Structure. 2022 Aug 4;30(8):1169-1177.e4. doi: 10.1016/j.str.2022.05.001.
5. Protein embeddings and deep learning predict binding residues for various ligand classes.
Sci Rep. 2021 Dec 13;11(1):23916. doi: 10.1038/s41598-021-03431-4.
6. ECNet is an evolutionary context-integrated deep learning framework for protein engineering.
Nat Commun. 2021 Sep 30;12(1):5743. doi: 10.1038/s41467-021-25976-8.
7. MDCAN-Lys: A Model for Predicting Succinylation Sites Based on Multilane Dense Convolutional Attention Network.
Biomolecules. 2021 Jun 11;11(6):872. doi: 10.3390/biom11060872.
8. Learning the protein language: Evolution, structure, and function.
Cell Syst. 2021 Jun 16;12(6):654-669.e3. doi: 10.1016/j.cels.2021.05.017.
9. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences.
Proc Natl Acad Sci U S A. 2021 Apr 13;118(15). doi: 10.1073/pnas.2016239118.
10. Posttranslational modifications in proteins: resources, tools and prediction methods.
Database (Oxford). 2021 Apr 7;2021. doi: 10.1093/database/baab012.