Ye Han, Fei He, Qing Shao, Duolin Wang, Dong Xu
Department of Electrical Engineering and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA.
Chemical & Materials Engineering, University of Kentucky, Lexington, KY 40506, USA.
Biomolecules. 2025 Jun 9;15(6):843. doi: 10.3390/biom15060843.
Post-translational modifications (PTMs) regulate protein function, stability, and interactions, playing essential roles in cellular signaling, localization, and disease mechanisms. Computational approaches enable scalable PTM site prediction; however, traditional models rely only on local sequence features from fragments around candidate modification sites, limiting the scope of their predictions. Recently, pre-trained protein language models (PLMs) have improved PTM prediction by leveraging biological knowledge derived from extensive protein databases. However, most PLMs used for PTM site prediction are pre-trained solely on amino acid sequences, limiting their ability to capture the structural context needed for accurate prediction. Moreover, these methods typically train a separate single-task model for each PTM type, which hinders the sharing of common features and limits knowledge transfer across tasks. To overcome these limitations, we introduce MTPrompt-PTM, a multi-task PTM prediction framework developed by applying prompt tuning to a structure-aware protein language model (S-PLM). Instead of training several single-task models, MTPrompt-PTM trains one multi-task model that predicts multiple types of PTM sites using shared feature-extraction layers and task-specific classification heads. Additionally, we incorporate a knowledge distillation strategy to enhance the efficiency and generalizability of multi-task training. Experimental results demonstrate that MTPrompt-PTM outperforms state-of-the-art PTM prediction tools on 13 types of PTM sites, highlighting the advantages of multi-task learning and structural integration.
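The abstract outlines the core design: a frozen structure-aware encoder adapted with learnable soft prompts, shared feature-extraction layers, and one classification head per PTM type. Below is a minimal PyTorch sketch of that general pattern; the toy encoder, dimensions, and PTM-type names are illustrative assumptions, not the authors' implementation or the actual S-PLM interface.

```python
import torch
import torch.nn as nn

class MTPromptPTMSketch(nn.Module):
    """Illustrative sketch (not the paper's code): soft-prompt tuning on a
    frozen encoder, a shared feature trunk, and task-specific PTM heads."""
    def __init__(self, embed_dim=128, n_prompts=8,
                 ptm_types=("phospho_S", "acetyl_K", "ubiq_K")):  # hypothetical task names
        super().__init__()
        # stand-in for the frozen structure-aware encoder (S-PLM in the paper)
        layer = nn.TransformerEncoderLayer(embed_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        for p in self.encoder.parameters():
            p.requires_grad = False          # prompt tuning: backbone stays frozen
        # learnable soft prompts prepended to the residue embeddings;
        # gradients flow to them through the frozen encoder
        self.prompts = nn.Parameter(torch.randn(n_prompts, embed_dim) * 0.02)
        # shared feature-extraction layers, common to all PTM tasks
        self.shared = nn.Sequential(nn.Linear(embed_dim, 64), nn.ReLU())
        # one binary classification head per PTM type
        self.heads = nn.ModuleDict({t: nn.Linear(64, 2) for t in ptm_types})

    def forward(self, residue_emb):          # (batch, seq_len, embed_dim)
        b = residue_emb.size(0)
        prompts = self.prompts.unsqueeze(0).expand(b, -1, -1)
        x = self.encoder(torch.cat([prompts, residue_emb], dim=1))
        h = self.shared(x[:, self.prompts.size(0):])   # drop prompt positions
        return {t: head(h) for t, head in self.heads.items()}

model = MTPromptPTMSketch()
logits = model(torch.randn(2, 50, 128))      # 2 sequences of 50 residues
print({t: v.shape for t, v in logits.items()})  # (2, 50, 2) per PTM type
```

Because only the prompts, shared trunk, and heads are trainable, one backbone serves all 13 PTM tasks while each head learns its own decision boundary.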
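The abstract also mentions a knowledge distillation strategy for multi-task training. A common formulation, assumed here for illustration (the paper may define its objective differently), trains the multi-task student to match temperature-softened predictions from a teacher alongside the hard site labels:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Generic distillation objective (an assumption, not the paper's exact
    loss): blend hard-label cross-entropy with KL divergence to the
    teacher's temperature-softened distribution."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)   # standard T^2 scaling
    return alpha * hard + (1 - alpha) * soft

# e.g., a single-task teacher could guide one PTM head of the student:
s = torch.randn(16, 2, requires_grad=True)   # student logits for one PTM type
t = torch.randn(16, 2)                       # frozen teacher logits
y = torch.randint(0, 2, (16,))               # site labels (modified or not)
distillation_loss(s, t, y).backward()
```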