Computer Science Department, Rochester Institute of Technology, Rochester, NY, USA.
North Carolina A&T State University, Computational Data Science and Engineering, Greensboro, NC, USA.
Methods Mol Biol. 2025;2867:261-297. doi: 10.1007/978-1-0716-4196-5_16.
Protein post-translational modifications (PTMs) introduce new functionalities and play a critical role in regulating protein function. Characterizing these modifications, especially PTM sites, is essential for unraveling complex biological systems. However, traditional experimental approaches, such as mass spectrometry, are time-consuming and expensive. Machine learning and deep learning techniques offer promising alternatives for predicting PTM sites. In this chapter, we introduce our LMPTMSite (language model-based post-translational modification site predictor) platform, which features two transformer-based protein language model (pLM) approaches, pLMSNOSite and LMSuccSite, for predicting S-nitrosylation sites and succinylation sites in proteins, respectively. We highlight the various methods of using pLM-based sequence encoding, explain the underlying deep learning architectures, and discuss the efficacy of these tools, which is superior to that of other state-of-the-art predictors. Subsequently, we present an analysis of runtime and memory usage for pLMSNOSite, focusing on how CPU and RAM usage change as the input sequence length is scaled up. Finally, we showcase a case study that uses LMSuccSite to predict succinylation sites in proteins active within the tricarboxylic acid (TCA) cycle pathway, demonstrating its potential utility and efficiency in real-world biological contexts. The LMPTMSite platform, comprising pLMSNOSite and LMSuccSite, is freely available both as web servers (http://kcdukkalab.org/pLMSNOSite/ and http://kcdukkalab.org/LMSuccSite/) and as standalone packages (https://github.com/KCLabMTU/pLMSNOSite and https://github.com/KCLabMTU/LMSuccSite), providing valuable tools for researchers in the field.
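To make the site-prediction setting concrete, the sketch below enumerates candidate sites in a sequence and cuts a fixed-size window around each one. S-nitrosylation occurs on cysteine (C) and succinylation on lysine (K), so those residues serve as candidates. This is an illustration only and not the pLMSNOSite or LMSuccSite pipeline: the function name `candidate_windows`, the flank size, the `-` padding character, and the toy sequence are all assumptions made for this example.

```python
from typing import List, Tuple


def candidate_windows(
    sequence: str, residue: str, flank: int = 5, pad: str = "-"
) -> List[Tuple[int, str]]:
    """Return (1-based position, window) for every occurrence of `residue`.

    Hypothetical helper for illustration: `flank` and `pad` are assumed
    values, not parameters taken from the published tools.
    """
    seq = sequence.upper()
    # Pad both ends so sites near the termini still yield full-length windows.
    padded = pad * flank + seq + pad * flank
    windows = []
    for i, aa in enumerate(seq):
        if aa == residue:
            # Position i in `seq` sits at i + flank in `padded`, so the
            # window centred on it spans padded[i : i + 2 * flank + 1].
            windows.append((i + 1, padded[i : i + 2 * flank + 1]))
    return windows


toy = "MKTCAYKLC"  # made-up sequence, not a real protein
cys_sites = candidate_windows(toy, "C", flank=2)  # S-nitrosylation candidates
lys_sites = candidate_windows(toy, "K", flank=2)  # succinylation candidates
print(cys_sites)
print(lys_sites)
```

In a real predictor, each candidate (or the full sequence around it) would then be encoded and scored by a trained model; that step is omitted here because it depends on the specific tool.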