Zhang Menghuan, Zhang Yizhi, Dong Keqin, Lin Jin, Cui Xingang, Zhang Yong
State Key Laboratory of Cardiovascular Diseases and Medical Innovation Center, Institute for Regenerative Medicine, Department of Neurosurgery, Shanghai East Hospital, Shanghai Key Laboratory of Signaling and Disease Research, Frontier Science Center for Stem Cell Research, School of Life Sciences and Technology, Tongji University, Shanghai, China.
Department of Urology, School of Medicine, Xinhua Hospital Affiliated to Shanghai Jiao Tong University, Shanghai, China.
Mol Cell Proteomics. 2025 Jan;24(1):100889. doi: 10.1016/j.mcpro.2024.100889. Epub 2024 Nov 30.
Phosphorylation is an indispensable regulatory mechanism in cells, and phosphorylation at specific sites on kinases can significantly enhance their activity. Although several such critical phosphorylation sites (phos-sites) have been identified experimentally, many more remain to be discovered. To date, no computational method has been available to systematically identify these critical phos-sites on kinases. In this study, we introduce PhoSiteformer, a transformer-inspired foundation model that generates phos-site embeddings from phosphorylation mass spectrometry data. Because protein sequence data and phosphorylation mass spectrometry data offer complementary information, we developed a classification model, CSPred, that uses a bimodal fusion strategy to combine embeddings from PhoSiteformer with those from the protein language model ProtT5. Our approach identified 77 critical phos-sites on 58 human kinases, two of which, T517 on PKG1 and T735 on PRKD3, have been experimentally verified. This study presents the first systematic computational approach to identify critical phos-sites that enhance kinase activity.
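The abstract does not say how CSPred fuses its two inputs. The sketch below assumes the simplest bimodal option: concatenate a phos-site's PhoSiteformer embedding with its ProtT5 embedding (early fusion) and score the fused vector with a logistic-regression head. The functions `fuse`, `predict`, and `train`, the toy vectors, and the tiny dimensions are hypothetical illustrations, not the authors' implementation; real embeddings are far larger, and CSPred's actual architecture may differ.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def fuse(site_emb: Sequence[float], seq_emb: Sequence[float]) -> Vector:
    # Hypothetical early fusion: concatenate the mass-spectrometry-derived
    # phos-site embedding with the sequence-derived (ProtT5-like) embedding.
    return list(site_emb) + list(seq_emb)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    # Probability that a phos-site is "critical" under an assumed logistic head.
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)


def train(samples: Sequence[Tuple[Vector, int]],
          lr: float = 0.5, epochs: int = 200) -> Tuple[Vector, float]:
    # Plain batch gradient descent on binary cross-entropy loss.
    dim = len(samples[0][0])
    weights = [0.0] * dim
    bias = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in samples:
            err = predict(weights, bias, x) - y  # d(loss)/dz for BCE + sigmoid
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Toy usage: made-up 2-dim "PhoSiteformer" and 3-dim "ProtT5" vectors,
# labelled 1 (critical) or 0 (not critical). Not real data.
data = [
    (fuse([1.0, 0.9], [0.8, 1.0, 0.7]), 1),
    (fuse([0.9, 1.0], [1.0, 0.9, 0.8]), 1),
    (fuse([-1.0, -0.8], [-0.9, -1.0, -0.7]), 0),
    (fuse([-0.9, -1.0], [-1.0, -0.8, -0.9]), 0),
]
w, b = train(data)
scores = [predict(w, b, x) for x, _ in data]
```

Concatenation keeps the sketch short and lets a single linear head weigh both modalities; attention-based or gated fusion would be reasonable alternatives if one modality should dominate for some sites.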