Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju-si, 54896, Jeollabuk-do, Republic of Korea.
Graduate School of Integrated Energy-AI, Jeonbuk National University, Jeonju-si, 54896, Jeollabuk-do, Republic of Korea.
Comput Biol Med. 2024 Feb;169:107925. doi: 10.1016/j.compbiomed.2024.107925. Epub 2024 Jan 3.
Serine phosphorylation plays a pivotal role in regulating many cellular processes and in the pathogenesis of various diseases. Roughly 81% of human diseases are linked to phosphorylation, and the vast majority (86.4%) of protein phosphorylation takes place at serine residues. In eukaryotes, over a quarter of proteins undergo phosphorylation, and more than half of them are implicated in numerous disorders, notably cancer and reproductive system diseases. This study focuses on serine-phosphorylation-driven pathogenesis and on the critical role of identifying conserved motifs. Traditional wet-lab experiments for identifying serine phosphorylation sites are resource-intensive, and numerous computational techniques have been developed to predict them. Our paper introduces a deep learning tool for predicting serine (S) phosphorylation sites that integrates a transformer language model, deep neural network components, and explainable AI for motif identification. We trained our model on protein sequences from UniProt, validated it against the dbPTM benchmark dataset, and used the PTMD dataset to explore motifs related to mammalian disorders. Our results show that our model outperforms other deep learning predictors by 3%. Furthermore, we used the local interpretable model-agnostic explanations (LIME) approach to interpret the predictions, highlighting the amino acid residues crucial for S phosphorylation. Our model also outperformed competing methods in kinase-specific serine phosphorylation prediction on benchmark datasets.
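The candidate sites for S phosphorylation are simply the serine residues of a protein, and sequence-based predictors commonly present each candidate to the model as a fixed-length window of neighbouring residues. The following is a minimal sketch of that preprocessing step, assuming a window-based input. The function name `serine_windows`, the half-width of 16 residues, and the `-` padding symbol are hypothetical choices for illustration, not details reported in this abstract.

```python
from typing import List, Tuple

# Hypothetical padding symbol for window positions that fall outside the protein.
PAD = "-"


def serine_windows(sequence: str, half_width: int = 16) -> List[Tuple[int, str]]:
    """Return (1-based position, window) for every serine (S) in the sequence.

    Each window has length 2 * half_width + 1 and is centred on the serine.
    The half_width default is an illustrative assumption, not the paper's value.
    """
    seq = sequence.upper()
    # Pad both ends so serines near the termini still get full-length windows.
    padded = PAD * half_width + seq + PAD * half_width
    windows = []
    for i, residue in enumerate(seq):
        if residue == "S":
            # seq[i] sits at padded[i + half_width], so this slice is centred on it.
            windows.append((i + 1, padded[i : i + 2 * half_width + 1]))
    return windows


if __name__ == "__main__":
    for pos, win in serine_windows("MKSPLS", half_width=2):
        print(pos, win)
```

With `half_width=2`, the toy sequence `MKSPLS` yields the windows `MKSPL` (serine at position 3) and `PLS--` (serine at position 6). Each window would then be encoded for the language model; how the authors actually tokenise and label sequences is not specified here.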