Zhang Liyuan, Deng Tingzhi, Pan Shuijing, Zhang Minghui, Zhang Yusen, Yang Chunhua, Yang Xiaoyong, Tian Geng, Mi Jia
Shandong Technology Innovation Center of Molecular Targeting and Intelligent Diagnosis and Treatment, Binzhou Medical University, Yantai, Shandong, China.
National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian, China.
Front Cell Dev Biol. 2024 Oct 10;12:1456728. doi: 10.3389/fcell.2024.1456728. eCollection 2024.
Protein O-GlcNAcylation is a dynamic post-translational modification involved in major cellular processes and associated with many human diseases. Bioinformatic prediction of O-GlcNAc sites before experimental validation is a challenging task in O-GlcNAc research. Recent advances in deep learning algorithms and the availability of O-GlcNAc proteomics data present an opportunity to improve O-GlcNAc site prediction.
This study aims to develop a deep learning-based tool to improve O-GlcNAcylation site prediction.
We construct an annotated, unbalanced O-GlcNAcylation dataset and propose a new deep learning framework, DeepO-GlcNAc, which combines Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) layers with an attention mechanism.
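As a rough illustration of why such a dataset is unbalanced, the sketch below builds candidate samples by taking a fixed-length window around every serine or threonine residue and labeling it by whether the position is an annotated site. The abstract does not describe the actual preprocessing, so the window approach, the `flank` size, the `X` padding character, the function name `extract_windows`, and the toy sequence are all assumptions made for illustration only.

```python
from typing import List, Set, Tuple


def extract_windows(
    sequence: str,
    positive_sites: Set[int],
    flank: int = 10,   # assumed flank size, not taken from the paper
    pad: str = "X",    # assumed padding symbol for sequence ends
) -> List[Tuple[int, str, int]]:
    """Return (1-based position, window, label) for every Ser/Thr residue."""
    # Pad both ends so residues near the termini still get full windows.
    padded = pad * flank + sequence + pad * flank
    samples = []
    for i, residue in enumerate(sequence):
        if residue in "ST":
            # Index i in the raw sequence is index i + flank in the padded
            # one, so this slice is centered on the candidate residue.
            window = padded[i : i + 2 * flank + 1]
            label = 1 if (i + 1) in positive_sites else 0
            samples.append((i + 1, window, label))
    return samples


# Hypothetical toy protein with one annotated site at position 3.
toy_sequence = "MASTPLKSGTAAS"
toy_sites = {3}

samples = extract_windows(toy_sequence, toy_sites, flank=3)
n_pos = sum(label for _, _, label in samples)
n_neg = len(samples) - n_pos
print(f"positives={n_pos}, negatives={n_neg}")
```

Even in this toy case most Ser/Thr candidates are negatives, which is the kind of class imbalance a site predictor has to handle.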
The ablation study confirms that the additional model components in DeepO-GlcNAc, such as the attention mechanism and LSTM, improve prediction performance. The model shows strong robustness across five datasets from non-human species. We also compare it with three external predictors on an independent dataset. DeepO-GlcNAc outperforms the external predictors, achieving an accuracy of 92%, an average precision of 72%, an MCC of 0.60, and an area under the ROC curve (AUC) of 92%. Moreover, we have implemented DeepO-GlcNAc as a web server to facilitate further investigation and use by the scientific community.
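For readers less familiar with these metrics, the following minimal sketch computes accuracy and the Matthews correlation coefficient (MCC) from binary labels using their standard definitions. The labels and predictions are made-up toy values, not data from the study, and the helper names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy unbalanced example: 2 positives, 8 negatives (illustrative only).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
acc = accuracy(*counts)
m = mcc(*counts)
print(f"accuracy={acc:.3f}, MCC={m:.3f}")
```

On unbalanced data, accuracy can stay high even when many positives are missed, which is one reason to read it alongside MCC and average precision.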
Our work demonstrates the feasibility of utilizing deep learning for O-GlcNAc site prediction and provides a novel tool for O-GlcNAc investigation.