Yu Ziyuan, Yu Jialin, Wang Hongmei, Zhang Shuai, Zhao Long, Shi Shaoping
Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang, 330031, China.
Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang, 330031, China; Institute of Mathematics and Interdisciplinary Sciences, Nanchang University, Nanchang, 330031, China.
Anal Biochem. 2024 Jul;690:115510. doi: 10.1016/j.ab.2024.115510. Epub 2024 Mar 19.
Phosphorylation is central to understanding biological processes, yet experimental methods for identifying phosphorylation sites are tedious and laborious. With the rapid growth of biotechnology, deep learning methods have made significant progress in site prediction tasks. Nevertheless, most existing predictors consider only protein sequence information, which limits their ability to capture protein spatial information. Building on recent advances in protein structure prediction by AlphaFold2, we developed PhosAF, a novel integrated deep learning architecture that predicts phosphorylation sites in human proteins by combining two modules, CMA-Net and MFC-Net, and that uses both sequence information and structure information predicted by AlphaFold2. The CMA-Net module consists of multiple convolutional neural network layers followed by multi-head attention, and captures the local and long-range dependencies of sequence features. The MFC-Net module consists of deep neural network layers and captures complex representations of evolutionary and structure features. The features from both modules are then combined to produce the final phosphorylation site prediction. In addition, we propose a new strategy for constructing reliable negative samples based on protein secondary structures. Experimental results on independent test data and a case study indicate that PhosAF outperforms current state-of-the-art methods in phosphorylation site prediction.
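The two-branch design described above can be illustrated with a minimal sketch. This is not the PhosAF implementation: the abstract does not give layer sizes, window lengths, feature definitions, or the fusion rule, so every name and number below (`sequence_branch`, `feature_branch`, `predict`, kernel width 3, 8 channels, 2 heads, mean pooling, concatenation) is a hypothetical choice made for illustration. Weights are random and untrained, so the output is only a shape and data-flow check, not a meaningful score.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def rand_matrix(rows, cols, rng):
    """Random, untrained weights (illustration only)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def relu(m):
    return [[max(0.0, v) for v in row] for row in m]

def softmax(row):
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def one_hot(window):
    """Encode residues as 20-dim one-hot rows; unknown residues become zeros."""
    return [[1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS] for aa in window]

def conv1d_same(x, kernel_w, width):
    """Zero-padded 1D convolution plus ReLU; x has shape (length, channels)."""
    pad = width // 2
    zeros = [0.0] * len(x[0])
    padded = [zeros] * pad + x + [zeros] * pad
    # Flatten each window of `width` positions, then project it with kernel_w.
    windows = [sum(padded[i:i + width], []) for i in range(len(x))]
    return relu(matmul(windows, kernel_w))

def self_attention(x, heads, rng):
    """Multi-head scaled dot-product self-attention (no output projection)."""
    dim = len(x[0])
    head_dim = dim // heads
    outputs = [[] for _ in x]
    for _ in range(heads):
        q = matmul(x, rand_matrix(dim, head_dim, rng))
        k = matmul(x, rand_matrix(dim, head_dim, rng))
        v = matmul(x, rand_matrix(dim, head_dim, rng))
        scores = matmul(q, [list(col) for col in zip(*k)])  # q times k transposed
        weights = [softmax([s / math.sqrt(head_dim) for s in row]) for row in scores]
        context = matmul(weights, v)
        for out_row, ctx_row in zip(outputs, context):
            out_row.extend(ctx_row)  # concatenate the heads
    return outputs

def sequence_branch(window, rng, channels=8, heads=2):
    """CNN layers followed by multi-head attention, loosely in the spirit of CMA-Net."""
    x = one_hot(window)
    x = conv1d_same(x, rand_matrix(3 * len(AMINO_ACIDS), channels, rng), 3)
    x = conv1d_same(x, rand_matrix(3 * channels, channels, rng), 3)
    x = self_attention(x, heads, rng)
    return [sum(col) / len(x) for col in zip(*x)]  # mean-pool over positions

def feature_branch(features, rng, hidden=8):
    """Dense layers over per-site features, loosely in the spirit of MFC-Net."""
    h = relu(matmul([features], rand_matrix(len(features), hidden, rng)))
    h = relu(matmul(h, rand_matrix(hidden, hidden, rng)))
    return h[0]

def predict(window, features, seed=0):
    """Concatenate both branch outputs and map them to a value in (0, 1)."""
    rng = random.Random(seed)
    fused = sequence_branch(window, rng) + feature_branch(features, rng)
    logit = matmul([fused], rand_matrix(len(fused), 1, rng))[0][0]
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical inputs: a 9-residue window centred on a serine, and four
# made-up evolutionary/structure feature values for that site.
score = predict("MKRLSPEES", [0.2, 0.5, 0.1, 0.9])
print(score)
```

Concatenating the pooled sequence representation with the dense feature representation is only one plausible reading of "different features are combined"; a trained model would also learn the weights rather than sample them.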