School of Information Science and Technology, University of Science and Technology of China, Hefei, China.
PeerJ. 2022 Mar 14;10:e12847. doi: 10.7717/peerj.12847. eCollection 2022.
Human DNA sequencing has revealed numerous single nucleotide variants associated with complex diseases. Researchers have shown that these variants can affect protein function, for example by disrupting protein phosphorylation. Several computational methods for predicting phospho-variants have been developed using conventional machine learning algorithms, but their performance still leaves considerable room for improvement. In recent years, deep learning has been successfully applied to biological sequence analysis because it learns sequence patterns efficiently, which makes it a powerful tool for improving phospho-variant prediction from protein sequence information. In this study, we present PhosVarDeep, a novel unified deep-learning framework for phospho-variant prediction. PhosVarDeep takes reference and variant sequences as inputs and adopts a Siamese-like CNN architecture containing two identical subnetworks and a prediction module. In each subnetwork, general phosphorylation sequence features are extracted by a pre-trained sequence feature encoding network and then fed into a CNN module that captures variant-aware phosphorylation sequence features. A prediction module then integrates the outputs of the two subnetworks and generates the phospho-variant prediction. Comprehensive experimental results on phospho-variant data demonstrate that our method significantly improves phospho-variant prediction performance and compares favorably with existing conventional machine learning methods.
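The data flow described above (two identical subnetworks applied to the reference and variant sequences, followed by a prediction module) can be illustrated with a small sketch. This is not the PhosVarDeep implementation: the one-hot input, the filter counts and widths, the global max pooling, the concatenation used to merge the two branches, and the random untrained weights are all assumptions made for illustration, and the class name `SiameseSketch` is hypothetical. The "encoder" here is only a stand-in for the pre-trained sequence feature encoding network.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq):
    """Encode a peptide window as a list of 20-dim one-hot vectors.
    Residues outside the 20 standard amino acids become all-zero rows."""
    return [[1.0 if aa == a else 0.0 for a in AMINO_ACIDS] for aa in seq]


def conv1d_relu(x, kernels):
    """Valid 1D convolution along the sequence axis, followed by ReLU.
    x is L x C; each kernel is k x C; the result is (L - k + 1) x n_filters."""
    k, channels = len(kernels[0]), len(x[0])
    out = []
    for i in range(len(x) - k + 1):
        row = []
        for w in kernels:
            s = sum(w[j][c] * x[i + j][c] for j in range(k) for c in range(channels))
            row.append(max(0.0, s))
        out.append(row)
    return out


def global_max_pool(feature_map):
    """Collapse the position axis by taking the maximum of each filter."""
    return [max(col) for col in zip(*feature_map)]


def make_kernels(rng, n_filters, width, channels):
    """Random (untrained) filters; a real model would learn or load these."""
    return [[[rng.uniform(-0.1, 0.1) for _ in range(channels)]
             for _ in range(width)] for _ in range(n_filters)]


class SiameseSketch:
    """Hypothetical toy model: one set of weights shared by both branches."""

    def __init__(self, seed=0, n_filters=8, width=3):
        rng = random.Random(seed)
        # Stand-in for the pre-trained sequence feature encoding network.
        self.encoder = make_kernels(rng, n_filters, width, len(AMINO_ACIDS))
        # Stand-in for the CNN module that refines the encoded features.
        self.variant_cnn = make_kernels(rng, n_filters, width, n_filters)
        # Prediction module: a single logistic unit over both branch outputs.
        self.out_w = [rng.uniform(-0.1, 0.1) for _ in range(2 * n_filters)]
        self.out_b = 0.0

    def subnetwork(self, seq):
        general = conv1d_relu(one_hot(seq), self.encoder)
        variant_aware = conv1d_relu(general, self.variant_cnn)
        return global_max_pool(variant_aware)

    def predict(self, ref_seq, var_seq):
        # The same subnetwork (same weights) processes both inputs.
        merged = self.subnetwork(ref_seq) + self.subnetwork(var_seq)
        z = sum(w * v for w, v in zip(self.out_w, merged)) + self.out_b
        return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    model = SiameseSketch(seed=0)
    reference = "AAKRRASLSPEEDLL"  # made-up 15-residue window
    variant = "AAKRRAALSPEEDLL"    # same window with S -> A at the center
    print(model.predict(reference, variant))
```

Because both inputs pass through the same `subnetwork` method, the branches share weights by construction, which is the defining property of a Siamese arrangement. With untrained random weights the printed score carries no biological meaning; it only shows how the two sequences move through the network.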