Fan Wenwen, Xu Xiaoyi, Shen Yi, Feng Huanqing, Li Ao, Wang Minghui
School of Information Science and Technology, University of Science and Technology of China, 443 Huangshan Road, Hefei, 230027, China
Amino Acids. 2014 Apr;46(4):1069-78. doi: 10.1007/s00726-014-1669-3. Epub 2014 Jan 23.
Reversible protein phosphorylation is one of the most important post-translational modifications and regulates a wide range of cellular processes. Identifying kinase-specific phosphorylation sites helps in understanding phosphorylation mechanisms and their regulation. Although a number of computational approaches have been developed, few studies consider the hierarchical structure of kinases, and most existing tools use only local sequence information to construct predictive models. In this work, we conduct a systematic, hierarchy-specific investigation of protein phosphorylation site prediction in which protein kinases are clustered into a hierarchy with four levels: kinase, subfamily, family and group. To enhance phosphorylation site prediction at all hierarchical levels, functional information about proteins, namely gene ontology (GO) and protein-protein interaction (PPI), is used in addition to primary sequence to construct prediction models based on random forest. Analysis of the selected GO and PPI features shows that functional information is critical in determining protein phosphorylation sites at every hierarchical level. Furthermore, prediction results on Phospho.ELM and an additional testing dataset demonstrate that the proposed method remarkably outperforms existing phosphorylation prediction methods at all hierarchical levels. The proposed method is freely available at http://bioinformatics.ustc.edu.cn/phos_pred/.
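The abstract describes combining primary sequence with GO and PPI information into features for a random forest. The following is a minimal sketch of how such a feature vector might be assembled for one candidate site; it is not the authors' implementation. The window size (`flank=7`), the one-hot encoding, the binary GO/PPI encoding, the function names, and the partner identifiers are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import List, Set

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "-"  # assumed placeholder for positions beyond the sequence ends


def extract_window(seq: str, pos: int, flank: int = 7) -> str:
    """Return the residues around `pos` (0-based), padded to 2*flank+1 chars."""
    left = max(0, pos - flank)
    right = min(len(seq), pos + flank + 1)
    left_pad = PAD * (flank - (pos - left))
    right_pad = PAD * (flank - (right - pos - 1))
    return left_pad + seq[left:right] + right_pad


def one_hot_window(window: str) -> List[int]:
    """One-hot encode each residue; padding positions become all zeros."""
    vec: List[int] = []
    for residue in window:
        vec.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vec


def binary_features(annotations: Set[str], vocabulary: List[str]) -> List[int]:
    """Mark which vocabulary entries (GO terms or PPI partners) are present."""
    return [1 if term in annotations else 0 for term in vocabulary]


def build_feature_vector(
    seq: str,
    pos: int,
    go_terms: Set[str],
    go_vocab: List[str],
    partners: Set[str],
    ppi_vocab: List[str],
    flank: int = 7,
) -> List[int]:
    """Concatenate sequence, GO, and PPI features for one candidate site."""
    window = extract_window(seq, pos, flank)
    return (
        one_hot_window(window)
        + binary_features(go_terms, go_vocab)
        + binary_features(partners, ppi_vocab)
    )


if __name__ == "__main__":
    # Toy inputs: the sequence and partner names are hypothetical.
    go_vocab = ["GO:0004672", "GO:0005524"]
    ppi_vocab = ["partner_A", "partner_B", "partner_C"]
    features = build_feature_vector(
        "MKRSPTA", 3, {"GO:0005524"}, go_vocab, {"partner_A"}, ppi_vocab, flank=2
    )
    print(len(features))
```

Under these assumptions, one such vector would be built per candidate S/T/Y site, and a separate random forest (for example scikit-learn's `RandomForestClassifier`) could be trained for each kinase, subfamily, family, or group label, which is one plausible reading of the hierarchy-specific setup described above.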