Berry Emily A, Dalby Andrew R, Yang Zheng Rong
Department of Computer Science, School of Engineering, Computer Science and Mathematics, University of Exeter, UK.
Comput Biol Chem. 2004 Feb;28(1):75-85. doi: 10.1016/j.compbiolchem.2003.11.005.
Protein phosphorylation is a post-translational modification performed by a group of enzymes known as the protein kinases or phosphotransferases (Enzyme Commission classification 2.7). It is essential to the correct functioning of both proteins and cells, and is involved in enzyme control, cell signalling and apoptosis. The major problem in predicting phosphorylation sites is the broad substrate specificity of these enzymes. This study employs back-propagation neural networks (BPNNs), the decision tree algorithm C4.5 and the reduced bio-basis function neural network (rBBFNN) to predict phosphorylation sites. The aim is to compare the prediction efficiency of the three algorithms on this problem and to examine their capability for knowledge extraction. All three algorithms are effective for phosphorylation site prediction. The results indicate that rBBFNN is the fastest and most sensitive of the algorithms, that BPNN has the highest area under the ROC curve and is therefore the most robust, and that C4.5 has the highest prediction accuracy. C4.5 also reveals that the amino acid two residues upstream of the phosphorylation site is important for serine/threonine phosphorylation, whilst the amino acid three residues upstream is important for tyrosine phosphorylation.
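The comparison above ranks the algorithms on three different measures (sensitivity, prediction accuracy and area under the ROC curve), which is why a different algorithm can come out on top for each one. The following is a minimal sketch of how these measures are commonly computed for a binary site/non-site classifier. It is not the authors' evaluation code: the function names, the 0.5 decision threshold and the toy labels and scores are all assumptions made for illustration.

```python
def sensitivity(y_true, y_pred):
    """Fraction of true phosphorylation sites that were predicted as sites."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def accuracy(y_true, y_pred):
    """Fraction of all candidate sites that were classified correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation:
    the probability that a random positive scores above a random negative,
    counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = phosphorylation site, 0 = non-site.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]

# Assumed decision threshold of 0.5 for turning scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens = sensitivity(y_true, y_pred)
acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"sensitivity={sens:.3f} accuracy={acc:.3f} AUC={auc:.3f}")
```

Sensitivity and accuracy depend on the chosen threshold, whereas the area under the ROC curve summarises performance across all thresholds, so the three measures need not agree on which classifier is best.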