Max Planck Institute for Molecular Plant Physiology, Potsdam-Golm, Brandenburg, Germany.
Front Plant Sci. 2012 Sep 5;3:207. doi: 10.3389/fpls.2012.00207. eCollection 2012.
The regulation of protein function by modulating the surface charge status via sequence-locally enriched phosphorylation sites (P-sites) in so-called phosphorylation "hotspots" has gained increased attention in recent years. We set out to identify P-hotspots in the model plant Arabidopsis thaliana. We analyzed the spacing of experimentally detected P-sites within peptide-covered regions along Arabidopsis protein sequences as available from the PhosPhAt database. Confirming earlier reports (Schweiger and Linial, 2010), we found that P-sites do tend to cluster and that the distributions of distances from serine and threonine P-sites to their respective closest neighboring P-site differ significantly from those for tyrosine P-sites. We observed that available computational P-site prediction programs, which focus on identifying single P-sites, are severely compromised in predicting P-hotspots by the inevitable interference of nearby P-sites. We therefore devised a new approach, named HotSPotter, for the prediction of phosphorylation hotspots. HotSPotter is based primarily on local amino acid compositional preferences rather than sequence position-specific motifs and uses support vector machines as the underlying classification engine. HotSPotter correctly identified experimentally determined phosphorylation hotspots in A. thaliana with high accuracy. Applied to the Arabidopsis proteome, HotSPotter predicted 13,677 candidate P-hotspots in 9,599 proteins corresponding to 7,847 unique genes. Hotspot-containing proteins are involved predominantly in signaling processes, confirming the surmised modulating role of hotspots in signaling and interaction events. Our study provides new bioinformatics means to identify phosphorylation hotspots and lays the basis for further investigation of novel candidate P-hotspots. All phosphorylation hotspot annotations and predictions have been made available as part of the PhosPhAt database at http://phosphat.mpimp-golm.mpg.de.
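The two computational ideas named in the abstract, the spacing between neighboring P-sites and a local amino acid composition descriptor, can be illustrated with a minimal Python sketch. This is not the HotSPotter implementation: the function names, the 0-based site positions, and the window half-width of 10 residues are assumptions made purely for illustration, and the abstract does not specify the actual feature definitions or window size.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature vectors are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def nearest_psite_distances(positions):
    """Map each P-site position to the distance to its closest other P-site.

    Proteins with a single P-site yield an empty dict, since no neighbor exists.
    """
    sites = sorted(set(positions))
    distances = {}
    for i, pos in enumerate(sites):
        neighbors = []
        if i > 0:
            neighbors.append(pos - sites[i - 1])
        if i < len(sites) - 1:
            neighbors.append(sites[i + 1] - pos)
        if neighbors:
            distances[pos] = min(neighbors)
    return distances


def window_composition(sequence, center, half_width=10):
    """Relative amino acid frequencies in a window around a 0-based index.

    half_width=10 is an illustrative assumption, not a value from the paper.
    The window is truncated at the sequence ends.
    """
    start = max(0, center - half_width)
    end = min(len(sequence), center + half_width + 1)
    window = sequence[start:end]
    counts = Counter(window)
    return [counts.get(aa, 0) / len(window) for aa in AMINO_ACIDS]
```

Composition vectors of this kind are position-independent within the window, which is the property the abstract contrasts with sequence position-specific motifs; in principle such vectors could serve as input to a support vector machine classifier.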