Lu Bingwen, Ruse Cristian, Xu Tao, Park Sung Kyu, Yates John
Department of Cell Biology, The Scripps Research Institute, La Jolla, California 92037, USA.
Anal Chem. 2007 Feb 15;79(4):1301-10. doi: 10.1021/ac061334v.
We developed and compared two approaches for automated validation of phosphopeptide tandem mass spectra identified by database search algorithms. Phosphopeptide identifications were obtained through SEQUEST searches of a protein database appended with its decoy (reversed sequences). Statistical evaluation and iterative searches were used to create a high-quality data set of phosphopeptides. Postsearch validation was automated using two different strategies. In the first, statistical multiple testing is used to calculate a p value for each tentative peptide phosphorylation. In the second, a support vector machine (SVM; a machine learning algorithm) binary classifier predicts whether a tentative peptide phosphorylation is true. Postsearch validation of phosphopeptide/spectrum matches by multiple testing agreed well (85%) with that by support vector machines. In a blinded test, the automatic methods agreed very well with manual expert validation. The algorithms were also tested on the identification of synthetic phosphopeptides. We show that phosphate neutral losses in tandem mass spectra can be used to assess the correctness of phosphopeptide/spectrum matches. An SVM classifier with a radial basis function kernel achieved classification accuracy of 95.7% to 96.8% on the positive data set, depending on the search algorithm used. Establishing the validity of an identification is a necessary step before further postsearch interrogation of the spectra for complete localization of phosphorylation sites. The current implementation validates phosphoserine/phosphothreonine-containing peptides with one or two phosphorylation sites, using data acquired on an ion trap mass spectrometer. The SVM-based algorithm has been implemented in the software package DeBunker, whose application we illustrate on a large phosphorylation data set.
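The abstract states that phosphate neutral losses can be used to assess phosphopeptide/spectrum matches, but it does not say how that evidence is computed. The following Python sketch shows one plausible way to turn it into a numeric feature: find the peak expected when the precursor loses H3PO4 and report its intensity relative to the base peak. The function names, the 0.5 m/z tolerance (chosen here as a rough fit for ion trap data), and the peak-list format are assumptions for illustration and are not taken from DeBunker.

```python
# Illustrative sketch only; not the DeBunker implementation.
# Monoisotopic masses of the common phosphate neutral losses, in Da.
H3PO4_MASS = 97.97690  # loss of phosphoric acid
HPO3_MASS = 79.96633   # loss of metaphosphoric acid


def neutral_loss_mz(precursor_mz, charge, loss_mass=H3PO4_MASS):
    """Expected m/z of the precursor after losing one neutral group.

    A neutral loss leaves the charge unchanged, so the m/z shifts
    down by loss_mass / charge.
    """
    return precursor_mz - loss_mass / charge


def neutral_loss_intensity(peaks, precursor_mz, charge,
                           tol=0.5, loss_mass=H3PO4_MASS):
    """Relative intensity (0 to 1) of the neutral-loss peak.

    peaks is a list of (m/z, intensity) pairs. The strongest peak within
    tol of the expected neutral-loss m/z is divided by the base peak
    intensity. Returns 0.0 if no peak falls inside the window.
    """
    if not peaks:
        return 0.0
    base = max(intensity for _, intensity in peaks)
    if base <= 0:
        return 0.0
    target = neutral_loss_mz(precursor_mz, charge, loss_mass)
    hit = max((intensity for mz, intensity in peaks
               if abs(mz - target) <= tol), default=0.0)
    return hit / base


# Hypothetical doubly charged precursor at m/z 600.0.
example_peaks = [(300.1, 20.0), (551.0, 100.0), (700.3, 50.0)]
feature = neutral_loss_intensity(example_peaks, 600.0, 2)
```

A value like this could be one input among several to a binary classifier or a statistical test. The features actually used by the authors are not given in the abstract.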