Lu Bingwen, Ruse Cristian I, Yates John R
Department of Chemical Physiology, SR-11, The Scripps Research Institute, La Jolla, CA 92037, USA.
J Proteome Res. 2008 Aug;7(8):3628-34. doi: 10.1021/pr8001194. Epub 2008 Jun 19.
We developed a probability-based machine-learning program, Colander, to identify tandem mass spectra that are highly likely to represent phosphopeptides prior to database search. We identified statistically significant diagnostic features of phosphopeptide tandem mass spectra based on ion trap CID MS/MS experiments. Statistics for the features were calculated from 376 validated phosphopeptide spectra and 376 nonphosphopeptide spectra. Colander, a probability-based support vector machine (SVM) classifier, was then trained on five selected features. Data sets were assembled from LC/LC-MS/MS analyses of large-scale phosphopeptide enrichments from proteolyzed cells and tissues and of synthetic phosphopeptides. These data sets were used to evaluate the capability of Colander to select pS/pT-containing phosphopeptide tandem mass spectra. When applied to unknown tandem mass spectra, Colander can routinely remove 80% of tandem mass spectra while retaining 95% of phosphopeptide tandem mass spectra. The program reduced the computational time spent on database searching by 60-90%. Furthermore, prefiltering for tandem mass spectra that represent phosphopeptides can increase the number of phosphopeptide identifications at a predefined false positive rate.
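The trade-off described in the abstract (discarding most spectra while retaining nearly all phosphopeptide spectra) can be illustrated with a small sketch of probability-threshold filtering. This is not Colander's implementation: the function names, the toy probabilities, and the rule for choosing the cutoff are all assumptions made for illustration, and the scores stand in for whatever per-spectrum probability a trained classifier would emit.

```python
import math

def threshold_for_retention(probs, is_phospho, target_retention=0.95):
    """Highest cutoff that keeps at least target_retention of the
    labeled phosphopeptide spectra (hypothetical helper)."""
    positives = sorted((p for p, y in zip(probs, is_phospho) if y), reverse=True)
    if not positives:
        raise ValueError("need at least one labeled phosphopeptide spectrum")
    n_keep = math.ceil(target_retention * len(positives))
    return positives[n_keep - 1]

def filter_summary(probs, is_phospho, cutoff):
    """Fraction of all spectra removed and fraction of phosphopeptide
    spectra retained when keeping spectra with probability >= cutoff."""
    kept = [y for p, y in zip(probs, is_phospho) if p >= cutoff]
    return {
        "fraction_removed": 1 - len(kept) / len(probs),
        "phospho_retained": sum(kept) / sum(is_phospho),
    }

# Toy data (made up): 4 labeled phosphopeptide spectra, 6 other spectra.
probs = [0.9, 0.8, 0.6, 0.2, 0.7, 0.3, 0.1, 0.05, 0.4, 0.5]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

cutoff = threshold_for_retention(probs, labels, target_retention=0.75)
summary = filter_summary(probs, labels, cutoff)
print(cutoff, summary)
```

On a labeled validation set, the same two quantities correspond to the figures the abstract reports: the share of spectra removed before database search and the share of phosphopeptide spectra that survive the filter.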