Genome-Scale Biology Program, Institute of Biomedicine, University of Helsinki, Helsinki, Finland.
Proteomics. 2010 Oct;10(19):3515-24. doi: 10.1002/pmic.200900727.
MS/MS is a widely used method for proteome-wide analysis of protein expression and PTMs. The thousands of MS/MS spectra produced from a single experiment pose a major challenge for downstream analysis. Standard programs, such as MASCOT, provide peptide assignments for many of the spectra, including identification of PTM sites, but these results are plagued by false-positive identifications. In phosphoproteomic experiments, only a single peptide assignment is typically available to support identification of each phosphorylation site, and hence minimizing false positives is critical. Thus, tedious manual validation is often required to increase confidence in the spectral assignments. We have developed phoMSVal, an open-source platform for managing MS/MS data and automatically validating identified phosphopeptides. We tested five classification algorithms with 17 extracted features to separate correct peptide assignments from incorrect ones using over 2600 manually curated spectra. The naïve Bayes algorithm was among the best classifiers, with an AUC value of 97% and a PPV of 97% for phosphotyrosine data. This classifier required only three features to achieve a 76% decrease in false positives as compared with MASCOT while retaining 97% of true positives. This algorithm was able to classify an independent phosphoserine/threonine data set with an AUC value of 93% and a PPV of 91%, demonstrating the applicability of this method to all types of phospho-MS/MS data. PhoMSVal is available at http://csbi.ltdk.helsinki.fi/phomsval.
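To make the evaluation terms in the abstract concrete, the following is a minimal sketch of a naïve Bayes classifier scored by AUC and PPV. It is not the phoMSVal implementation: the Gaussian likelihood, the three unnamed features, the 0.5 decision threshold, and the synthetic data from the hypothetical `make_toy_spectra` helper are all assumptions made for illustration, and the numbers it prints say nothing about the published results.

```python
import math
import random


def fit_gaussian_nb(X, y):
    """Estimate a prior plus per-feature mean and variance for each class."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small constant keeps the variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (n / len(y), means, variances)
    return model


def posterior_correct(model, x):
    """Posterior probability that assignment x belongs to class 1 (correct)."""
    log_joint = {}
    for label, (prior, means, variances) in model.items():
        ll = math.log(prior)
        # Naive Bayes assumption: features are independent given the class.
        for v, m, s2 in zip(x, means, variances):
            ll += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        log_joint[label] = ll
    top = max(log_joint.values())  # subtract the max for numerical stability
    weights = {k: math.exp(v - top) for k, v in log_joint.items()}
    return weights[1] / sum(weights.values())


def ppv(y_true, y_pred):
    """Positive predictive value: TP / (TP + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else float("nan")


def auc(y_true, scores):
    """AUC as the chance that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def make_toy_spectra(n, rng):
    """Hypothetical stand-in data: three features, shifted for class 1."""
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.5 else 0
        shift = 2.0 if label else 0.0
        X.append([rng.gauss(shift, 1.0) for _ in range(3)])
        y.append(label)
    return X, y


rng = random.Random(0)  # fixed seed so the sketch is reproducible
X_train, y_train = make_toy_spectra(400, rng)
X_test, y_test = make_toy_spectra(200, rng)

model = fit_gaussian_nb(X_train, y_train)
scores = [posterior_correct(model, x) for x in X_test]
preds = [1 if s >= 0.5 else 0 for s in scores]

print(f"AUC={auc(y_test, scores):.3f}  PPV={ppv(y_test, preds):.3f}")
```

AUC is computed from the ranking of posterior scores and so does not depend on the threshold, whereas PPV depends on where the accept/reject cutoff is placed; raising the cutoff above 0.5 would typically trade retained true positives for fewer false positives.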