Keller Andrew, Nesvizhskii Alexey I, Kolker Eugene, Aebersold Ruedi
Institute for Systems Biology, Seattle, Washington 98103, USA.
Anal Chem. 2002 Oct 15;74(20):5383-92. doi: 10.1021/ac025747h.
We present a statistical model to estimate the accuracy of peptide assignments to tandem mass (MS/MS) spectra made by database search applications such as SEQUEST. Employing the expectation maximization algorithm, the analysis learns to distinguish correct from incorrect database search results, computing probabilities that peptide assignments to spectra are correct based upon database search scores and the number of tryptic termini of peptides. Using SEQUEST search results for spectra generated from a sample of known protein components, we demonstrate that the computed probabilities are accurate and have high power to discriminate between correctly and incorrectly assigned peptides. This analysis makes it possible to filter large volumes of MS/MS database search results with predictable false identification error rates and can serve as a common standard by which the results of different research groups are compared.
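As a rough illustration of the approach described above, the following sketch fits a two-component mixture (correct versus incorrect assignments) by expectation maximization and returns, for each assignment, the probability that it is correct. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the distributional forms, so a single database search score per spectrum is assumed, both components are modelled as Gaussians, and the number of tryptic termini (NTT, taking the values 0, 1, or 2) is assumed independent of the score within each component. All function and parameter names (`em_posteriors`, `expected_error_rate`, and so on) are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def weighted_normal_fit(values, weights):
    """Weighted mean and standard deviation (sd floored for stability)."""
    total = max(sum(weights), 1e-12)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    var = sum(w * (v - mean) ** 2 for w, v in zip(weights, values)) / total
    return mean, max(math.sqrt(var), 1e-3)


def weighted_ntt_freqs(ntt, weights, eps=1e-6):
    """Weighted frequencies of NTT = 0, 1, 2, with a tiny pseudo-count."""
    total = sum(weights) + 3 * eps
    return [(sum(w for w, k in zip(weights, ntt) if k == j) + eps) / total
            for j in range(3)]


def em_posteriors(scores, ntt, n_iter=100):
    """Estimate P(correct | score, NTT) for every peptide assignment.

    Assumption: Gaussian score distributions for both components,
    chosen for simplicity rather than taken from the paper.
    """
    n = len(scores)
    # Initial guess: higher-scoring assignments are more likely correct.
    median = sorted(scores)[n // 2]
    post = [0.9 if s > median else 0.1 for s in scores]
    for _ in range(n_iter):
        # M-step: re-estimate mixture parameters from current posteriors.
        anti = [1.0 - p for p in post]
        prior = sum(post) / n
        mu1, sd1 = weighted_normal_fit(scores, post)
        mu0, sd0 = weighted_normal_fit(scores, anti)
        f1 = weighted_ntt_freqs(ntt, post)
        f0 = weighted_ntt_freqs(ntt, anti)
        # E-step: recompute posteriors with Bayes' rule.
        updated = []
        for s, k in zip(scores, ntt):
            a = prior * normal_pdf(s, mu1, sd1) * f1[k]
            b = (1.0 - prior) * normal_pdf(s, mu0, sd0) * f0[k]
            updated.append(a / (a + b) if a + b > 0.0 else prior)
        post = updated
    return post


def expected_error_rate(post, threshold):
    """Model-predicted fraction of incorrect assignments among those
    accepted at the given minimum probability."""
    accepted = [p for p in post if p >= threshold]
    if not accepted:
        return 0.0
    return sum(1.0 - p for p in accepted) / len(accepted)
```

Calling `em_posteriors(scores, ntt)` on parallel lists of scores and NTT values yields one probability per assignment, and `expected_error_rate(post, 0.9)` then gives the predicted false identification rate when only assignments with probability of at least 0.9 are kept. If the computed probabilities are accurate, that prediction is what makes filtering at a chosen threshold yield a known error rate.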