Tabb David L, Fernando Christopher G, Chambers Matthew C
Mass Spectrometry Research Center / Departments of Biomedical Informatics and Biochemistry, Vanderbilt University Medical Center, Nashville, TN 37232-8575, USA.
J Proteome Res. 2007 Feb;6(2):654-61. doi: 10.1021/pr0604054.
Shotgun proteomics experiments are dependent upon database search engines to identify peptides from tandem mass spectra. Many of these algorithms score potential identifications by evaluating the number of fragment ions matched between each peptide sequence and an observed spectrum. These systems, however, generally do not distinguish between matching an intense peak and matching a minor peak. We have developed a statistical model to score peptide matches that is based upon the multivariate hypergeometric distribution. This scorer, part of the "MyriMatch" database search engine, places greater emphasis on matching intense peaks. The probability that the best match for each spectrum has occurred by random chance can be employed to separate correct matches from random ones. We evaluated this software on data sets from three different laboratories employing three different ion trap instruments. Employing a novel system for testing discrimination, we demonstrate that stratifying peaks into multiple intensity classes improves the discrimination of scoring. We compare MyriMatch results to those of Sequest and X!Tandem, revealing that it is capable of higher discrimination than either of these algorithms. When minimal peak filtering is employed, performance plummets for a scoring model that does not stratify matched peaks by intensity. On the other hand, we find that MyriMatch discrimination improves as more peaks are retained in each spectrum. MyriMatch also scales well to tandem mass spectra from high-resolution mass analyzers. These findings may indicate limitations for existing database search scorers that count matched peaks without differentiating them by intensity. The software and source code are available under the Mozilla Public License at http://www.mc.vanderbilt.edu/msrc/bioinformatics/.
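The idea of scoring with a multivariate hypergeometric distribution over intensity classes can be illustrated with a minimal sketch. This is not the MyriMatch implementation. The function names, the four-class layout (high, medium, low, and "no peak" m/z bins), and all counts are hypothetical values chosen only to show why matching intense peaks yields a less probable, and therefore better scoring, outcome than matching the same number of minor peaks.

```python
from math import comb, log


def mvh_probability(class_sizes, matched):
    """Multivariate hypergeometric probability of one match outcome.

    class_sizes[i]: number of m/z bins in class i (assumed layout:
                    intensity classes followed by a "no peak" class).
    matched[i]:     number of predicted fragment ions that fall in class i.
    """
    if len(class_sizes) != len(matched):
        raise ValueError("class_sizes and matched must have equal length")
    if any(k < 0 or k > n for n, k in zip(class_sizes, matched)):
        return 0.0  # impossible outcome
    numerator = 1
    for n, k in zip(class_sizes, matched):
        numerator *= comb(n, k)  # ways to pick k bins from class i
    # Denominator: ways to place all predicted fragments among all bins.
    return numerator / comb(sum(class_sizes), sum(matched))


def mvh_score(class_sizes, matched):
    """Negative log probability; larger means less likely by chance."""
    return -log(mvh_probability(class_sizes, matched))


# Hypothetical spectrum: 10 high, 20 medium, 40 low bins, 930 empty bins.
class_sizes = [10, 20, 40, 930]

# Two candidate peptides, each predicting 20 fragments and matching 9 peaks.
mostly_intense = [6, 2, 1, 11]  # most matches land on high-intensity peaks
mostly_minor = [1, 2, 6, 11]    # most matches land on low-intensity peaks

score_intense = mvh_score(class_sizes, mostly_intense)
score_minor = mvh_score(class_sizes, mostly_minor)
```

A scorer that only counts matched peaks would rate both candidates equally, since each matches nine peaks. Under the assumed class sizes, `score_intense` exceeds `score_minor` because intense peaks are rarer, so matching six of them by chance is less probable.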