Johns Hopkins Bayview Proteomics Center, Department of Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA.
Proteomics. 2010 Mar;10(6):1160-71. doi: 10.1002/pmic.200900433.
With the proliferation of search engines for the analysis of MS data, multisearch techniques, which aim to boost the discriminating power of the search engines' score functions, have recently become popular. Consequently, much statistical and algorithmic work has been done to combine and parse multiple search streams. However, multisearch techniques suffer from long run times and may have little impact on false negatives, because the searches apply similar peptide-filtering heuristics. This review focuses instead on multipass techniques, which use the results of one search to guide the selection of spectra, parameters and sequences in subsequent searches. This reduces the number of false-negative peptide identifications caused by peptide candidate filtering. Furthermore, multipass analysis avoids substantial increases in running time and, because it limits the search space, neither reduces the statistical significance of existing (correct) identifications nor introduces a statistically significant number of false-positive identifications. However, we argue that existing combiner tools are not reliably applicable to these multipass situations, because they make algorithmic assumptions about the search space and statistical assumptions about the rate of true positives. Here we provide an overview of the advantages of, and issues in, multipass analysis techniques; the existing methods and workflows available to proteomic researchers; and the unsolved statistical and algorithmic issues amenable to future research.
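The multipass idea (results of one search selecting the spectra, sequences and parameters of the next) can be illustrated with a deliberately simplified sketch. It assumes a toy model in which a "spectrum" is only a precursor mass, matching is a plain mass-tolerance check with no scoring or FDR control, and the only parameter expanded in the second pass is methionine oxidation. The names `search`, `multipass` and `peptide_mass` and the small mass table are hypothetical and are not taken from any tool discussed in the review.

```python
# Hypothetical toy model of a two-pass search; not any specific tool's method.
# Monoisotopic residue masses (Da) for a small subset of amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "R": 156.10111,
}
WATER = 18.01056
OXIDATION = 15.99491  # variable modification on M, enabled only in pass 2


def peptide_mass(seq: str, n_ox: int = 0) -> float:
    """Neutral monoisotopic mass of a peptide carrying n_ox oxidized methionines."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER + n_ox * OXIDATION


def search(spectra, peptides, tol, allow_ox):
    """Assign each spectrum (id -> precursor mass) to the first peptide within tol.

    peptides is a list of (protein, sequence) pairs; no scoring is performed.
    """
    hits = {}
    for sid, mass in spectra.items():
        for protein, seq in peptides:
            max_ox = seq.count("M") if allow_ox else 0
            for n_ox in range(max_ox + 1):
                if abs(peptide_mass(seq, n_ox) - mass) <= tol:
                    hits[sid] = (protein, seq, n_ox)
                    break
            if sid in hits:
                break
    return hits


def multipass(spectra, database, tol=0.01):
    """Pass 1 on everything with strict settings; pass 2 guided by pass-1 results."""
    pass1 = search(spectra, database, tol, allow_ox=False)
    # Pass-1 results select what pass 2 searches:
    found_proteins = {protein for protein, _, _ in pass1.values()}
    unassigned = {s: m for s, m in spectra.items() if s not in pass1}  # spectra
    subset = [(p, s) for p, s in database if p in found_proteins]      # sequences
    # Pass 2: reduced search space, expanded parameters (oxidation allowed).
    pass2 = search(unassigned, subset, tol, allow_ox=True)
    return pass1, pass2


if __name__ == "__main__":
    database = [("P1", "GASPK"), ("P1", "VTLMER"), ("P2", "MAAK")]
    spectra = {
        "s1": peptide_mass("GASPK"),       # unmodified, found in pass 1
        "s2": peptide_mass("VTLMER", 1),   # oxidized, protein P1 already seen
        "s3": peptide_mass("MAAK", 1),     # oxidized, protein P2 never seen
    }
    print(multipass(spectra, database))
```

In this toy run, `s1` is assigned in pass 1, `s2` is recovered in pass 2 because its protein was already identified, and `s3` stays unassigned because pass 2 only considers proteins found in pass 1. That last case shows the trade-off of limiting the search space: fewer candidates per spectrum in the second pass, at the cost of not rescuing peptides from proteins the first pass missed.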