Keich Uri, Kertesz-Farkas Attila, Noble William Stafford
†School of Mathematics and Statistics F07, University of Sydney, Sydney, New South Wales 2006, Australia.
‡Department of Genome Sciences, University of Washington, Foege Building S220B, 3720 15th Avenue North East, Seattle, Washington 98195-5065, United States.
J Proteome Res. 2015 Aug 7;14(8):3148-61. doi: 10.1021/acs.jproteome.5b00081. Epub 2015 Jul 27.
Interpreting the potentially vast number of hypotheses generated by a shotgun proteomics experiment requires a valid and accurate procedure for assigning statistical confidence estimates to identified tandem mass spectra. Despite the crucial role such procedures play in most high-throughput proteomics experiments, the scientific literature has not reached a consensus about the best confidence estimation methodology. In this work, we evaluate, using theoretical and empirical analysis, four previously proposed protocols for estimating the false discovery rate (FDR) associated with a set of identified tandem mass spectra: two variants of the target-decoy competition (TDC) protocol of Elias and Gygi and two variants of the separate target-decoy search protocol of Käll et al. Our analysis reveals significant biases in the two separate target-decoy search protocols. Moreover, the one TDC protocol that provides an unbiased FDR estimate among the target peptide-spectrum matches (PSMs) does so at the cost of forfeiting a random subset of high-scoring spectrum identifications. We therefore propose the mix-max procedure to provide unbiased, accurate FDR estimates in the presence of well-calibrated scores. The method avoids the biases associated with the two separate target-decoy search protocols and also avoids the propensity for target-decoy competition to discard a random subset of high-scoring target identifications.
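To make the target-decoy competition idea concrete, the following is a minimal sketch of one common formulation: each spectrum keeps only the better of its best target match and best decoy match, and the FDR among accepted target PSMs is estimated as the number of decoy wins divided by the number of target wins at or above a score threshold. The function name `tdc_fdr`, the assumption that higher scores are better, and the rule that ties go to the target are all illustrative assumptions and are not taken from the paper, whose exact formulation may differ. The mix-max procedure itself is not sketched here.

```python
from typing import Sequence


def tdc_fdr(
    target_scores: Sequence[float],
    decoy_scores: Sequence[float],
    threshold: float,
) -> float:
    """Estimate the FDR among target PSMs by target-decoy competition.

    target_scores[i] and decoy_scores[i] are the best target and best decoy
    scores for spectrum i (hypothetical inputs; higher score = better match).
    """
    if len(target_scores) != len(decoy_scores):
        raise ValueError("one target and one decoy score are needed per spectrum")

    target_wins = 0
    decoy_wins = 0
    for t, d in zip(target_scores, decoy_scores):
        # Competition step: only the better-scoring match survives.
        # Ties go to the target here, which is a simplifying assumption.
        if t >= d:
            if t >= threshold:
                target_wins += 1
        elif d >= threshold:
            decoy_wins += 1

    if target_wins == 0:
        return 0.0  # nothing accepted, so there is nothing to be wrong about
    # Decoy wins stand in for the unobserved number of incorrect target wins.
    return min(1.0, decoy_wins / target_wins)
```

Note that in this sketch a spectrum whose decoy outscores its target contributes no target PSM at all, even when the target score is high. This is the loss of high-scoring target identifications that the abstract attributes to target-decoy competition.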