Keich Uri, Noble William Stafford
School of Mathematics and Statistics F07, University of Sydney.
Department of Genome Sciences, Department of Computer Science and Engineering, University of Washington.
Res Comput Mol Biol. 2017 May;10229:99-116. doi: 10.1007/978-3-319-56970-3_7. Epub 2017 Apr 12.
Estimating the false discovery rate (FDR) among a list of tandem mass spectrum identifications is most often done through target-decoy competition (TDC). Here we offer two new methods that can use an arbitrarily small number of additional randomly drawn decoy databases to improve TDC. Specifically, "Partial Calibration" uses a new meta-scoring scheme that allows us to benefit gradually from the increase in the number of identifications that calibration yields, and "Averaged TDC" (a-TDC) reduces both the liberal bias of TDC at small FDR values and its variability throughout. Combining a-TDC with "Progressive Calibration" (PC), which attempts to find the "right" number of decoys required for calibration, has a substantial impact on real datasets: the combination typically yields almost the entire 17% increase in discoveries that "full calibration" yields (at FDR level 0.05) while using 60 times fewer decoys. Our methods are further validated using a novel, realistic simulation scheme. Importantly, they apply more generally to the problem of controlling the FDR among discoveries from searching an incomplete database.
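For readers unfamiliar with the baseline, the following is a minimal sketch of the textbook form of TDC that the abstract builds on, not the estimator defined in this paper. It assumes each spectrum has one best target score and one best decoy score, that higher scores are better, and that ties go to the decoy as a conservative simplification. The function name `tdc_fdr` and the example scores are hypothetical, and the sketch does not implement Partial Calibration, a-TDC, or Progressive Calibration.

```python
from typing import Sequence


def tdc_fdr(target_scores: Sequence[float],
            decoy_scores: Sequence[float],
            threshold: float) -> float:
    """Estimate the FDR among target wins scoring at or above `threshold`.

    Element i of each sequence is the best target and best decoy score
    for spectrum i. The two compete, only the winner is kept, and the
    number of decoy wins above the threshold stands in for the number
    of false target wins above it.
    """
    if len(target_scores) != len(decoy_scores):
        raise ValueError("need one target and one decoy score per spectrum")

    target_wins = 0
    decoy_wins = 0
    for t, d in zip(target_scores, decoy_scores):
        target_won = t > d  # assumption: ties are counted as decoy wins
        best = t if target_won else d
        if best < threshold:
            continue  # the winning match falls below the score cutoff
        if target_won:
            target_wins += 1
        else:
            decoy_wins += 1

    if target_wins == 0:
        return 0.0  # no discoveries reported, so nothing to be false
    return min(1.0, decoy_wins / target_wins)


# Made-up scores for six spectra, for illustration only.
targets = [9.0, 7.5, 3.0, 8.2, 2.1, 6.4]
decoys = [4.0, 8.0, 2.5, 3.3, 1.0, 5.9]
print(tdc_fdr(targets, decoys, threshold=5.0))
```

With these made-up scores and a cutoff of 5.0, three target wins and one decoy win clear the threshold, so the estimate is 1/3. Because a single decoy draw determines the count, the estimate can vary noticeably between draws, which is the variability that the abstract says a-TDC reduces by using additional decoy databases.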