Chemical and Biological Signatures, Pacific Northwest National Laboratory, Seattle, Washington 98109, United States.
School of Mathematics & Statistics, University of Sydney, New South Wales, 2006, Australia.
J Proteome Res. 2022 Oct 7;21(10):2412-2420. doi: 10.1021/acs.jproteome.2c00282. Epub 2022 Sep 27.
The analysis of shotgun proteomics data often involves generating lists of inferred peptide-spectrum matches (PSMs) and/or peptides. The canonical approach for generating these discovery lists is to control the false discovery rate (FDR), most commonly through target-decoy competition (TDC). At the PSM level, TDC is implemented by competing each spectrum's best-scoring target (real) peptide match against its best match in a decoy database. This PSM-level procedure can be adapted to the peptide level by selecting the top-scoring PSM per peptide prior to FDR estimation. Here, we first highlight and empirically extend little-known earlier work by He et al., which showed that TDC-based PSM-level FDR estimates can be liberally biased. We thus propose that researchers instead focus on peptide-level analysis. We then investigate three ways to carry out peptide-level TDC and show that the most common method ("PSM-only") offers the lowest statistical power in practice. An alternative approach that carries out a double competition, first at the PSM level and then at the peptide level ("PSM-and-peptide"), is the most powerful method, yielding on average 17% more discovered peptides at a 1% FDR threshold than the PSM-only method.
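The double competition described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the details, so the function names, the tuple layout, the `decoy_to_target` pairing of each decoy peptide with one target peptide, the tie-breaking toward decoys, and the `(decoys + 1) / targets` FDR estimate are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# (peptide, score, is_decoy); a hypothetical layout chosen for this sketch
Match = Tuple[str, float, bool]


def psm_competition(
    target_best: List[Tuple[str, float]],
    decoy_best: List[Tuple[str, float]],
) -> List[Match]:
    """PSM-level competition: per spectrum, keep the better of the best
    target match and the best decoy match. Both lists are indexed by spectrum."""
    winners: List[Match] = []
    for (t_pep, t_score), (d_pep, d_score) in zip(target_best, decoy_best):
        # Assumption: ties go to the decoy, which is the conservative choice.
        if t_score > d_score:
            winners.append((t_pep, t_score, False))
        else:
            winners.append((d_pep, d_score, True))
    return winners


def peptide_competition(
    winners: List[Match], decoy_to_target: Dict[str, str]
) -> List[Match]:
    """Peptide-level competition: within each target/decoy peptide pair,
    keep only the single highest-scoring surviving PSM."""
    best: Dict[str, Match] = {}
    for pep, score, is_decoy in winners:
        # Assumption: every decoy peptide is paired with exactly one target.
        key = decoy_to_target[pep] if is_decoy else pep
        cur = best.get(key)
        if cur is None or score > cur[1] or (score == cur[1] and is_decoy):
            best[key] = (pep, score, is_decoy)
    return list(best.values())


def accept_targets(items: List[Match], alpha: float) -> List[str]:
    """Return the target peptides in the largest top-ranked prefix whose
    estimated FDR, (decoys + 1) / targets, is at most alpha."""
    ranked = sorted(items, key=lambda m: m[1], reverse=True)
    decoys = targets = cutoff = 0
    for i, (_, _, is_decoy) in enumerate(ranked, start=1):
        decoys += is_decoy
        targets += not is_decoy
        if targets and (decoys + 1) / targets <= alpha:
            cutoff = i
    return [pep for pep, _, is_decoy in ranked[:cutoff] if not is_decoy]


# Toy data: four spectra, each with a best target and a best decoy match.
target_best = [("PEPA", 9.0), ("PEPB", 8.0), ("PEPA", 7.0), ("PEPC", 2.0)]
decoy_best = [("APEP", 1.0), ("BPEP", 3.0), ("APEP", 0.5), ("CPEP", 5.0)]
decoy_to_target = {"APEP": "PEPA", "BPEP": "PEPB", "CPEP": "PEPC"}

psm_winners = psm_competition(target_best, decoy_best)
peptide_winners = peptide_competition(psm_winners, decoy_to_target)
accepted = accept_targets(peptide_winners, alpha=0.5)
```

The toy threshold of 0.5 is used only because four spectra are far too few to reach a realistic 1% threshold; with the `+1` correction, at least 100 target peptides must outrank the first decoy before anything is accepted at 1%.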