False discovery rate estimation using candidate peptides for each spectrum.

Affiliations

Department of Computer Science, Hanyang University, Seoul, 06978, Republic of Korea.

Biomedical Informatics Team, Korea Institute of Science and Technology Information, Daejeon, 34141, Republic of Korea.

Publication information

BMC Bioinformatics. 2022 Nov 1;23(1):454. doi: 10.1186/s12859-022-05002-4.

DOI:10.1186/s12859-022-05002-4

PMID:36319948

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9623924/

Abstract

BACKGROUND

False discovery rate (FDR) estimation is very important in proteomics. The target-decoy strategy (TDS), which is often used for FDR estimation, assumes that an incorrectly identified spectrum is equally likely to match a target peptide or a decoy peptide. In practice, however, these two probabilities are rarely identical. We propose cTDS (target-decoy strategy with candidate peptides), which estimates the FDR more accurately by using, for each spectrum, the probability that an incorrect identification matches a target or a decoy peptide.
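To make the equal-probability assumption concrete, the following is a minimal sketch, not the authors' implementation. `tds_fdr` is the usual decoy-count estimate. `weighted_fdr` is a hypothetical illustration of how a per-spectrum probability could enter the estimate: if an incorrect match lands on a target with probability p and on a decoy with probability 1 - p, each observed decoy hit stands in for p / (1 - p) expected false target hits. The tuple layout, the function names, and the weighting itself are assumptions made for illustration; the abstract does not give the cTDS formula.

```python
from typing import Iterable, Tuple

# Hypothetical PSM record: (score, is_decoy, p_target), where p_target is the
# assumed probability that an incorrect identification of this spectrum
# matches a target peptide rather than a decoy peptide (must be < 1).
PSM = Tuple[float, bool, float]


def tds_fdr(psms: Iterable[PSM], threshold: float) -> float:
    """Standard target-decoy estimate: decoy hits / target hits at or above threshold."""
    accepted = [p for p in psms if p[0] >= threshold]
    targets = sum(1 for _, is_decoy, _ in accepted if not is_decoy)
    decoys = sum(1 for _, is_decoy, _ in accepted if is_decoy)
    return decoys / targets if targets else 0.0


def weighted_fdr(psms: Iterable[PSM], threshold: float) -> float:
    """Illustrative probability-weighted estimate (an assumption, not cTDS itself).

    Each accepted decoy hit contributes p / (1 - p) expected false target hits.
    With p = 0.5 for every spectrum this reduces to tds_fdr.
    """
    accepted = [p for p in psms if p[0] >= threshold]
    targets = sum(1 for _, is_decoy, _ in accepted if not is_decoy)
    false_targets = sum(p / (1.0 - p) for _, is_decoy, p in accepted if is_decoy)
    return false_targets / targets if targets else 0.0
```

When every spectrum has p = 0.5 the two functions agree; when incorrect matches favor targets (p > 0.5), the plain decoy count underestimates the number of false target hits.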

RESULTS

For most spectra, the probability that an incorrect identification matches a target (or decoy) peptide is close to 0.5, but only about 1.14-4.85% of all spectra have a probability of exactly 0.5. We used an entrapment sequence method to demonstrate the accuracy of cTDS. For fixed FDR thresholds (1-10%), the false match rate (FMR) in cTDS is closer to the threshold than the FMR in TDS. We compared the number of peptide-spectrum matches (PSMs) obtained with TDS and cTDS at a 1% FDR threshold on the HEK293 dataset. In the first and third replicates, cTDS yielded more PSMs than TDS for the reverse, pseudo-reverse, shuffle, and de Bruijn databases (about 0.001-0.132% more) and fewer PSMs for the pseudo-shuffle database (about 0.05-0.126% fewer). In the second replicate, cTDS yielded more PSMs than TDS for all databases (about 0.013-0.274% more).
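The PSM counts compared above depend on how a score cutoff is chosen for a given FDR threshold. The sketch below shows one common way to do this with the plain decoy-count estimate: walk down the score-ranked list and keep the largest accepted set whose estimated FDR stays within the limit. The function name and the `(score, is_decoy)` layout are hypothetical, and tied scores are not given special treatment; the paper's own pipeline may differ.

```python
from typing import List, Tuple


def count_psms_at_fdr(psms: List[Tuple[float, bool]], max_fdr: float = 0.01) -> int:
    """Return the number of target PSMs accepted at the most permissive
    score cutoff whose estimated FDR (decoys / targets) is <= max_fdr.

    psms: list of (score, is_decoy); higher scores are assumed to be better.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    best = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the largest target count seen while the estimate holds.
        if targets > 0 and decoys / targets <= max_fdr:
            best = targets
    return best
```

Running the same routine with two different FDR estimators on the same ranked list is what produces the small differences in accepted PSM counts reported above.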

CONCLUSIONS

When spectra are identified incorrectly, the probabilities of matching a target peptide and a decoy peptide are, for most spectra, not identical. We therefore propose cTDS, which estimates the FDR more accurately by using the probability that an incorrectly identified spectrum matches a target or a decoy peptide.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ecdc/9623924/833c9bdbb197/12859_2022_5002_Fig1_HTML.jpg
