New mixture models for decoy-free false discovery rate estimation in mass spectrometry proteomics.

Affiliations

Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA.

Illumina Inc., San Diego, CA 92122, USA.

Publication information

Bioinformatics. 2020 Dec 30;36(Suppl_2):i745-i753. doi: 10.1093/bioinformatics/btaa807.

DOI:10.1093/bioinformatics/btaa807

PMID:33381824

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7773488/

Abstract

MOTIVATION

Accurate estimation of false discovery rate (FDR) of spectral identification is a central problem in mass spectrometry-based proteomics. Over the past two decades, target-decoy approaches (TDAs) and decoy-free approaches (DFAs) have been widely used to estimate FDR. TDAs use a database of decoy species to faithfully model score distributions of incorrect peptide-spectrum matches (PSMs). DFAs, on the other hand, fit two-component mixture models to learn the parameters of correct and incorrect PSM score distributions. While conceptually straightforward, both approaches lead to problems in practice, particularly in experiments that push instrumentation to the limit and generate low fragmentation-efficiency and low signal-to-noise-ratio spectra.
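For readers unfamiliar with the target-decoy idea, the sketch below shows one common way a decoy search is turned into an FDR estimate: the number of decoy PSMs scoring at or above a threshold stands in for the number of incorrect target PSMs above it. The function name `tda_fdr` and the example scores are hypothetical, and the simple decoy/target ratio is an assumption made for illustration, not necessarily the estimator used in the paper.

```python
def tda_fdr(target_scores, decoy_scores, threshold):
    """Estimate FDR at a score threshold from a target-decoy search.

    Assumes decoy PSM scores model the scores of incorrect target PSMs,
    so decoys above the threshold approximate false target hits.
    """
    n_target = sum(1 for s in target_scores if s >= threshold)
    n_decoy = sum(1 for s in decoy_scores if s >= threshold)
    if n_target == 0:
        return 0.0  # nothing accepted, so no false discoveries
    return min(1.0, n_decoy / n_target)


# Made-up scores for illustration only.
targets = [10.0, 9.0, 8.0, 3.0, 2.0]
decoys = [4.0, 3.0, 1.0]
estimate = tda_fdr(targets, decoys, threshold=3.0)
```

With these made-up scores, four target PSMs and two decoy PSMs pass the threshold, so the estimate is 0.5.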

RESULTS

We introduce a new decoy-free framework for FDR estimation that generalizes present DFAs while exploiting more search data in a manner similar to TDAs. Our approach relies on multi-component mixtures, in which score distributions corresponding to the correct PSMs, best incorrect PSMs and second-best incorrect PSMs are modeled by the skew normal family. We derive EM algorithms to estimate parameters of these distributions from the scores of best and second-best PSMs associated with each experimental spectrum. We evaluate our models on multiple proteomics datasets and a HeLa cell digest case study consisting of more than a million spectra in total. We provide evidence of improved performance over existing DFAs and improved stability and speed over TDAs without any performance degradation. We propose that the new strategy has the potential to extend beyond peptide identification and reduce the need for TDA on all analytical platforms.
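To make the mixture-model idea concrete, here is a minimal sketch of how an FDR could be read off a fitted mixture whose components belong to the skew normal family. It is deliberately simplified: it uses only two components (correct and incorrect PSMs) rather than the multi-component model described above, it assumes the parameters have already been estimated (the EM step is not shown), and the function names, the parameter tuples `(loc, scale, shape)` and the numerical integration settings are all hypothetical choices for illustration rather than the authors' implementation.

```python
import math


def skew_normal_pdf(x, loc, scale, shape):
    """Skew normal density: (2 / scale) * phi(z) * Phi(shape * z)."""
    z = (x - loc) / scale
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    big_phi = 0.5 * (1.0 + math.erf(shape * z / math.sqrt(2.0)))
    return 2.0 / scale * phi * big_phi


def upper_tail(threshold, loc, scale, shape, steps=4000):
    """Approximate P(score >= threshold) by trapezoidal integration."""
    hi = loc + 12.0 * scale  # density is negligible beyond this point
    lo = max(threshold, loc - 12.0 * scale)
    if lo >= hi:
        return 0.0
    h = (hi - lo) / steps
    total = 0.5 * (skew_normal_pdf(lo, loc, scale, shape)
                   + skew_normal_pdf(hi, loc, scale, shape))
    for i in range(1, steps):
        total += skew_normal_pdf(lo + i * h, loc, scale, shape)
    return min(1.0, total * h)


def mixture_fdr(threshold, weight_correct, correct, incorrect):
    """FDR among PSMs scoring >= threshold under a two-component mixture.

    `correct` and `incorrect` are (loc, scale, shape) tuples, assumed to
    come from a previous fitting step that is not shown here.
    """
    true_mass = weight_correct * upper_tail(threshold, *correct)
    false_mass = (1.0 - weight_correct) * upper_tail(threshold, *incorrect)
    if true_mass + false_mass == 0.0:
        return 0.0
    return false_mass / (true_mass + false_mass)


# Hypothetical fitted parameters, chosen only to exercise the code.
fdr_at_cutoff = mixture_fdr(2.5, 0.5, correct=(5.0, 1.0, 0.0),
                            incorrect=(0.0, 1.0, 0.0))
```

Setting `shape` to 0 reduces each component to an ordinary normal distribution, which gives a convenient sanity check; nonzero values skew the component, which is the extra flexibility the skew normal family provides.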

AVAILABILITY AND IMPLEMENTATION

https://github.com/shawn-peng/FDR-estimation.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
