An algorithm for decoy-free false discovery rate estimation in XL-MS/MS proteomics.

Affiliations

Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, United States.

The Institute for Experiential AI, Northeastern University, Boston, MA 02115, United States.

Publication information

Bioinformatics. 2024 Jun 28;40(Suppl 1):i428-i436. doi: 10.1093/bioinformatics/btae233.

PMID: 38940171

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11256928/

Abstract

MOTIVATION

Cross-linking tandem mass spectrometry (XL-MS/MS) is an established analytical platform used to determine distance constraints between residues within a protein or between residues of physically interacting proteins, thus improving our understanding of protein structure and function. To aid biological discovery with XL-MS/MS, pairs of chemically linked peptides must be identified accurately, a process that requires (i) database search, which creates a ranked list of candidate peptide pairs for each experimental spectrum, and (ii) false discovery rate (FDR) estimation, which estimates the expected fraction of incorrect matches among the top-ranked peptide pairs scoring above a given threshold. Currently, the only available FDR estimation mechanism in XL-MS/MS is the target-decoy approach (TDA). However, despite its simplicity, TDA has both theoretical and practical limitations that reduce estimation accuracy and increase run time relative to potential decoy-free approaches (DFAs).
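For context, the decoy-based counting that a DFA is meant to replace can be sketched in a few lines. This sketch is not taken from the paper: it assumes every candidate pair has already been labeled `TT` (target-target), `TD` (one decoy peptide, in either position) or `DD` (decoy-decoy) by searching a concatenated target-decoy database, and it applies the commonly used cross-link estimate FDR ≈ (TD - DD) / TT to the pairs scoring at or above a threshold. The function name `xl_tda_fdr` is hypothetical.

```python
def xl_tda_fdr(scores, labels, threshold):
    """Target-decoy FDR estimate for cross-linked pairs scoring >= threshold.

    labels: "TT" (target-target), "TD" (target-decoy or decoy-target),
            or "DD" (decoy-decoy), one per score.
    Uses the counting rule FDR ~ (TD - DD) / TT.
    """
    tt = td = dd = 0
    for score, label in zip(scores, labels):
        if score < threshold:
            continue  # only pairs at or above the threshold are accepted
        if label == "TT":
            tt += 1
        elif label == "TD":
            td += 1
        elif label == "DD":
            dd += 1
        else:
            raise ValueError(f"unknown label: {label!r}")
    if tt == 0:
        return 0.0  # no accepted target-target pairs, so nothing to be false
    # Clamp at zero: TD - DD can go negative when DD hits outnumber TD hits.
    return max(td - dd, 0) / tt
```

Clamping keeps the estimate non-negative for small accepted sets. A DFA skips building and searching the decoy database altogether, which is one plausible source of the run-time difference mentioned above.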

RESULTS

We introduce a novel decoy-free framework for FDR estimation in XL-MS/MS. Our approach relies on multi-sample mixtures of skew normal distributions, where the latent components correspond to the scores of correct peptide pairs (both peptides identified correctly), partially incorrect peptide pairs (one peptide identified correctly, the other incorrectly), and incorrect peptide pairs (both peptides identified incorrectly). To learn these components, we exploit the score distributions of first- and second-ranked peptide-spectrum matches for each experimental spectrum and subsequently estimate FDR using a novel expectation-maximization algorithm with constraints. We evaluate the method on ten datasets and provide evidence that the proposed DFA is theoretically sound and a viable alternative to TDA owing to its good performance in terms of accuracy, variance of estimation, and run time.
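To make the mixture idea concrete, the sketch below evaluates FDR from an already-fitted mixture of skew-normal components. It is not the authors' implementation (that is in the repository listed under Availability) and it leaves out both the constrained EM fitting and the second-ranked score sample. Assumptions: the top-ranked scores follow a three-component mixture labeled `C` (correct), `P` (partially incorrect) and `I` (incorrect); both `P` and `I` count as false identifications; the weights and `(alpha, loc, scale)` parameters are given; and all function names are hypothetical. The skew-normal density is (2/ω) φ((x - ξ)/ω) Φ(α(x - ξ)/ω), and tail probabilities are computed with composite Simpson integration so that only the standard library is needed (`scipy.stats.skewnorm` would be the usual alternative). `responsibilities` shows the generic, unconstrained EM E-step for a single score.

```python
import math


def norm_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal CDF Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def skewnorm_pdf(x, alpha, loc, scale):
    """Skew-normal density: (2 / scale) * phi(z) * Phi(alpha * z), z = (x - loc) / scale."""
    z = (x - loc) / scale
    return 2.0 / scale * norm_pdf(z) * norm_cdf(alpha * z)


def skewnorm_sf(t, alpha, loc, scale, n=4000, width=12.0):
    """P(X >= t) by composite Simpson integration over [t, loc + width * scale].

    Mass above loc + width * scale is treated as negligible.
    """
    if n % 2:
        raise ValueError("Simpson's rule needs an even number of intervals")
    upper = loc + width * scale
    if t >= upper:
        return 0.0
    h = (upper - t) / n
    total = skewnorm_pdf(t, alpha, loc, scale) + skewnorm_pdf(upper, alpha, loc, scale)
    for i in range(1, n):
        weight = 4.0 if i % 2 else 2.0  # Simpson weights 4, 2, 4, ..., 4
        total += weight * skewnorm_pdf(t + i * h, alpha, loc, scale)
    return total * h / 3.0


def mixture_fdr(t, weights, params, correct="C"):
    """Estimated FDR among top-ranked matches scoring >= t.

    weights: component name -> mixing proportion (should sum to 1).
    params:  component name -> (alpha, loc, scale).
    Every component other than `correct` is counted as a false identification.
    """
    tail = {k: w * skewnorm_sf(t, *params[k]) for k, w in weights.items()}
    accepted = sum(tail.values())
    if accepted == 0.0:
        return 0.0  # nothing is accepted at this threshold
    return (accepted - tail[correct]) / accepted


def responsibilities(x, weights, params):
    """Unconstrained EM E-step for one score: posterior probability of each component."""
    dens = {k: w * skewnorm_pdf(x, *params[k]) for k, w in weights.items()}
    total = sum(dens.values())
    if total == 0.0:
        raise ValueError("all component densities underflow to zero at this score")
    return {k: d / total for k, d in dens.items()}
```

For example, with made-up parameters `weights = {"C": 0.5, "P": 0.2, "I": 0.3}` and `params = {"C": (2.0, 20.0, 5.0), "P": (2.0, 12.0, 4.0), "I": (2.0, 6.0, 3.0)}`, `mixture_fdr(t, weights, params)` gives the estimated FDR at threshold `t`; because the correct component sits at higher scores, raising `t` lowers the estimate here.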

AVAILABILITY AND IMPLEMENTATION

https://github.com/shawn-peng/xlms.

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d2cf/11256928/ab08c67fd78d/btae233f1.jpg
