FineFDR: Fine-grained Taxonomy-specific False Discovery Rates Control in Metaproteomics.

Authors

Wang Shengze, Feng Shichao, Pan Chongle, Guo Xuan

Affiliations

Department of Computer Science and Engineering, University of North Texas, Denton, TX 76207, United States.

School of Computer Science, Department of Microbiology and Plant Biology, University of Oklahoma, Norman, OK 73019, United States.

Publication information

Proceedings (IEEE Int Conf Bioinformatics Biomed). 2022 Dec;2022:287-292. doi: 10.1109/bibm55620.2022.9995401. Epub 2023 Jan 2.

DOI:10.1109/bibm55620.2022.9995401

PMID:36910011

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9998077/

Abstract

Microbial community proteomics, also termed metaproteomics, investigates all proteins expressed by a microbiota. Tandem mass spectrometry (MS/MS) is the typical method for identifying proteins in metaproteomics, which involves searching the mass spectra against a protein sequence database. A major post-analysis step is controlling the false discovery rate (FDR), i.e., the ratio of false positives to the total number of annotations. The current popular target-decoy FDR estimation method treats all the peptides and proteins equally and overlooks that they could have varied probabilities of being identified. In this study, we report FineFDR, a framework for FDR assessment at fine-grained levels with taxonomy information considered. FineFDR groups the identified peptide-spectrum matches, peptides, and proteins from different taxonomic units and estimates the FDR in each group separately. Empirical experiments on the simulated and real-world data sets demonstrate that our FineFDR achieved higher precision and more peptide and protein identifications when compared to the state-of-the-art methods, such as Comet, Percolator, TIDD, and Tailor. FineFDR is freely available under the GNU GPL license at https://github.com/Biocomputing-Research-Group/FDR.
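The grouping idea described in the abstract can be illustrated with a minimal sketch. It applies the standard target-decoy estimate (decoy count divided by target count above a score cutoff) separately within each taxonomic group. This is not the FineFDR implementation: the `PSM` record, its `taxon` label, the score orientation (higher is better), and both function names are assumptions introduced only for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class PSM(NamedTuple):
    """Hypothetical peptide-spectrum match record (not FineFDR's data model)."""
    score: float    # assumed search-engine score; higher means a better match
    is_decoy: bool  # True if the match is to a decoy sequence
    taxon: str      # assumed taxonomic group label for the matched protein


def accept_at_fdr(psms, fdr_threshold):
    """Keep targets in the longest score-ranked prefix whose estimated FDR
    (decoys / targets) does not exceed fdr_threshold."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = decoys = 0
    cutoff = 0  # number of top-ranked PSMs to keep
    for i, p in enumerate(ranked, start=1):
        if p.is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= fdr_threshold:
            cutoff = i
    return [p for p in ranked[:cutoff] if not p.is_decoy]


def accept_per_taxon(psms, fdr_threshold):
    """Estimate the FDR separately within each taxonomic group."""
    groups = defaultdict(list)
    for p in psms:
        groups[p.taxon].append(p)
    return {taxon: accept_at_fdr(group, fdr_threshold)
            for taxon, group in groups.items()}
```

Calling `accept_per_taxon(psms, 0.01)` on a list of such records would return, per group, the target matches that pass a 1% group-level threshold. Because each group gets its own score cutoff, a group with well-separated target and decoy scores is not penalized by a noisier group, which is the intuition behind grouped FDR estimation.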


Similar articles

Deep learning for peptide identification from metaproteomics datasets.

J Proteomics. 2021 Sep 15;247:104316. doi: 10.1016/j.jprot.2021.104316. Epub 2021 Jul 8.

Sipros Ensemble improves database searching and filtering for complex metaproteomics.

Bioinformatics. 2018 Mar 1;34(5):795-802. doi: 10.1093/bioinformatics/btx601.

Reverse and Random Decoy Methods for False Discovery Rate Estimation in High Mass Accuracy Peptide Spectral Library Searches.

J Proteome Res. 2018 Feb 2;17(2):846-857. doi: 10.1021/acs.jproteome.7b00614. Epub 2018 Jan 11.

False discovery rates in spectral identification.

BMC Bioinformatics. 2012;13 Suppl 16(Suppl 16):S2. doi: 10.1186/1471-2105-13-S16-S2. Epub 2012 Nov 5.

False discovery rate estimation using candidate peptides for each spectrum.

BMC Bioinformatics. 2022 Nov 1;23(1):454. doi: 10.1186/s12859-022-05002-4.

Common Decoy Distributions Simplify False Discovery Rate Estimation in Shotgun Proteomics.

J Proteome Res. 2022 Feb 4;21(2):339-348. doi: 10.1021/acs.jproteome.1c00600. Epub 2022 Jan 6.

Decoy methods for assessing false positives and false discovery rates in shotgun proteomics.

Anal Chem. 2009 Jan 1;81(1):146-59. doi: 10.1021/ac801664q.

Transferred subgroup false discovery rate for rare post-translational modifications detected by mass spectrometry.

Mol Cell Proteomics. 2014 May;13(5):1359-68. doi: 10.1074/mcp.O113.030189. Epub 2013 Nov 7.

An algorithm for decoy-free false discovery rate estimation in XL-MS/MS proteomics.

Bioinformatics. 2024 Jun 28;40(Suppl 1):i428-i436. doi: 10.1093/bioinformatics/btae233.

Cited by

Nanopore sequencing enables novel detection of deuterium incorporation in DNA.

Comput Struct Biotechnol J. 2024 Oct 3;23:3584-3594. doi: 10.1016/j.csbj.2024.09.027. eCollection 2024 Dec.

SEMQuant: Extending Sipros-Ensemble with Match-Between-Runs for Comprehensive Quantitative Metaproteomics.

Bioinform Res Appl. 2024 Jul;14956:102-115. doi: 10.1007/978-981-97-5087-0_9. Epub 2024 Jul 12.

References

False discovery rate: the Achilles' heel of proteogenomics.

Brief Bioinform. 2022 Sep 20;23(5). doi: 10.1093/bib/bbac163.

TIDD: tool-independent and data-dependent machine learning for peptide identification.

BMC Bioinformatics. 2022 Mar 30;23(1):109. doi: 10.1186/s12859-022-04640-y.

UniProt: the universal protein knowledgebase in 2021.

Nucleic Acids Res. 2021 Jan 8;49(D1):D480-D489. doi: 10.1093/nar/gkaa1100.

Database resources of the National Center for Biotechnology Information.

Nucleic Acids Res. 2021 Jan 8;49(D1):D10-D17. doi: 10.1093/nar/gkaa892.

Tailor: A Nonparametric and Rapid Score Calibration Method for Database Search-Based Peptide Identification in Shotgun Proteomics.

J Proteome Res. 2020 Apr 3;19(4):1481-1490. doi: 10.1021/acs.jproteome.9b00736. Epub 2020 Mar 25.

Interspecies Competition Impacts Targeted Manipulation of Human Gut Bacteria by Fiber-Derived Glycans.

Cell. 2019 Sep 19;179(1):59-73.e13. doi: 10.1016/j.cell.2019.08.011.

MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies.

PeerJ. 2019 Jul 26;7:e7359. doi: 10.7717/peerj.7359. eCollection 2019.

Lobular architecture of human adipose tissue defines the niche and fate of progenitor cells.

Nat Commun. 2019 Jun 11;10(1):2549. doi: 10.1038/s41467-019-09992-3.

MetaWRAP-a flexible pipeline for genome-resolved metagenomic data analysis.

Microbiome. 2018 Sep 15;6(1):158. doi: 10.1186/s40168-018-0541-1.

Metaproteomics method to determine carbon sources and assimilation pathways of species in microbial communities.

Proc Natl Acad Sci U S A. 2018 Jun 12;115(24):E5576-E5584. doi: 10.1073/pnas.1722325115. Epub 2018 May 29.
