Pitfalls of genotyping microbial communities with rapidly growing genome collections.

Affiliations

Chan Zuckerberg Biohub, San Francisco, CA, USA; Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA.

Chan Zuckerberg Biohub, San Francisco, CA, USA; Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA; Department of Epidemiology & Biostatistics, University of California, San Francisco, San Francisco, CA, USA.

Publication information

Cell Syst. 2023 Feb 15;14(2):160-176.e3. doi: 10.1016/j.cels.2022.12.007. Epub 2023 Jan 18.

DOI:10.1016/j.cels.2022.12.007

PMID:36657438

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9957970/

Abstract

Detecting genetic variants in metagenomic data is a priority for understanding the evolution, ecology, and functional characteristics of microbial communities. Many tools that perform this metagenotyping rely on aligning reads of unknown origin to a database of sequences from many species before calling variants. In this synthesis, we investigate how databases of increasingly diverse and closely related species have pushed the limits of current alignment algorithms, thereby degrading the performance of metagenotyping tools. We identify multi-mapping reads as a prevalent source of errors and illustrate a trade-off between retaining correct alignments and limiting incorrect alignments, many of which map reads to the wrong species. Then we evaluate several actionable mitigation strategies and review emerging methods showing promise to further improve metagenotyping in response to the rapid growth in genome collections. Our results have implications beyond metagenotyping to the many tools in microbial genomics that depend upon accurate read mapping.
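The trade-off between retaining correct alignments and limiting incorrect ones can be illustrated with a toy sketch. It assumes a deliberately simplified uniqueness filter that is not taken from the paper or from any specific aligner: a read is kept only if the score of its best database hit exceeds its second-best hit by at least `min_gap`, and it is then assigned to the best-scoring species. The `Read` class, the helper functions, the species labels "A" and "B", and all scores are hypothetical and invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Read:
    """A read with every database hit reported by a hypothetical aligner."""
    name: str
    true_species: str  # known only in simulation; used to score the filter
    hits: List[Tuple[str, int]]  # (species, alignment score); assumed non-empty


def score_gap(hits: List[Tuple[str, int]]) -> float:
    """Best score minus second-best score; infinite for a uniquely mapped read."""
    scores = sorted((score for _, score in hits), reverse=True)
    if len(scores) == 1:
        return float("inf")
    return scores[0] - scores[1]


def best_species(hits: List[Tuple[str, int]]) -> str:
    """Species of the highest-scoring hit (ties resolved by list order)."""
    return max(hits, key=lambda hit: hit[1])[0]


def evaluate(reads: List[Read], min_gap: float) -> Dict[str, int]:
    """Count reads kept and correctly assigned, kept but misassigned, or discarded."""
    counts = {"correct": 0, "incorrect": 0, "discarded": 0}
    for read in reads:
        if score_gap(read.hits) < min_gap:
            counts["discarded"] += 1  # too ambiguous: treated as multi-mapping
        elif best_species(read.hits) == read.true_species:
            counts["correct"] += 1
        else:
            counts["incorrect"] += 1  # read assigned to the wrong species
    return counts


# Invented toy data: "A" and "B" stand in for two closely related genomes.
READS = [
    Read("r1", "A", [("A", 150), ("B", 120)]),
    Read("r2", "A", [("A", 148), ("B", 146)]),
    Read("r3", "B", [("A", 149), ("B", 147)]),
    Read("r4", "B", [("B", 150)]),
    Read("r5", "A", [("B", 145), ("A", 140)]),
]

if __name__ == "__main__":
    for min_gap in (0, 3, 10):
        print(min_gap, evaluate(READS, min_gap))
```

With this toy data, raising `min_gap` from 0 to 10 removes both misassigned reads (r3 and r5) but also discards r2, a correctly placed read whose best and second-best hits differ by only 2 points. Real aligners summarize placement ambiguity in their own ways, for example through the MAPQ field of SAM records, so this rule is only a stand-in for the general idea.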
