Detection of low-abundance bacterial strains in metagenomic datasets by eigengenome partitioning.

Author information

Cleary Brian, Brito Ilana Lauren, Huang Katherine, Gevers Dirk, Shea Terrance, Young Sarah, Alm Eric J

Affiliations

Computational and Systems Biology Program, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA.

Publication information

Nat Biotechnol. 2015 Oct;33(10):1053-60. doi: 10.1038/nbt.3329. Epub 2015 Sep 14.

DOI:10.1038/nbt.3329

PMID:26368049

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4720164/

Abstract

Analyses of metagenomic datasets that are sequenced to a depth of billions or trillions of bases can uncover hundreds of microbial genomes, but naive assembly of these data is computationally intensive, requiring hundreds of gigabytes to terabytes of RAM. We present latent strain analysis (LSA), a scalable, de novo pre-assembly method that separates reads into biologically informed partitions and thereby enables assembly of individual genomes. LSA is implemented with a streaming calculation of unobserved variables that we call eigengenomes. Eigengenomes reflect covariance in the abundance of short, fixed-length sequences, or k-mers. As the abundance of each genome in a sample is reflected in the abundance of each k-mer in that genome, eigengenome analysis can be used to partition reads from different genomes. This partitioning can be done in fixed memory using tens of gigabytes of RAM, which makes assembly and downstream analyses of terabytes of data feasible on commodity hardware. Using LSA, we assemble partial and near-complete genomes of bacterial taxa present at relative abundances as low as 0.00001%. We also show that LSA is sensitive enough to separate reads from several strains of the same species.
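The partitioning idea described in the abstract can be illustrated with a toy sketch. This is not the authors' streaming implementation: it assumes a small in-memory k-mer count matrix, swaps in a batch singular value decomposition (`numpy.linalg.svd`) for the streaming calculation, and groups k-mers with a simple greedy cosine-similarity rule chosen here for illustration. The function names (`abundance_matrix`, `cluster_kmers`, `partition_reads`), the threshold `min_cosine`, and the example sequences are all hypothetical, and reverse complements are ignored.

```python
from collections import Counter

import numpy as np


def kmers(seq, k):
    """Return every overlapping k-mer of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def abundance_matrix(samples, k):
    """Count k-mers per sample; rows are k-mers, columns are samples."""
    index = {}
    for reads in samples:
        for read in reads:
            for km in kmers(read, k):
                index.setdefault(km, len(index))
    counts = np.zeros((len(index), len(samples)))
    for j, reads in enumerate(samples):
        for read in reads:
            for km in kmers(read, k):
                counts[index[km], j] += 1
    return index, counts


def cluster_kmers(counts, n_components, min_cosine=0.9):
    """Project k-mers onto the top singular vectors (a stand-in for
    eigengenomes) and group them greedily by cosine similarity."""
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    coords = u[:, :n_components] * s[:n_components]
    norms = np.linalg.norm(coords, axis=1, keepdims=True)
    coords = coords / np.maximum(norms, 1e-12)  # guard against zero rows
    seeds, labels = [], []
    for row in coords:
        sims = [float(row @ seed) for seed in seeds]
        if sims and max(sims) >= min_cosine:
            labels.append(int(np.argmax(sims)))
        else:
            seeds.append(row)  # this k-mer starts a new cluster
            labels.append(len(seeds) - 1)
    return labels


def partition_reads(reads, index, labels, k):
    """Assign each read to the cluster holding most of its k-mers.
    Assumes every read shares at least one k-mer with the index."""
    assigned = []
    for read in reads:
        votes = Counter(labels[index[km]] for km in kmers(read, k) if km in index)
        assigned.append(votes.most_common(1)[0][0])
    return assigned


# Two made-up "genomes" whose abundances vary differently across three samples.
genome_a = "ACGTTGCA"
genome_b = "GGATCCTA"
samples = [[genome_a] * 3, [genome_a, genome_b], [genome_b] * 3]

index, counts = abundance_matrix(samples, k=4)
labels = cluster_kmers(counts, n_components=2)
partitions = partition_reads([genome_a, genome_b], index, labels, k=4)
```

Because the k-mers of each toy genome share one abundance profile across samples, they end up in the same cluster, and reads can then be routed to a partition by a majority vote of their k-mers. A real dataset would need the fixed-memory, streaming approach the abstract describes rather than a dense matrix.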
