A de Bruijn graph approach to the quantification of closely-related genomes in a microbial community.

Authors

Wang Mingjie, Ye Yuzhen, Tang Haixu

Affiliation

School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA.

Publication

J Comput Biol. 2012 Jun;19(6):814-25. doi: 10.1089/cmb.2012.0058.

PMID:22697249

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3375647/

Abstract

The wide application of next-generation sequencing (NGS) technologies in metagenomics has raised many computational challenges. One of the essential problems in metagenomics is to estimate the taxonomic composition of a microbial community. This can be approached by mapping shotgun reads acquired from the community to previously characterized microbial genomes, followed by quantitative profiling of these species based on the number of mapped reads. This procedure, however, is not as trivial as it appears at first glance. A shotgun metagenomic dataset often contains DNA sequences from many closely related microbial species (e.g., within the same genus) or strains (e.g., within the same species); thus, when a read maps with high similarity to a common region shared by multiple genomes, it is often difficult to determine which species or strain it was sampled from. Furthermore, individual genomes within the same species show high genomic variation, which is difficult to differentiate from inter-species variation during read mapping. To address these issues, a commonly used approach is to quantify the taxonomic distribution only at the genus level, based on the reads mapped to all species belonging to the same genus; alternatively, reads are mapped to a set of representative genomes, each selected to represent a different genus. Here, we introduce a novel approach to estimating the abundance of closely related species within the same genus by mapping reads to a de Bruijn graph representation of their genomes, in which the genomic regions they share are collapsed.
Using simulated and real metagenomic datasets, we show that the de Bruijn graph approach has several advantages over existing methods, including that (1) it avoids redundant mapping of shotgun reads to multiple copies of the common regions in different genomes, and (2) it leads to more accurate quantification of closely related species (and even of strains within the same species).
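The core idea in the abstract, collapsing regions shared by closely related genomes so that a read from such a region is mapped once rather than to every copy, can be illustrated with a small Python sketch. This is not the authors' implementation. The function names (`build_colored_dbg`, `successors`, `classify_read`, `em_abundance`) are hypothetical; the graph is assumed to be stored node-centrically as a map from each k-mer to the set of genomes containing it (a "colored" de Bruijn graph, with edges implied by (k-1)-base overlaps); matching is assumed exact and forward-strand only; and the final quantification uses a generic expectation-maximization (EM) read reassignment, a standard technique swapped in here for illustration and not necessarily the estimator used in the paper.

```python
from collections import defaultdict
from typing import Iterator


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every k-mer of seq (forward strand only, for simplicity)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_colored_dbg(genomes: dict[str, str], k: int) -> dict[str, set[str]]:
    """Node-centric de Bruijn graph: each distinct k-mer is stored once and
    labeled with the set of genomes that contain it, so regions shared by
    several genomes collapse onto the same nodes."""
    graph = defaultdict(set)
    for name, seq in genomes.items():
        for km in kmers(seq, k):
            graph[km].add(name)
    return dict(graph)


def successors(graph: dict[str, set[str]], km: str) -> list[str]:
    """Edges are implied by (k-1)-base overlaps between k-mers in the graph;
    a node with several successors marks a point where genomes diverge."""
    return [km[1:] + b for b in "ACGT" if km[1:] + b in graph]


def classify_read(read: str, graph: dict[str, set[str]], k: int) -> frozenset[str]:
    """Map a read onto the graph once and return the genomes compatible with
    every graph k-mer it hits. k-mers missing from the graph (errors, novel
    variants) are skipped; an empty result means the read is unassigned."""
    compatible = None
    for km in kmers(read, k):
        colors = graph.get(km)
        if colors is None:
            continue
        compatible = set(colors) if compatible is None else compatible & colors
    return frozenset(compatible or ())


def em_abundance(classes: list[frozenset[str]], names: list[str],
                 iters: int = 100) -> dict[str, float]:
    """Relative read abundance per genome by expectation-maximization.
    E-step: split each read among its compatible genomes in proportion to the
    current estimates. M-step: re-estimate abundances from those fractional
    counts. Assumes genomes of similar length (no length normalization)."""
    assigned = [c for c in classes if c]  # ignore unassigned reads
    theta = {n: 1.0 / len(names) for n in names}
    for _ in range(iters):
        counts = dict.fromkeys(names, 0.0)
        for c in assigned:
            z = sum(theta[n] for n in c)
            if z == 0.0:
                continue
            for n in c:
                counts[n] += theta[n] / z
        total = sum(counts.values())
        if total == 0.0:
            break
        theta = {n: counts[n] / total for n in names}
    return theta
```

For example, with k = 4 and two toy genomes A = S + "GGGGGGGG" and B = S + "TTTTTTTT" sharing a prefix S, the k-mers of S are stored once and labeled {A, B}, so a read drawn from S is classified as compatible with both genomes instead of being mapped twice. If 6 reads are A-only, 2 are B-only and 4 are shared, EM converges to abundances of 0.75 for A and 0.25 for B. Keeping the compatible-genome set per read, rather than forcing an early assignment, is what lets the shared reads be resolved by the genome-specific evidence.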
