A de Bruijn graph approach to the quantification of closely-related genomes in a microbial community.

Authors

Wang Mingjie, Ye Yuzhen, Tang Haixu

Affiliation

School of Informatics and Computing, Indiana University, Bloomington, IN 47405, USA.

Publication

J Comput Biol. 2012 Jun;19(6):814-25. doi: 10.1089/cmb.2012.0058.

Abstract

The wide application of next-generation sequencing (NGS) technologies in metagenomics has raised many computational challenges. One essential problem in metagenomics is estimating the taxonomic composition of a microbial community. This can be approached by mapping shotgun reads acquired from the community to previously characterized microbial genomes, and then profiling the quantity of each species based on the number of reads mapped to it. This procedure, however, is not as trivial as it appears at first glance. A shotgun metagenomic dataset often contains DNA sequences from many closely-related microbial species (e.g., within the same genus) or strains (e.g., within the same species). When a read maps with high similarity to a region shared by multiple genomes, it is therefore often difficult to determine which species or strain it was sampled from. Furthermore, individual genomes within the same species show high genomic variation, and during read mapping this intra-species variation is difficult to distinguish from inter-species variation. To address these issues, a commonly used approach is to quantify the taxonomic distribution only at the genus level, based on the reads mapped to all species belonging to the same genus; alternatively, reads are mapped to a set of representative genomes, each selected to represent a different genus. Here, we introduce a novel approach to estimating the quantity of closely-related species within the same genus: reads are mapped to the species' genomes represented as a de Bruijn graph, in which the genomic regions they share are collapsed. Using simulated and real metagenomic datasets, we show that the de Bruijn graph approach has several advantages over existing methods: (1) it avoids redundantly mapping shotgun reads to multiple copies of the common regions in different genomes, and (2) it yields more accurate quantification of closely-related species (and even of strains within the same species).
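The abstract combines two ideas: storing the shared regions of several genomes only once in a de Bruijn graph, and quantifying species from reads that may fall in those shared regions. The following Python sketch is a toy illustration under stated assumptions. It is not the authors' implementation. Genomes are indexed as k-mers labeled with the genomes that contain them. These are the nodes of a colored de Bruijn graph, so a k-mer common to several genomes appears once. Each read is assigned to the genomes that contain all of its k-mers, using exact matching on the forward strand only. Reads in shared regions are then apportioned with a simple expectation-maximization (EM) loop. The EM step, the function names (`build_colored_kmer_index`, `compatible_genomes`, `estimate_abundance`) and the toy sequences are all hypothetical, and the paper's actual mapping and quantification procedures may differ.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping k-mer of seq (forward strand only in this toy)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_colored_kmer_index(genomes, k):
    """Map each distinct k-mer to the set of genome names that contain it.

    These labeled k-mers are the nodes of a colored de Bruijn graph; edges
    (k-1 overlaps) are implicit and unused here. A k-mer shared by several
    genomes is stored once, which is how common regions get collapsed.
    """
    index = defaultdict(set)
    for name, seq in genomes.items():
        for km in kmers(seq, k):
            index[km].add(name)
    return index


def compatible_genomes(read, index, k):
    """Return the genomes containing every k-mer of the read.

    Exact matching only: a single unseen k-mer (e.g. from a sequencing
    error) makes the read unassignable and yields an empty set.
    """
    result = None
    for km in kmers(read, k):
        colors = index.get(km)  # .get avoids inserting into the defaultdict
        if colors is None:
            return set()
        result = set(colors) if result is None else result & colors
    return result or set()


def estimate_abundance(reads, genomes, k, iterations=100):
    """Estimate relative genome abundances with a simple EM over read classes.

    Hypothetical scheme: ignores genome length, reverse complements and
    mismatches. Reads compatible with no genome are discarded.
    """
    index = build_colored_kmer_index(genomes, k)
    names = sorted(genomes)
    classes = [c for c in (compatible_genomes(r, index, k) for r in reads) if c]
    if not classes:
        return dict.fromkeys(names, 0.0)
    theta = dict.fromkeys(names, 1.0 / len(names))  # uniform starting guess
    for _ in range(iterations):
        # E-step: split each read among its compatible genomes by current theta.
        counts = dict.fromkeys(names, 0.0)
        for c in classes:
            total = sum(theta[g] for g in c)
            for g in c:
                counts[g] += theta[g] / total
        # M-step: new abundance = expected share of reads for each genome.
        theta = {g: counts[g] / len(classes) for g in names}
    return theta


if __name__ == "__main__":
    toy_genomes = {"A": "AAAACCCCACGTACGT", "B": "GGGGTTTTACGTACGT"}
    toy_reads = ["AAAACC", "ACCCCA", "CCACGT", "GGGGTT",
                 "ACGTAC", "CGTACG", "GTACGT", "ACGT", "GATCGATC"]
    est = estimate_abundance(toy_reads, toy_genomes, k=4)
    print({g: round(p, 3) for g, p in est.items()})  # {'A': 0.75, 'B': 0.25}
```

In the toy data, the shared suffix ACGTACGT is indexed once even though it occurs in both genomes. Four reads lie entirely inside it and carry no genome-specific signal on their own. The unique reads give three for A and one for B. The EM splits the shared reads according to that evidence, so the estimates converge to 0.75 and 0.25. The read GATCGATC matches neither genome and is discarded. A realistic tool would also need to normalize by genome length and to handle reverse complements and sequencing errors.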
