Nonpareil 3: Fast Estimation of Metagenomic Coverage and Sequence Diversity.

Author information

Rodriguez-R Luis M, Gunturu Santosh, Tiedje James M, Cole James R, Konstantinidis Konstantinos T

Affiliations

School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA.

Center for Microbial Ecology, Michigan State University, East Lansing, Michigan, USA.

Publication information

mSystems. 2018 Apr 10;3(3). doi: 10.1128/mSystems.00039-18. eCollection 2018 May-Jun.

Abstract

Estimations of microbial community diversity based on metagenomic data sets are affected, often to an unknown degree, by biases derived from insufficient coverage and from reference database-dependent estimations of diversity. For instance, the completeness of reference databases cannot generally be estimated, since it depends on the extant diversity sampled to date, which, with the exception of a few habitats such as the human gut, remains severely undersampled. Further, estimating the degree to which a metagenomic data set covers a microbial community is prohibitively time-consuming for large data sets, and coverage values may not be directly comparable between data sets obtained with different sequencing technologies. Here, we extend Nonpareil, a database-independent tool for the estimation of coverage in metagenomic data sets, to a high-performance computing implementation that scales up to hundreds of cores and that also includes a k-mer-based estimation as sensitive as the original alignment-based version but about three hundred times as fast. Further, we propose a metric of sequence diversity (N_d), derived directly from Nonpareil curves, that correlates well with alpha diversity assessed by traditional metrics. We use this metric in different experiments demonstrating its correlation with the Shannon index estimated on 16S rRNA gene profiles, and we show that N_d additionally reveals seasonal patterns in marine samples that are not captured by the Shannon index, as well as more precise rankings of the magnitude of diversity of microbial communities in different habitats. Therefore, the new version of Nonpareil, called Nonpareil 3, advances the toolbox for metagenomic analyses of microbiomes.

Estimation of the coverage provided by a metagenomic data set, i.e., what fraction of the microbial community was sampled by DNA sequencing, represents an essential first step of every culture-independent genomic study that aims to robustly assess the sequence diversity present in a sample. However, estimation of coverage remains elusive because of several technical limitations associated with high computational requirements and limiting statistical approaches to quantify diversity. Here we describe Nonpareil 3, a new bioinformatics algorithm that circumvents several of these limitations and thus can facilitate culture-independent studies in clinical or environmental settings, independent of the sequencing platform employed. In addition, we present a new metric of sequence diversity based on rarefied coverage and demonstrate its use in communities from diverse ecosystems.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1c3d/5893860/bd06d1ec0d8e/sys0031822250001.jpg
