Institute of Ecology & Evolution, University of Oregon, Eugene, Oregon, USA.
PLoS Comput Biol. 2012;8(10):e1002743. doi: 10.1371/journal.pcbi.1002743. Epub 2012 Oct 25.
The abundance of different SSU rRNA ("16S") gene sequences in environmental samples is widely used in studies of microbial ecology as a measure of microbial community structure and diversity. However, the genomic copy number of the 16S gene varies greatly, from one in many species to as many as 15 in some bacteria and hundreds in some microbial eukaryotes. As a result of this variation, the relative abundance of 16S genes in environmental samples reflects both variation in the relative abundance of different organisms and variation in genomic 16S copy number among those organisms. Despite this, many studies assume that the abundance of 16S gene sequences is a surrogate measure of the relative abundance of the organisms containing those sequences. Here we present a method that uses data on sequences and genomic copy number of 16S genes, along with phylogenetic placement and ancestral state estimation, to estimate organismal abundances from environmental DNA sequence data. We use theory and simulations to demonstrate that 16S genomic copy number can be accurately estimated from the short reads typically obtained from high-throughput environmental sequencing of the 16S gene, and that organismal abundances in microbial communities are more strongly correlated with estimated abundances obtained from our method than with gene abundances. We re-analyze several published empirical data sets and demonstrate that using gene abundance rather than estimated organismal abundance can lead to different inferences about community diversity and structure, and about the identity of the dominant taxa in microbial communities. Our approach will allow microbial ecologists to make more accurate inferences about microbial diversity and abundance based on 16S sequence data.
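The difference between gene abundance and organismal abundance can be illustrated with a minimal Python sketch. It covers only the final correction step, under the assumption that a 16S copy number has already been estimated for each taxon; the phylogenetic placement and ancestral state estimation described in the abstract are not shown. The function name, taxon labels, and counts are hypothetical and are not taken from the paper.

```python
def estimate_organismal_abundance(gene_counts, copy_numbers):
    """Convert 16S gene counts into relative organismal abundances.

    gene_counts  : dict mapping taxon -> number of 16S reads observed
    copy_numbers : dict mapping taxon -> estimated genomic 16S copy number
                   (assumed to be supplied by an upstream estimation step)
    """
    corrected = {}
    for taxon, count in gene_counts.items():
        copies = copy_numbers[taxon]
        if copies <= 0:
            raise ValueError(f"copy number for {taxon} must be positive")
        # Each organism contributes `copies` gene copies, so dividing the
        # gene count by the copy number approximates the organism count.
        corrected[taxon] = count / copies

    total = sum(corrected.values())
    if total == 0:
        return {taxon: 0.0 for taxon in corrected}
    # Renormalize so the corrected abundances sum to 1.
    return {taxon: value / total for taxon, value in corrected.items()}


# Hypothetical two-taxon community: taxon_A carries 15 copies of the
# 16S gene per genome, taxon_B carries a single copy.
gene_counts = {"taxon_A": 150, "taxon_B": 50}
copy_numbers = {"taxon_A": 15, "taxon_B": 1}

total_reads = sum(gene_counts.values())
gene_abundance = {t: c / total_reads for t, c in gene_counts.items()}
organismal_abundance = estimate_organismal_abundance(gene_counts, copy_numbers)

print(gene_abundance)
print(organismal_abundance)
```

In this hypothetical community, taxon_A accounts for 75% of the 16S reads but only about 17% of the organisms once copy number is taken into account, so the apparent dominant taxon changes.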