Centre for Biodiversity Theory and Modelling, Experimental Ecology Station, Centre National de Recherche Scientifique, Moulis, France.
ISME J. 2013 Jun;7(6):1092-101. doi: 10.1038/ismej.2013.10. Epub 2013 Feb 14.
Quantifying diversity is of central importance for the study of the structure, function and evolution of microbial communities. The estimation of microbial diversity has received renewed attention with the advent of large-scale metagenomic studies. Here, we consider what the diversity observed in a sample tells us about the diversity of the community being sampled. First, we argue that one cannot reliably estimate the absolute and relative number of microbial species present in a community without making unsupported assumptions about species abundance distributions. The reason for this is that sample data do not contain information about the number of rare species in the tail of species abundance distributions. We illustrate the difficulty in comparing species richness estimates by applying Chao's estimator of species richness to a set of in silico communities: the communities are ranked incorrectly in the presence of large numbers of rare species. Next, we extend our analysis to a general family of diversity metrics ('Hill diversities'), and construct lower and upper estimates of diversity values consistent with the sample data. The theory generalizes Chao's estimator, which we retrieve as the lower estimate of species richness. We show that Shannon and Simpson diversity can be robustly estimated for the in silico communities. We analyze nine metagenomic data sets from a wide range of environments, and show that our findings are relevant for empirically sampled communities. Hence, we recommend the use of Shannon and Simpson diversity rather than species richness in efforts to quantify and compare microbial diversity.
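The quantities named in the abstract can be illustrated with a minimal sketch. It assumes the standard textbook definitions: the Hill diversity of order a, D_a = (sum_i p_i^a)^(1/(1-a)), which gives species richness at a = 0, the exponential of Shannon entropy in the limit a -> 1, and the inverse Simpson index at a = 2; and the classic Chao1 richness estimate S_obs + f1^2 / (2 f2), where f1 and f2 are the numbers of species seen exactly once and twice. The function names `hill_diversity` and `chao1` are hypothetical, and the Hill values below are naive plug-in estimates from sample counts, not the lower and upper estimators constructed in the paper.

```python
import math
from collections import Counter


def hill_diversity(counts, order):
    """Plug-in Hill diversity of a given order from per-species sample counts.

    Assumed standard definition: order 0 is observed richness, order 1 is
    exp(Shannon entropy), order 2 is the inverse Simpson index.
    """
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    props = [c / total for c in counts]
    if order == 1:
        # The general formula is undefined at order 1; use its limit.
        return math.exp(-sum(p * math.log(p) for p in props))
    return sum(p ** order for p in props) ** (1.0 / (1.0 - order))


def chao1(counts):
    """Classic Chao1 richness estimate from singletons (f1) and doubletons (f2)."""
    counts = [c for c in counts if c > 0]
    s_obs = len(counts)
    freq = Counter(counts)
    f1, f2 = freq[1], freq[2]
    if f2 > 0:
        return s_obs + f1 * f1 / (2.0 * f2)
    # Commonly used bias-corrected form when no doubletons are observed.
    return s_obs + f1 * (f1 - 1) / 2.0


if __name__ == "__main__":
    # Hypothetical sample: reads per species (illustrative numbers only).
    sample = [10, 5, 2, 2, 1, 1, 1]
    for a in (0, 1, 2):
        print(f"Hill diversity, order {a}: {hill_diversity(sample, a):.3f}")
    print(f"Chao1 richness estimate: {chao1(sample):.2f}")
```

Rare species enter the order-0 value with the same weight as abundant ones, whereas at orders 1 and 2 their contribution shrinks with their proportion. That is consistent with the abstract's point that richness depends on the unobserved tail of the abundance distribution while Shannon and Simpson diversity are far less sensitive to it.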