State Key Joint Laboratory of Environment Simulation and Pollution Control, School of Environment, Tsinghua University, Beijing, China.
mBio. 2013 Jun 11;4(3):e00324-13. doi: 10.1128/mBio.00324-13.
The site-to-site variability in species composition, known as β-diversity, is crucial to understanding spatiotemporal patterns of species diversity and the mechanisms controlling community composition and structure. However, quantifying β-diversity in microbial ecology using sequencing-based technologies is a great challenge because of a high number of sequencing errors, bias, poor reproducibility, and poor quantification. Herein, based on general sampling theory, a mathematical framework is first developed for simulating the effects of random sampling processes on quantifying β-diversity when the community size is known or unknown. Using an analogous ball example under Poisson sampling with limited sampling effort, the framework can exactly predict the low reproducibility among technical replicate samples drawn from the same community with a given species abundance distribution, which provides explicit evidence that random sampling processes are the main factor causing high percentages of technical variation. In addition, the values predicted under Poisson random sampling were highly consistent with the observed low percentages of operational taxonomic unit (OTU) overlap (<30% and <20% for two and three tags, respectively, based on both Jaccard and Bray-Curtis dissimilarity indexes), further supporting the hypothesis that the poor reproducibility among technical replicates is due to artifacts associated with random sampling processes. Finally, a mathematical framework was developed for predicting the sampling effort needed to achieve a desired overlap among replicate samples. Our modeling simulations predict that several orders of magnitude more sequencing effort is needed to achieve the desired high technical reproducibility. These results suggest that great caution needs to be taken in quantifying and interpreting β-diversity for microbial community analysis using next-generation sequencing technologies.
IMPORTANCE Due to the vast diversity and uncultivated status of the majority of microorganisms, microbial detection, characterization, and quantitation are a great challenge. Although large-scale metagenome sequencing technology such as PCR-based amplicon sequencing has revolutionized the study of microbial communities, it suffers from several inherent drawbacks, such as a high number of sequencing errors, biases, poor quantitation, and very high percentages of technical variation, which could lead to great overestimates of microbial biodiversity. Based on general sampling theory, this study provides the first explicit evidence demonstrating the importance of random sampling processes in estimating microbial β-diversity, which has not been adequately recognized and addressed in microbial ecology. Since most ecological studies involve random sampling, the conclusions of this study should also be applicable to other ecological studies in general. In summary, the results presented in this study should have important implications for examining microbial biodiversity to address both basic theoretical and applied management questions.
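The Poisson sampling argument summarized in the abstract can be illustrated with a short sketch. This is not the authors' framework, only a minimal example under stated assumptions: each taxon's read count in a replicate is taken to be Poisson with mean `depth * p`, so the taxon is detected with probability `1 - exp(-depth * p)`; overlap is approximated as the ratio of the expected number of shared taxa to the expected number of taxa in the union; and `toy_community` is a hypothetical geometric abundance distribution chosen purely for illustration. All function names are invented for this example.

```python
import math
import random


def toy_community(n_taxa, ratio=0.999):
    """Hypothetical geometric species abundance distribution (an assumption)."""
    weights = [ratio ** i for i in range(n_taxa)]
    total = sum(weights)
    return [w / total for w in weights]


def detection_probs(rel_abundances, depth):
    """P(taxon seen at least once) if its read count is Poisson(depth * p)."""
    return [1.0 - math.exp(-depth * p) for p in rel_abundances]


def expected_jaccard_overlap(rel_abundances, depth):
    """Expected shared taxa divided by expected union for two replicates."""
    q = detection_probs(rel_abundances, depth)
    shared = sum(qi * qi for qi in q)
    union = sum(1.0 - (1.0 - qi) ** 2 for qi in q)
    return shared / union if union > 0 else 0.0


def simulated_jaccard_overlap(rel_abundances, depth, rng):
    """One simulated pair of replicates, drawn as presence/absence per taxon."""
    q = detection_probs(rel_abundances, depth)
    rep_a = [rng.random() < qi for qi in q]
    rep_b = [rng.random() < qi for qi in q]
    shared = sum(1 for a, b in zip(rep_a, rep_b) if a and b)
    union = sum(1 for a, b in zip(rep_a, rep_b) if a or b)
    return shared / union if union > 0 else 0.0


def depth_for_overlap(rel_abundances, target, max_doublings=200):
    """Double the depth until the expected overlap reaches the target."""
    depth = 1.0
    for _ in range(max_doublings):
        if expected_jaccard_overlap(rel_abundances, depth) >= target:
            return depth
        depth *= 2.0
    raise ValueError("target overlap not reached within the search range")


if __name__ == "__main__":
    community = toy_community(5000)
    rng = random.Random(0)
    for depth in (1_000, 10_000, 100_000):
        print(depth,
              round(expected_jaccard_overlap(community, depth), 3),
              round(simulated_jaccard_overlap(community, depth, rng), 3))
    print("depth for 90% overlap:", depth_for_overlap(community, 0.9))
```

Under these assumptions, shallow sampling of a community with many rare taxa gives a low overlap between replicates even though both are drawn from the same community, and the depth needed for a high overlap grows quickly. The doubling search only brackets the required depth to within a factor of two; it is meant to show the shape of the calculation, not to reproduce the paper's predictions.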