Jeon S-O, Bunge J, Stoeck T, Barger K J-A, Hong S-H, Epstein S S
Department of Biology, Northeastern University, Boston, MA 02115, USA.
Appl Environ Microbiol. 2006 Oct;72(10):6578-83. doi: 10.1128/AEM.00787-06.
Molecular surveys suggest that communities of microbial eukaryotes are remarkably rich, because even large clone libraries seem to capture only a minority of species. This provides a qualitative picture of protistan richness but does not measure its real extent either locally or globally. Statistical analysis can estimate a community's richness, but the specific methods used to date are not always well grounded in statistical theory. Here we study a large protistan molecular survey from an anoxic water column in the Cariaco Basin (Caribbean Sea). We group individual 18S rRNA gene sequences into operational taxonomic units (OTUs) using different cutoff values for sequence similarity (99 to 50%) and systematically apply parametric models and nonparametric estimators to the OTU frequency data to estimate the total protistan diversity. The parametric models provided statistically sound estimates of protistan richness, with biologically meaningful standard errors, maximal data usage, and extensive model diagnostics and were preferable to the available nonparametric tools. Our clone library exceeded 700 clones but still covered only a minority of species and less than half of the larger protistan clades. Our estimates of total protistan richness portray the target community as very rich at all OTU levels, with hundreds of different populations apparently co-occurring in the small (3-liter) volume of our sample, as well as dozens of clades of the highest taxonomic order. These estimates are among the first for microbial eukaryotes that are obtained using state-of-the-art statistical methods and can serve as benchmark numbers for the local diversity of protists.
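As a rough illustration of what applying a nonparametric estimator to OTU frequency data involves, the sketch below computes the bias-corrected Chao1 richness estimate and Good's coverage from a list of per-OTU clone counts. The abstract does not name the specific estimators or parametric models used, so Chao1 is an assumption chosen because it is a widely used nonparametric estimator. The function names and the counts are hypothetical and are not data from the study.

```python
def chao1(otu_counts):
    """Bias-corrected Chao1 estimate of total richness.

    otu_counts: clones observed per OTU (hypothetical input format).
    Uses S_obs + f1*(f1-1) / (2*(f2+1)), where f1 and f2 are the
    numbers of OTUs seen exactly once and exactly twice.
    """
    counts = [c for c in otu_counts if c > 0]
    s_obs = len(counts)                      # OTUs actually observed
    f1 = sum(1 for c in counts if c == 1)    # singletons
    f2 = sum(1 for c in counts if c == 2)    # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def goods_coverage(otu_counts):
    """Good's coverage: estimated fraction of the community's clones
    that belong to OTUs already represented in the library."""
    n = sum(otu_counts)                      # total clones
    f1 = sum(1 for c in otu_counts if c == 1)
    return 1 - f1 / n


# Hypothetical OTU frequency data (illustrative only, not from the paper).
example_counts = [12, 7, 5, 3, 2, 2, 1, 1, 1, 1, 1, 1]

print(chao1(example_counts))           # 12 observed OTUs, estimate 17.0
print(goods_coverage(example_counts))
```

Many singletons relative to doubletons push the estimate well above the observed OTU count, which is the pattern the abstract describes for a library that captures only a minority of species. A point estimate like this carries no standard error or goodness-of-fit diagnostics, which is the kind of limitation the abstract cites in preferring parametric models.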