Schloss Patrick D, Handelsman Jo
Department of Plant Pathology, University of Wisconsin-Madison, 1630 Linden Dr., Madison, WI 53706, USA.
Appl Environ Microbiol. 2005 Mar;71(3):1501-6. doi: 10.1128/AEM.71.3.1501-1506.2005.
Although copious qualitative information describes the members of the diverse microbial communities on Earth, statistical approaches for quantifying and comparing the numbers and compositions of lineages in communities are lacking. We present a method that addresses the challenge of assigning sequences to operational taxonomic units (OTUs) based on the genetic distances between sequences. We developed a computer program, DOTUR, which assigns sequences to OTUs by using the furthest-, average-, or nearest-neighbor algorithm at each distance level. DOTUR uses the frequency at which each OTU is observed to construct rarefaction and collector's curves for various measures of richness and diversity. We analyzed 16S rRNA gene libraries derived from Scottish and Amazonian soils and the Sargasso Sea with DOTUR, which assigned sequences to OTUs rapidly and reliably based on the genetic distances between sequences and identified previous inconsistencies and errors in assigning sequences to OTUs. An analysis of the two 16S rRNA gene libraries from soil demonstrated that they do not contain enough sequences to support a claim that they contain different numbers of bacterial lineages with statistical confidence (P > 0.05), nor do they contain enough sequences to provide a robust estimate of species richness when an OTU is defined as containing sequences that are no more than 3% different from each other. In contrast, the richness of OTUs at the 3% level in the Sargasso Sea collection began to plateau after the sampling of 690 sequences. We anticipate that an equivalent extent of sampling for soil would require sampling more than 10,000 sequences, almost 100 times the size of typical sequence collections obtained from soil.
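To make the OTU-assignment and rarefaction steps concrete, here is a minimal Python sketch under stated assumptions. It is not DOTUR's implementation. It assumes a precomputed, symmetric matrix of pairwise genetic distances. The function names `cluster_otus`, `rarefaction`, and `chao1` are hypothetical. `cluster_otus` is a naive agglomerative loop that merges the closest pair of clusters under nearest-, average-, or furthest-neighbor linkage until no pair falls within the cutoff (e.g. 0.03 for the 3% level). `rarefaction` uses the standard hypergeometric expectation for the number of OTUs in a random subsample of n sequences. `chao1` uses the bias-corrected Chao1 form, which is one common richness estimator and is assumed here only for illustration.

```python
from itertools import combinations
from math import comb


def cluster_otus(dist, cutoff, method="furthest"):
    """Group sequence indices into OTUs by agglomerative clustering (sketch).

    dist: symmetric list-of-lists of pairwise genetic distances.
    cutoff: largest linkage distance allowed for a merge (0.03 = 3% level).
    method: "nearest", "average", or "furthest" neighbor linkage.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        pair_dists = [dist[i][j] for i in a for j in b]
        if method == "nearest":
            return min(pair_dists)  # closest pair of members
        if method == "furthest":
            return max(pair_dists)  # most distant pair of members
        if method == "average":
            return sum(pair_dists) / len(pair_dists)  # UPGMA-style mean
        raise ValueError(f"unknown method: {method}")

    while len(clusters) > 1:
        # Find the closest pair of clusters under the chosen linkage.
        (a, b), d = min(
            ((pair, linkage(clusters[pair[0]], clusters[pair[1]]))
             for pair in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > cutoff:
            break  # no remaining pair is close enough to merge
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return clusters


def rarefaction(counts, n):
    """Expected number of OTUs in a random subsample of n sequences
    drawn without replacement, given per-OTU sequence counts."""
    total = sum(counts)
    return sum(1 - comb(total - c, n) / comb(total, n) for c in counts)


def chao1(counts):
    """Bias-corrected Chao1 estimate: S_obs + F1(F1 - 1) / (2(F2 + 1))."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)  # singleton OTUs
    f2 = sum(1 for c in counts if c == 2)  # doubleton OTUs
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


if __name__ == "__main__":
    # Toy distances for four sequences (illustrative values, not real data).
    dist = [
        [0.00, 0.01, 0.02, 0.10],
        [0.01, 0.00, 0.035, 0.12],
        [0.02, 0.035, 0.00, 0.11],
        [0.10, 0.12, 0.11, 0.00],
    ]
    for method in ("nearest", "average", "furthest"):
        otus = cluster_otus(dist, 0.03, method)
        counts = [len(otu) for otu in otus]
        curve = [rarefaction(counts, n) for n in range(1, sum(counts) + 1)]
        print(method, otus, curve, chao1(counts))
```

On this toy matrix, nearest- and average-neighbor linkage both give `[[0, 1, 2], [3]]` at the 3% cutoff, while furthest-neighbor linkage gives `[[0, 1], [2], [3]]` because sequences 1 and 2 are 3.5% apart. This illustrates why the furthest-neighbor definition, in which every pair within an OTU must lie within the cutoff, is the strictest of the three.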