Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, USA.
ISME J. 2012 Jan;6(1):183-94. doi: 10.1038/ismej.2011.74. Epub 2011 Jun 16.
Microbes commonly exist in milieus of varying complexity and diversity. Although cultivation-based techniques have been unable to accurately capture the true diversity within microbial communities, these deficiencies have been overcome by applying molecular approaches that target the universally conserved 16S ribosomal RNA gene. The recent application of 454 pyrosequencing to simultaneously sequence thousands of 16S rDNA sequences (pyrotags) has revolutionized the characterization of complex microbial communities. To date, studies based on 454 pyrotags have dominated the field, but sequencing platforms that generate many more sequence reads at much lower costs have been developed. Here, we use the Illumina sequencing platform to design a strategy for 16S amplicon analysis (iTags), and assess its generality, practicality and potential complications. We fabricated and sequenced paired-end libraries of amplified hyper-variable 16S rDNA fragments from sets of samples that varied in their contents, ranging from a single bacterium to highly complex communities. We adopted an approach that allowed us to evaluate several potential sources of errors, including sequencing artifacts, amplification biases, non-corresponding paired-end reads and mistakes in taxonomic classification. By considering each source of error, we delineate ways to make biologically relevant and robust conclusions from the millions of sequencing reads that can be readily generated by this technology.