Franzén Oscar, Hu Jianzhong, Bao Xiuliang, Itzkowitz Steven H, Peter Inga, Bashir Ali
Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
Division of Gastroenterology, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
Microbiome. 2015 Oct 5;3:43. doi: 10.1186/s40168-015-0105-6.
High-throughput bacterial 16S rRNA gene sequencing followed by clustering of short sequences into operational taxonomic units (OTUs) is widely used for microbiome profiling. However, clustering of short 16S rRNA gene reads into biologically meaningful OTUs is challenging, in part because nucleotide variation along the 16S rRNA gene is only partially captured by short reads. The recent emergence of long-read platforms, such as single-molecule real-time (SMRT) sequencing from Pacific Biosciences, offers the potential for improved taxonomic and phylogenetic profiling. Here, we evaluate the performance of long- and short-read 16S rRNA gene sequencing using simulated and experimental data, followed by OTU inference using computational pipelines based on heuristic and complete-linkage hierarchical clustering.
In simulated data, long-read sequencing improved OTU quality and decreased variance. We then profiled 40 human gut microbiome samples using a combination of Illumina MiSeq and Blautia-specific SMRT sequencing; the results further support the notion that long reads can identify additional OTUs. We implemented a complete-linkage hierarchical clustering strategy in a flexible computational pipeline tailored specifically for PacBio circular consensus sequencing (CCS) data. The pipeline outperforms heuristic methods in most settings and is available at https://github.com/oscar-franzen/oclust/.
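To illustrate what complete-linkage clustering means for OTU inference, the following is a minimal sketch in Python. It is not the oclust implementation. The function names (`p_distance`, `complete_linkage_otus`), the mismatch-fraction distance, the assumption that sequences are already aligned to equal length, and the threshold values are all illustrative assumptions. The key property shown is that two clusters merge only when every pair of their members lies within the threshold.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of mismatched positions between two pre-aligned sequences.

    Assumption: both sequences have the same length (already aligned).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def complete_linkage_otus(seqs, threshold=0.03):
    """Naive agglomerative clustering with complete linkage.

    Repeatedly merges the two clusters whose MAXIMUM pairwise distance is
    smallest, and stops once that distance exceeds `threshold`.
    Returns a list of clusters, each a sorted list of sequence indices.
    """
    n = len(seqs)
    # Precompute all pairwise distances, keyed by (smaller index, larger index).
    dist = {
        (i, j): p_distance(seqs[i], seqs[j])
        for i, j in combinations(range(n), 2)
    }

    def linkage(c1, c2):
        # Complete linkage: distance between the two farthest members.
        return max(dist[(min(i, j), max(i, j))] for i in c1 for j in c2)

    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        # Find the closest pair of clusters under complete linkage.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[a], clusters[b]) > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # a < b, so index a is unaffected
    return [sorted(c) for c in clusters]


# Toy input (hypothetical sequences, 20 positions each).
toy_seqs = [
    "A" * 20,          # index 0
    "A" * 19 + "C",    # index 1: one mismatch from index 0
    "A" * 18 + "CC",   # index 2: one mismatch from index 1, two from index 0
    "G" * 20,          # index 3: unrelated
]
strict = complete_linkage_otus(toy_seqs, threshold=0.05)
relaxed = complete_linkage_otus(toy_seqs, threshold=0.10)
```

In this toy input, sequence 2 is within 0.05 of sequence 1 but not of sequence 0, so under the stricter threshold it stays in its own cluster. A single-linkage method would have chained all three together. Relaxing the threshold to 0.10 changes the result, which mirrors the abstract's point that the clustering threshold affects the inferred OTUs.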
Our data demonstrate that long reads can improve OTU inference; however, the choice of clustering algorithm and associated clustering thresholds has a significant impact on performance.