Jones Carli B, White James R, Ernst Sarah E, Sfanos Karen S, Peiffer Lauren B
Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD, United States.
Resphera Biosciences, Baltimore, MD, United States.
Front Genet. 2022 Mar 31;13:799615. doi: 10.3389/fgene.2022.799615. eCollection 2022.
Short-read 16S rRNA amplicon sequencing is a common technique used in microbiome research. However, inaccuracies in estimated bacterial community composition can occur due to amplification bias of the targeted hypervariable region. A potential solution is to sequence and assess multiple hypervariable regions in tandem, yet there is currently no consensus as to the appropriate method for analyzing these data. Additionally, there are many sequence analysis resources for data produced from the Illumina platform, but fewer open-source options available for data from the Ion Torrent platform. Herein, we present an analysis pipeline using open-source analysis platforms that integrates data from multiple hypervariable regions and is compatible with data produced from the Ion Torrent platform. We used the ThermoFisher Ion 16S Metagenomics Kit and a mock community of twenty bacterial strains to assess taxonomic classification of six amplicons from separate hypervariable regions (V2, V3, V4, V6-7, V8, V9) using our analysis pipeline. We report that different amplicons have different specificities for taxonomic classification, which also has implications for global level analyses such as alpha and beta diversity. Finally, we utilize a generalized linear modeling approach to statistically integrate the results from multiple hypervariable regions and apply this methodology to data from a representative clinical cohort. We conclude that examining sequencing results across multiple hypervariable regions provides more taxonomic information than sequencing across a single region. The data across multiple hypervariable regions can be combined using generalized linear models to enhance the statistical evaluation of overall differences in community structure and relatedness among sample groups.
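The abstract notes that the amplicon chosen affects global measures such as alpha and beta diversity. The sketch below is not the authors' pipeline; it only illustrates, with standard textbook formulas, how the same sample can yield different Shannon diversity values depending on which hypervariable region was classified, and how dissimilar two regions' profiles can be under the Bray-Curtis measure. The read counts and the function names `shannon` and `bray_curtis` are hypothetical, chosen purely for illustration.

```python
import math


def shannon(counts):
    """Shannon diversity index (natural log) from raw taxon counts."""
    total = sum(counts)
    # Zero counts contribute nothing to the index and are skipped.
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two equal-length count vectors."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(a) + sum(b)
    return numerator / denominator


# Hypothetical genus-level read counts for ONE sample, as classified from
# two different amplicons (invented values, not data from the paper).
region_counts = {
    "V2": [50, 50, 0, 0],    # this amplicon fails to resolve two genera
    "V4": [25, 25, 25, 25],  # this amplicon resolves all four genera
}

for region, counts in region_counts.items():
    print(region, "Shannon:", round(shannon(counts), 3))

print("Bray-Curtis V2 vs V4:",
      bray_curtis(region_counts["V2"], region_counts["V4"]))
```

With these invented counts, the V2 profile yields a lower Shannon index than the V4 profile even though both describe the same sample. This is the kind of region-dependent discrepancy that motivates integrating results across regions rather than relying on a single amplicon.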