Segura Daniel, Sharma Divya, Espin-Garcia Osvaldo
Department of Epidemiology and Biostatistics, University of Western Ontario, London, Ontario, Canada.
Department of Mathematics and Statistics, York University, Toronto, Ontario, Canada.
PLoS One. 2024 Dec 30;19(12):e0315720. doi: 10.1371/journal.pone.0315720. eCollection 2024.
The microbiome is increasingly regarded as a key component of human health, and analysis of microbiome data can aid in the development of precision medicine. Because shotgun metagenomic sequencing (SM-seq) is expensive, microbiome analyses can be done cost-effectively in two phases: Phase 1 (sequencing of 16S ribosomal RNA) and Phase 2 (SM-seq of an informative subsample). Existing research suggests strategies for selecting the subsample based on biological diversity and dissimilarity metrics calculated from operational taxonomic units (OTUs). However, the microbiome field has moved towards amplicon sequence variants (ASVs), as they provide more precise microbe identification and sample diversity information. The aim of this work is to compare subsampling strategies for two-phase metagenomic studies when using ASVs instead of OTUs, and to propose data-driven strategies for subsample selection through dimension reduction techniques. We used 199 samples of infant-gut microbiome data from the DIABIMMUNE project to generate ASVs and OTUs, then generated subsamples based on five existing biologically driven subsampling methods and two data-driven methods. Linear discriminant analysis Effect Size (LEfSe) was used to assess differential representation of taxa between the subsamples and the overall sample. Across the subsampling methods evaluated, subsample selection based on ASVs showed 50-93% agreement with selection based on OTUs, and bacterial representation was similar across all methods. Although sampling with ASVs and with OTUs typically led to similar results for each subsample, ASVs yielded more clades that differed in expression levels between allergic and non-allergic individuals across all sample sizes, and led to more biomarkers discovered at the Phase 2 (SM-seq) level.
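As a rough illustration of the comparison described above, the following Python sketch selects a Phase 2 subsample from two count tables and measures how much the two selections overlap. The abstract does not name the five biologically driven methods or define how agreement was computed, so everything here is an assumption: ranking samples by Shannon diversity stands in for one plausible diversity-based strategy, agreement is taken as the fraction of shared sample IDs, and the function names and toy counts are hypothetical.

```python
import math


def shannon_diversity(counts):
    """Shannon index H = -sum(p * ln p) over the nonzero feature counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def select_top_diverse(table, k):
    """Return the IDs of the k samples with the highest Shannon diversity.

    Assumption: a stand-in for a diversity-based subsampling rule,
    not necessarily one of the methods evaluated in the study.
    """
    ranked = sorted(table, key=lambda sid: shannon_diversity(table[sid]), reverse=True)
    return set(ranked[:k])


def selection_agreement(selected_a, selected_b):
    """Fraction of samples in selection A that also appear in selection B."""
    if not selected_a:
        return 0.0
    return len(selected_a & selected_b) / len(selected_a)


# Toy count tables (hypothetical data): sample ID -> per-feature read counts.
asv_table = {
    "s1": [10, 10, 10, 10],
    "s2": [37, 1, 1, 1],
    "s3": [20, 15, 5, 0],
    "s4": [25, 5, 5, 5],
}
otu_table = {
    "s1": [20, 20],
    "s2": [38, 2],
    "s3": [22, 18],
    "s4": [30, 10],
}

asv_pick = select_top_diverse(asv_table, k=2)
otu_pick = select_top_diverse(otu_table, k=2)
agreement = selection_agreement(asv_pick, otu_pick)
print(f"ASV pick: {sorted(asv_pick)}, OTU pick: {sorted(otu_pick)}, agreement: {agreement:.0%}")
```

Because both selections have the same size `k`, the overlap fraction is symmetric; with selections of different sizes, a measure such as the Jaccard index might be a more suitable choice.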