Forry Samuel P, Servetas Stephanie L, Dootz Jennifer N, Hunter Monique E, Kralj Jason G, Filliben James J, Jackson Scott A
Complex Microbial Systems Group, National Institute of Standards and Technology (NIST), Gaithersburg, Maryland, USA.
Multimodal Information Group, National Institute of Standards and Technology (NIST), Gaithersburg, Maryland, USA.
Microbiol Spectr. 2025 Feb 4;13(2):e0069624. doi: 10.1128/spectrum.00696-24. Epub 2025 Jan 14.
The experimental methods employed during metagenomic sequencing analyses of microbiome samples significantly impact the resulting data and typically vary substantially between laboratories. In this study, a full factorial experimental design was used to compare the effects of a select set of methodological choices (sample, operator, lot, extraction kit, variable region, and reference database) on the analysis of biologically diverse stool samples. For each parameter investigated, a main effect was calculated that allowed direct comparison both between methodological choices (bias effects) and between samples (real biological differences). Overall, methodological bias was found to be similar in magnitude to real biological differences while also varying significantly between individual taxa, even between closely related genera. The quantified method biases were then used to computationally improve the comparability of datasets collected under substantially different protocols. This investigation demonstrates a framework for quantitatively assessing methodological choices that could be routinely performed by individual laboratories to better understand their metagenomic sequencing workflows and to improve the scope of the datasets they produce.
IMPORTANCE
Method-specific bias is a well-recognized challenge in metagenomic sequencing characterization of microbiome samples, but rigorous bias quantification is difficult. This report details a full factorial exploration of 48 experimental protocols by systematically varying microbiome sample, iterations of material production, laboratory personnel, DNA extraction kit, marker gene selection, and reference databases. Quantification of the biases associated with each parameter revealed similar magnitudes of variation arising from real biological differences and from varied analysis procedures. Furthermore, these measurement biases varied substantially with taxa, even between closely related genera. However, computational correction of method bias using a reference material significantly harmonized metagenomic sequencing results collected using different analysis protocols.
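As an illustration of what a main effect means in a full factorial design, the following is a minimal sketch and not the authors' analysis code. It assumes three hypothetical two-level factors and a synthetic additive response; the factor levels (`kit_A`, `region_1`, `op_1`, and so on), the helper names, and the effect sizes are all invented for the example. The main effect of a factor is taken here as the mean response at one level minus the mean response at the other, averaged over every combination of the remaining factors.

```python
from itertools import product
from statistics import mean

# Hypothetical two-level factors (assumed for illustration only; the
# abstract does not list the actual levels used in the study).
FACTORS = {
    "extraction_kit": ("kit_A", "kit_B"),
    "variable_region": ("region_1", "region_2"),
    "operator": ("op_1", "op_2"),
}


def full_factorial(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


def synthetic_response(run):
    """Invented additive response (e.g. a log abundance for one taxon)."""
    value = 1.0
    if run["extraction_kit"] == "kit_B":
        value += 0.5   # assumed kit bias
    if run["variable_region"] == "region_2":
        value -= 0.2   # assumed variable-region bias
    return value       # operator is assumed to have no effect


def main_effect(runs, responses, factor, low, high):
    """Mean response at the `high` level minus mean response at `low`."""
    at_high = [y for run, y in zip(runs, responses) if run[factor] == high]
    at_low = [y for run, y in zip(runs, responses) if run[factor] == low]
    return mean(at_high) - mean(at_low)


runs = full_factorial(FACTORS)
responses = [synthetic_response(run) for run in runs]

for name, (low, high) in FACTORS.items():
    effect = main_effect(runs, responses, name, low, high)
    print(f"{name}: {effect:+.2f}")
```

Because the design is balanced, each difference of means isolates one factor: the other factors appear equally often at both levels and cancel out. In practice the same calculation would be repeated per taxon, which is how effects can be compared between methodological factors and the sample factor.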