Impact of study design, contamination, and data characteristics on results and interpretation of microbiome studies.

Author information

Agudelo Jose, Miller Aaron W

Affiliations

Department of Cardiovascular and Metabolic Sciences, Cleveland Clinic, Cleveland, Ohio, USA.

Department of Urology, Glickman Urological and Kidney Institute, Cleveland Clinic, Cleveland, Ohio, USA.

Publication information

mSystems. 2025 Aug 6:e0040825. doi: 10.1128/msystems.00408-25.

Advances in high-throughput molecular techniques have enabled microbiome studies in low-biomass environments, which pose unique challenges due to contamination risks. While best-practice guidelines can reduce contamination by over 90%, the impact of residual contamination and data set variability on statistical outcomes remains understudied. Here, we quantitatively assessed how study design factors influence microbiome analyses using simulated and real-world data sets. Alpha diversity was affected by sample number and community dissimilarity, but not by the number of unique taxa. Beta diversity was influenced primarily by unique taxa and group dissimilarity, with a marginal effect of sample number. The number of differentially abundant taxa depended on the number of unique taxa but was also influenced by sample number, depending on the algorithm. Notably, contamination had a marginal impact on weighted beta diversity but altered the number of differentially abundant taxa when at least 10 contaminants were present, with a greater effect as contamination increased. Findings closely mirrored results from seven real-world low-biomass data sets. Overall, group dissimilarity and the number of unique taxa were the primary drivers of statistical outcomes. The DESeq2 algorithm outperformed ANCOM-BC when exposed to stochastically distributed contamination, but algorithms were equivocal under contamination weighted toward one group. In all cases, the rate of false positives in differential abundance analyses was <15%. Importantly, in both simulated and real-world data, contamination rarely impacted whether microbiome differences were detected but did affect the number of differentially abundant taxa. Thus, when validated protocols with internal negative controls are used, residual contamination minimally impacts statistical outcomes.

IMPORTANCE

Microbiome studies in low-biomass environments face challenges due to contamination. However, even after implementing strict contamination prevention, control, and analysis measures, the impact of residual contamination on the validity of statistical outcomes in such studies remains a topic of ongoing discussion. Our analyses reveal that key drivers of microbiome study outcomes are group dissimilarity and the number of unique taxa, while contamination has minimal impact on statistical outcomes, primarily limited to the number of differentially abundant taxa detected. A common approach to contamination control involves removing taxa based on published contaminant lists. However, our analysis shows that these lists are highly inconsistent across studies, limiting reliability. Instead, our results support the use of internal negative controls as the most robust means of identifying and mitigating contamination. Collectively, data show that low-biomass microbiome studies have reduced power to detect differences between groups. However, when differences are observed, they are unlikely to be contamination-driven. By prioritizing validated protocols that prevent, assess, and eliminate contaminants through the use of internal negative controls, researchers can minimize the impact of contamination and improve the reliability of results.
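
For readers less familiar with the metrics named in the abstract, the following is a minimal sketch, not the authors' pipeline, of how an alpha diversity index and a weighted beta diversity measure can be computed from per-sample taxon counts. The choice of Shannon and Bray-Curtis, the function names, and all count values are illustrative assumptions; the abstract does not state which indices or data were used.

```python
import math


def shannon(counts):
    """Shannon alpha diversity (natural log) for one sample's taxon counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two equal-length count vectors."""
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 1 - 2 * shared / (sum(a) + sum(b))


# Hypothetical counts for four taxa plus one contaminant slot (last position).
sample_a = [40, 30, 20, 10, 0]
sample_b = [10, 20, 30, 40, 0]

# Assumed contamination: the same 5 reads of one contaminant taxon in each sample.
contaminated_a = sample_a[:-1] + [5]
contaminated_b = sample_b[:-1] + [5]

print("Shannon, sample A:", shannon(sample_a))
print("Bray-Curtis, clean:", bray_curtis(sample_a, sample_b))
print("Bray-Curtis, contaminated:", bray_curtis(contaminated_a, contaminated_b))
```

Because Bray-Curtis weights taxa by abundance, a contaminant with few reads shifts the dissimilarity only slightly in this toy case. This illustrates the arithmetic only and is not evidence for the study's conclusions.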
