Department of Data Science, The Dana-Farber Cancer Institute, Boston, MA, USA.
Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
Genome Biol. 2022 Aug 1;23(1):166. doi: 10.1186/s13059-022-02722-x.
Individual and environmental health outcomes are frequently linked to changes in the diversity of associated microbial communities. Thus, deriving health indicators based on microbiome diversity measures is essential. While microbiome data generated using high-throughput 16S rRNA marker gene surveys are appealing for this purpose, 16S surveys also generate a plethora of spurious microbial taxa.
When this artificial inflation in the observed number of taxa is ignored, we find that changes in the abundance of detected taxa confound current methods for inferring differences in richness. Experimental evidence, theory-guided exploratory data analyses, and existing literature support the conclusion that most sub-genus discoveries are spurious artifacts of clustering 16S sequencing reads. We proceed to model a 16S survey's systematic patterns of sub-genus taxa generation as a function of genus abundance to derive a robust control for false taxa accumulation. These controls unlock classical regression approaches for highly flexible differential richness inference at various levels of the surveyed microbial assemblage: from sample groups to specific taxa collections. The proposed methodology for differential richness inference is available through an R package, Prokounter.
False species discoveries bias richness estimation and confound differential richness inference. In the case of 16S microbiome surveys, supporting evidence indicates that most sub-genus taxa are spurious. Based on this finding, a flexible method is proposed and shown to overcome the confounding problem noted with current approaches for differential richness inference. Package availability: https://github.com/mskb01/prokounter.
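The idea of modeling sub-genus taxa counts as a function of genus abundance, and then using the fitted trend as a control, can be illustrated with a minimal sketch. This is not the Prokounter implementation or its API: the log-log linear form, the function names, and the toy numbers below are all assumptions chosen only to make the reasoning concrete.

```python
import math

def fit_log_linear(abundances, taxa_counts):
    """Least-squares fit of log(taxa count) on log(genus abundance).

    Assumption: a straight line on the log-log scale stands in for
    whatever trend model the actual method uses.
    """
    xs = [math.log(a) for a in abundances]
    ys = [math.log(t) for t in taxa_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def excess_log_richness(abundance, taxa_count, intercept, slope):
    """Observed log taxa count minus the count expected from abundance alone."""
    expected = intercept + slope * math.log(abundance)
    return math.log(taxa_count) - expected

# Hypothetical toy data: per-genus read abundance and number of
# sub-genus taxa reported for that genus (made up for illustration).
genus_abundance = [10, 100, 1000, 10000]
subgenus_taxa = [2, 4, 8, 16]

intercept, slope = fit_log_linear(genus_abundance, subgenus_taxa)

# A genus with 1000 reads and 16 reported taxa, versus the 8 the trend predicts.
excess = excess_log_richness(1000, 16, intercept, slope)
```

In this sketch, a genus whose taxa count sits on the fitted trend has an excess near zero, so taxa that accumulate simply because a genus is more abundant do not register as a richness difference. Only the residual would then be carried into a downstream regression comparing sample groups.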