Department of Computer Science, Columbia University, New York, NY, USA.
Program for Mathematical Genomics, Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA.
Nat Biotechnol. 2023 Dec;41(12):1820-1828. doi: 10.1038/s41587-023-01696-w. Epub 2023 Mar 16.
Sequencing-based approaches for the analysis of microbial communities are susceptible to contamination, which could mask biological signals or generate artifactual ones. Methods for in silico decontamination using controls are routinely used, but do not make optimal use of information shared across samples and cannot handle taxa that only partially originate in contamination or leakage of biological material into controls. Here we present Source tracking for Contamination Removal in microBiomes (SCRuB), a probabilistic in silico decontamination method that incorporates shared information across multiple samples and controls to precisely identify and remove contamination. We validate the accuracy of SCRuB in multiple data-driven simulations and experiments, including induced contamination, and demonstrate that it outperforms state-of-the-art methods by an average of 15-20 times. We showcase the robustness of SCRuB across multiple ecosystems, data types and sequencing depths. Demonstrating its applicability to microbiome research, SCRuB facilitates improved predictions of host phenotypes, most notably the prediction of treatment response in melanoma patients using decontaminated tumor microbiome data.