Austin George I, Brown Kav Aya, ElNaggar Shahd, Park Heekuk, Biermann Jana, Uhlemann Anne-Catrin, Pe'er Itsik, Korem Tal
Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA.
Program for Mathematical Genomics, Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA.
Nat Microbiol. 2025 Apr;10(4):897-911. doi: 10.1038/s41564-025-01954-4. Epub 2025 Mar 27.
Every step in common microbiome profiling protocols has variable efficiency for each microbe, for example, different DNA extraction efficiency for Gram-positive bacteria. These processing biases impede the identification of signals that are biologically interpretable and generalizable across studies. 'Batch-correction' methods have been used to address these issues computationally with some success, but they are largely non-interpretable and often require the use of an outcome variable in a manner that risks overfitting. We present DEBIAS-M (domain adaptation with phenotype estimation and batch integration across studies of the microbiome), an interpretable framework for inference and correction of processing bias, which facilitates domain adaptation in microbiome studies. DEBIAS-M learns bias-correction factors for each microbe in each batch that simultaneously minimize batch effects and maximize cross-study associations with phenotypes. Using diverse benchmarks including 16S rRNA and metagenomic sequencing, classification and regression, and a variety of clinical and molecular targets, we demonstrate that using DEBIAS-M improves cross-study prediction accuracy compared with commonly used batch-correction methods. Notably, we show that the inferred bias-correction factors are stable, interpretable and strongly associated with specific experimental protocols. Overall, we show that DEBIAS-M facilitates improved modelling of microbiome data and identification of interpretable signals that generalize across studies.
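To make the idea of a bias-correction factor "for each microbe in each batch" concrete, here is a minimal sketch. It is not the authors' implementation and it does not fit anything. It assumes that processing bias acts multiplicatively on relative abundances and that samples are renormalized after correction; the abstract states neither. The function names (`apply_bias_correction`, `batch_discrepancy`), the simulated data, and the bias values are all hypothetical. The sketch only shows that if such factors were known, applying their inverse would recover the original profiles and shrink a simple measure of between-batch difference.

```python
import numpy as np


def apply_bias_correction(X, batches, log_factors):
    """Scale each taxon by exp(log_factors[batch, taxon]), then renormalize rows.

    X           : (n_samples, n_taxa) non-negative abundances
    batches     : (n_samples,) integer batch index of each sample
    log_factors : (n_batches, n_taxa) log-scale factors (assumed model)
    """
    scaled = X * np.exp(log_factors[batches])
    return scaled / scaled.sum(axis=1, keepdims=True)


def batch_discrepancy(P, batches):
    """Mean squared distance of each batch's mean profile from the average batch mean."""
    means = np.stack([P[batches == b].mean(axis=0) for b in np.unique(batches)])
    return float(((means - means.mean(axis=0)) ** 2).sum(axis=1).mean())


# Simulated data (hypothetical): 3 batches drawn from the same population.
rng = np.random.default_rng(0)
n_batches, n_per_batch, n_taxa = 3, 50, 5
true_profiles = rng.dirichlet(np.ones(n_taxa), size=n_batches * n_per_batch)
batches = np.repeat(np.arange(n_batches), n_per_batch)

# Hypothetical per-batch, per-taxon log bias (batch 1 is unbiased).
true_log_bias = np.array([
    [1.5, 0.0, 0.0, 0.0, -1.5],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [-1.5, 0.0, 0.0, 0.0, 1.5],
])

# Distort the true profiles with the bias, then undo it with the inverse factors.
observed = apply_bias_correction(true_profiles, batches, true_log_bias)
recovered = apply_bias_correction(observed, batches, -true_log_bias)

print("discrepancy before correction:", batch_discrepancy(observed, batches))
print("discrepancy after correction: ", batch_discrepancy(recovered, batches))
```

In practice the factors are unknown and must be learned. According to the abstract, DEBIAS-M does so by jointly minimizing batch effects and maximizing cross-study associations with phenotypes, a step this sketch leaves out.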