Department of Statistics, University of California, Los Angeles, 90095-1554, CA, USA.
Department of Biostatistics and Epidemiology, Rutgers School of Public Health, Piscataway, 08854, NJ, USA.
Genome Biol. 2021 Jun 28;22(1):192. doi: 10.1186/s13059-021-02400-4.
A critical challenge in microbiome data analysis is the existence of many non-biological zeros, which distort taxon abundance distributions, complicate data analysis, and jeopardize the reliability of scientific discoveries. To address this issue, we propose the first imputation method for microbiome data, mbImpute, to identify and recover likely non-biological zeros by borrowing information jointly from similar samples, similar taxa, and optional metadata including sample covariates and taxon phylogeny. We demonstrate that mbImpute improves the power of identifying disease-related taxa from microbiome data of type 2 diabetes and colorectal cancer, and that it preserves the non-zero distributions of taxon abundances.
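The idea of recovering a zero by borrowing information from similar samples can be illustrated with a deliberately simplified sketch. This is not the mbImpute algorithm: the function name `impute_zeros_from_similar_samples`, the parameters `k` and `min_nonzero_frac`, the log1p transform, and the nearest-neighbor rule are all assumptions chosen for illustration, and the sketch ignores similar taxa, sample covariates, and phylogeny entirely. A zero is treated as likely non-biological only when enough of the most similar samples have a non-zero value for the same taxon.

```python
import numpy as np


def impute_zeros_from_similar_samples(counts, k=2, min_nonzero_frac=0.5):
    """Toy neighbor-based zero imputation (illustrative only, NOT mbImpute).

    counts: 2-D array-like, rows = samples, columns = taxa.
    k: number of most similar samples to borrow from (hypothetical parameter).
    min_nonzero_frac: fraction of neighbors that must be non-zero for a zero
        to be treated as likely non-biological (hypothetical parameter).
    """
    x = np.log1p(np.asarray(counts, dtype=float))
    imputed = x.copy()

    # Pairwise Euclidean distances between samples on the log scale.
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a sample is never its own neighbor

    for i in range(x.shape[0]):
        neighbors = np.argsort(dist[i])[:k]
        for j in np.where(x[i] == 0)[0]:
            vals = x[neighbors, j]
            nonzero = vals[vals > 0]
            # Impute only when enough neighbors carry this taxon; otherwise
            # keep the zero, since it may be a true biological absence.
            if nonzero.size and nonzero.size / vals.size >= min_nonzero_frac:
                imputed[i, j] = nonzero.mean()

    return np.expm1(imputed)  # back to the original count scale


# Made-up counts: 3 samples x 4 taxa. Taxon 2 is zero in sample 0 only;
# taxon 3 is zero everywhere.
counts = [[10, 20, 0, 0],
          [12, 18, 5, 0],
          [11, 21, 7, 0]]
result = impute_zeros_from_similar_samples(counts, k=2)
```

In this made-up example, the zero for the third taxon in the first sample is replaced with a value derived from the two neighboring samples, while the fourth taxon, absent from every sample, stays at zero. Non-zero entries are left unchanged, which mirrors in a crude way the goal of preserving the non-zero abundance distributions.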