Division of Biomedical Statistics and Informatics.
Center for Individualized Medicine, Mayo Clinic, Rochester, MN 55905, USA.
Bioinformatics. 2018 Feb 15;34(4):643-651. doi: 10.1093/bioinformatics/btx650.
One objective of human microbiome studies is to identify microbes that are differentially abundant across biological conditions. Previous statistical methods focus on detecting shifts in the abundance and/or prevalence of microbes and treat the dispersion (the spread of the data) as a nuisance parameter. These methods also assume that the dispersion is the same across conditions, an assumption that may not hold in the presence of sample heterogeneity. Moreover, outliers are widespread in microbiome sequencing data, which undermines the robustness of existing parametric models. A robust and powerful method that allows covariate-dependent dispersion and addresses outliers is therefore still needed for differential abundance analysis.
We introduce a novel test for differential distribution analysis of microbiome sequencing data by jointly testing the abundance, prevalence and dispersion. The test is built on a zero-inflated negative binomial regression model and winsorized count data to account for zero-inflation and outliers. Using simulated data and real microbiome sequencing datasets, we show that our test is robust across various biological conditions and overall more powerful than previous methods.
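The two ingredients named above, winsorized counts and a zero-inflated negative binomial (ZINB) likelihood, can be illustrated with a minimal Python sketch. This is not the MicrobiomeDDA implementation. The function names, the 0.97 default quantile, the nearest-rank quantile rule, and the parameterization (mean `mu`, dispersion `phi` with variance `mu + phi * mu**2`, structural-zero probability `pi0`) are all assumptions made for illustration. In a regression setting, each of these three parameters would be linked to covariates, which is what lets abundance, prevalence, and dispersion be tested jointly.

```python
import math


def winsorize_upper(counts, quantile=0.97):
    """Cap counts above an empirical upper quantile (upper-tail winsorization).

    The 0.97 default is an illustrative assumption, not a value from the paper.
    """
    ordered = sorted(counts)
    # Nearest-rank quantile: the smallest observed value with at least
    # `quantile` of the data at or below it.
    rank = max(1, math.ceil(quantile * len(ordered)))
    cap = ordered[rank - 1]
    return [min(c, cap) for c in counts]


def zinb_log_pmf(y, mu, phi, pi0):
    """Log-probability of count y under a zero-inflated negative binomial.

    mu  : mean of the negative binomial component (abundance)
    phi : dispersion, so that the NB variance is mu + phi * mu**2
    pi0 : probability of a structural zero (related to prevalence)
    Requires mu > 0, phi > 0, and 0 <= pi0 < 1.
    """
    size = 1.0 / phi
    log_nb = (
        math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
        + size * math.log(size / (size + mu))
        + y * math.log(mu / (size + mu))
    )
    if y == 0:
        # A zero arises either structurally or from the NB component.
        return math.log(pi0 + (1.0 - pi0) * math.exp(log_nb))
    return math.log1p(-pi0) + log_nb


def zinb_log_likelihood(counts, mu, phi, pi0):
    """Sum of ZINB log-probabilities over a sample with shared parameters."""
    return sum(zinb_log_pmf(y, mu, phi, pi0) for y in counts)
```

For example, `winsorize_upper([0, 2, 5, 1000], quantile=0.75)` caps the extreme count at 5 before the likelihood is evaluated, so a single outlier no longer dominates the dispersion estimate.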
The R package is available at https://github.com/jchen1981/MicrobiomeDDA.
chen.jun2@mayo.edu or zhiwei@njit.edu.
Supplementary data are available at Bioinformatics online.