Hu Tao, Gallins Paul, Zhou Yi-Hui
Bioinformatics Research Center, North Carolina State University, NC, 27695.
Department of Biological Sciences and Bioinformatics Research Center, North Carolina State University, NC, 27695.
Stat (Int Stat Inst). 2018;7(1). doi: 10.1002/sta4.185. Epub 2018 Jun 19.
The microbiome is increasingly recognized as an important aspect of the health of host species: it is involved in many biological pathways and processes and is potentially useful as a source of health biomarkers. Taking advantage of high-throughput sequencing technologies, modern bacterial microbiome studies are metagenomic, interrogating thousands of taxa simultaneously. Several data analysis frameworks have been proposed for modeling microbiome sequence read count data and for determining the most significant features. However, there is still room for improvement. We introduce a zero-inflated beta-binomial (ZIBB) model to describe the distribution of microbiome count data and to determine its association with a continuous or categorical phenotype of interest. The approach can exploit mean-variance relationships to improve power and can adjust for covariates. The proposed method is a mixture model with two components: (i) a zero model accounting for excess zeros and (ii) a count model that captures the remaining component by beta-binomial regression, allowing for overdispersion effects. Simulation studies show that our proposed method effectively controls type I error and has higher power than competing methods to detect taxa associated with phenotype. An R package, ZIBBSeqDiscovery, is available on CRAN.
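The two-component mixture described above can be illustrated with a minimal sketch of a zero-inflated beta-binomial probability mass function. This is not the ZIBBSeqDiscovery implementation. The function names, the mean/overdispersion parameterization (`mu`, `phi`), and the fixed zero-inflation probability `pi_zero` are assumptions made for illustration. The regression step, in which covariates and phenotype drive these parameters, is omitted.

```python
import math


def log_beta(a, b):
    """Log of the Beta function, computed via log-gamma for numerical stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(y, n, mu, phi):
    """Beta-binomial pmf for y reads out of n total reads.

    Assumed parameterization (hypothetical, for illustration only):
    mu in (0, 1) is the mean proportion and phi in (0, 1) is the
    overdispersion, so that a = mu(1-phi)/phi and b = (1-mu)(1-phi)/phi.
    """
    a = mu * (1.0 - phi) / phi
    b = (1.0 - mu) * (1.0 - phi) / phi
    log_choose = (
        math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    )
    return math.exp(log_choose + log_beta(y + a, n - y + b) - log_beta(a, b))


def zibb_pmf(y, n, mu, phi, pi_zero):
    """Zero-inflated beta-binomial pmf.

    With probability pi_zero the count is a structural zero (zero model);
    otherwise it is drawn from the beta-binomial count model.
    """
    bb = beta_binomial_pmf(y, n, mu, phi)
    if y == 0:
        return pi_zero + (1.0 - pi_zero) * bb
    return (1.0 - pi_zero) * bb


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    n, mu, phi, pi_zero = 50, 0.1, 0.2, 0.3
    print("P(Y = 0) under the count model alone:", beta_binomial_pmf(0, n, mu, phi))
    print("P(Y = 0) under ZIBB:", zibb_pmf(0, n, mu, phi, pi_zero))
```

Under this parameterization the zero model only adds mass at zero, so the ZIBB probability of a zero count is never smaller than that of the beta-binomial alone, and the expected count shrinks by a factor of `1 - pi_zero`.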