Fast zero-inflated negative binomial mixed modeling approach for analyzing longitudinal metagenomics data.

Author information

Department of Biostatistics, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Statesboro, GA 30458, USA.

Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL 35294, USA.

Publication information

Bioinformatics. 2020 Apr 15;36(8):2345-2351. doi: 10.1093/bioinformatics/btz973.

DOI:10.1093/bioinformatics/btz973

PMID:31904815

Abstract

MOTIVATION

Longitudinal metagenomics data, including both 16S rRNA and whole-metagenome shotgun sequencing data, have enhanced our ability to understand the dynamic associations between the human microbiome and various diseases. However, analytic tools that simultaneously address the main challenges of longitudinal metagenomics data, i.e. high dimensionality, dependence among samples and zero-inflation of observed counts, have not been fully developed.

RESULTS

We propose a fast zero-inflated negative binomial mixed modeling (FZINBMM) approach to analyze high-dimensional longitudinal metagenomic count data. The approach combines zero-inflated negative binomial mixed models (ZINBMMs), which model the longitudinal metagenomic counts, with a fast EM-IWLS algorithm for fitting them. FZINBMM takes advantage of a commonly used procedure for fitting linear mixed models, which allows us to include various types of fixed and random effects and within-subject correlation structures and to analyze many taxa quickly. We found that FZINBMM was markedly more computationally efficient than, and statistically comparable with, two R packages, GLMMadaptive and glmmTMB, that use numerical integration to fit ZINBMMs. Extensive simulations and real data applications showed that FZINBMM outperformed previous methods, including linear mixed models, negative binomial mixed models and zero-inflated Gaussian mixed models.
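To make the zero-inflation component concrete, the following is a minimal Python sketch of the standard zero-inflated negative binomial mixture and of the weight an EM-style E-step would typically assign to an observed zero. It is not the NBZIMM implementation and omits random effects and the IWLS update entirely. The function names and the parameters `mu` (mean), `theta` (dispersion) and `p_zero` (structural-zero probability) are hypothetical, chosen for illustration only.

```python
import math


def nb_pmf(y: int, mu: float, theta: float) -> float:
    """Negative binomial pmf with mean mu > 0 and dispersion theta > 0.

    Parameterization (an assumption of this sketch): Var(Y) = mu + mu**2 / theta.
    """
    log_p = (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


def zinb_pmf(y: int, mu: float, theta: float, p_zero: float) -> float:
    """Zero-inflated NB: a structural zero with probability p_zero,
    otherwise a draw from the negative binomial component."""
    nb = nb_pmf(y, mu, theta)
    if y == 0:
        return p_zero + (1.0 - p_zero) * nb
    return (1.0 - p_zero) * nb


def structural_zero_weight(y: int, mu: float, theta: float, p_zero: float) -> float:
    """Posterior probability that an observation came from the
    structural-zero state (the quantity an E-step would compute)."""
    if y > 0:
        return 0.0  # a positive count cannot be a structural zero
    return p_zero / zinb_pmf(0, mu, theta, p_zero)
```

In a full mixed model, `mu` and `p_zero` would depend on covariates and subject-level random effects; here they are fixed scalars so that the mixture structure stays visible.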

AVAILABILITY AND IMPLEMENTATION

FZINBMM has been implemented in the R package NBZIMM, available in the public GitHub repository http://github.com/nyiuab/NBZIMM.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
