Chi Jinling, Ye Jimin, Zhou Ying
School of Mathematics and Statistics, Xidian University, Xi'an, China.
School of Mathematical Sciences, Heilongjiang University, Harbin, China.
Front Microbiol. 2024 May 30;15:1394204. doi: 10.3389/fmicb.2024.1394204. eCollection 2024.
High-throughput sequencing technology facilitates the quantitative analysis of microbial communities, improving the capacity to investigate the associations between the human microbiome and diseases. Our primary motivating application is to explore the association between gut microbes and obesity. The complex characteristics of microbiome data, including high dimensionality, zero inflation, and over-dispersion, pose new statistical challenges for downstream analysis.
We propose a zero-inflated generalized Poisson factor analysis (GZIGPFA) model based on the generalized linear model (GLM) to analyze microbiome data with these complex characteristics. The GZIGPFA model uses a zero-inflated generalized Poisson (ZIGP) distribution to model microbiome count data. Within the GLM framework, a link function is established between the generalized Poisson rate and the probability of excess zeros. The latent parameters of the GZIGPFA model form a low-rank matrix that factors into a low-dimensional score matrix and a loading matrix. The unknown parameters are estimated with an alternating maximum likelihood algorithm, and the model rank is selected by cross-validation. Comprehensive simulation studies and real data applications demonstrate the superior performance of the proposed GZIGPFA model.
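The abstract does not give the model's exact parameterization, so the Python sketch below only illustrates the ingredients it names. It is not the authors' implementation. It assumes Consul's generalized Poisson GP(θ, λ), with mean θ/(1 − λ) and variance θ/(1 − λ)³. It models the log-mean as the low-rank product log μ = UVᵀ, with score matrix U and loading matrix V. The excess-zero probability π is tied to the rate through a hypothetical ZIPFA-style link, logit(π) = −τ log μ. Estimation alternates L-BFGS-B updates of U, V and (λ, τ) using numerical gradients. The rank is chosen by an assumed entry-wise hold-out scheme. All function names are hypothetical, and the simulated data are a negative binomial stand-in, not the authors' simulation design.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln


def gp_logpmf(y, theta, lam):
    """Log-pmf of Consul's generalized Poisson GP(theta, lam), 0 <= lam < 1.

    Mean theta / (1 - lam); variance theta / (1 - lam)**3, so lam > 0 gives
    over-dispersion relative to the Poisson.
    """
    return (np.log(theta) + (y - 1) * np.log(theta + lam * y)
            - theta - lam * y - gammaln(y + 1))


def zigp_logpmf(y, theta, lam, pi):
    """Zero-inflated GP: an excess zero with probability pi, else GP(theta, lam)."""
    gp = gp_logpmf(y, theta, lam)
    log_zero = np.log(pi + (1 - pi) * np.exp(gp))  # excess zero or GP zero
    return np.where(y == 0, log_zero, np.log1p(-pi) + gp)


def neg_loglik(Y, U, V, lam, tau, mask=None):
    """Negative log-likelihood of a low-rank ZIGP model (illustrative assumptions).

    log(mu)   = U @ V.T          low-rank log-mean (scores U, loadings V)
    theta     = mu * (1 - lam)   so the GP component has mean mu
    logit(pi) = -tau * log(mu)   hypothetical ZIPFA-style link
    """
    eta = np.clip(U @ V.T, -20.0, 20.0)                # guard exp() against overflow
    pi = np.clip(expit(-tau * eta), 1e-10, 1 - 1e-10)  # keep log(pi), log(1 - pi) finite
    ll = zigp_logpmf(Y, np.exp(eta) * (1 - lam), lam, pi)
    if mask is not None:
        ll = ll * mask                                 # only entries with mask == 1 count
    return -ll.sum()


def fit_lowrank_zigp(Y, K, mask=None, n_rounds=3, seed=0):
    """Alternating ML sketch: update U (V fixed), then V (U fixed), then (lam, tau)."""
    rng = np.random.default_rng(seed)
    n, p = Y.shape
    U = 0.1 * rng.standard_normal((n, K))
    V = 0.1 * rng.standard_normal((p, K))
    lam, tau = 0.1, 0.5
    for _ in range(n_rounds):
        U = minimize(lambda u: neg_loglik(Y, u.reshape(n, K), V, lam, tau, mask),
                     U.ravel(), method="L-BFGS-B").x.reshape(n, K)
        V = minimize(lambda v: neg_loglik(Y, U, v.reshape(p, K), lam, tau, mask),
                     V.ravel(), method="L-BFGS-B").x.reshape(p, K)
        lam, tau = minimize(lambda s: neg_loglik(Y, U, V, s[0], s[1], mask),
                            np.array([lam, tau]), method="L-BFGS-B",
                            bounds=[(0.0, 0.95), (0.0, 5.0)]).x
    return U, V, lam, tau


def choose_rank(Y, ranks, holdout=0.1, seed=0):
    """Assumed CV scheme: hide ~10% of entries, fit each rank on the rest,
    and keep the rank with the smallest held-out negative log-likelihood."""
    rng = np.random.default_rng(seed)
    train = (rng.random(Y.shape) >= holdout).astype(float)
    scores = {}
    for K in ranks:
        U, V, lam, tau = fit_lowrank_zigp(Y, K, mask=train)
        scores[K] = neg_loglik(Y, U, V, lam, tau, mask=1.0 - train)
    return min(scores, key=scores.get), scores


# Stand-in data: negative binomial counts with a rank-2 log-mean plus ~20% extra zeros.
rng = np.random.default_rng(1)
U0 = rng.uniform(0.3, 1.2, (20, 2))
V0 = rng.uniform(0.3, 1.2, (8, 2))
mu0 = np.exp(U0 @ V0.T)
Y = rng.negative_binomial(2.0, 2.0 / (2.0 + mu0))
Y[rng.random(Y.shape) < 0.2] = 0

best_K, cv_scores = choose_rank(Y, ranks=(1, 2))
U, V, lam, tau = fit_lowrank_zigp(Y, best_K)
print(best_K, round(float(lam), 3), round(float(tau), 3))
```

Holding all but one block of parameters fixed turns each update into an ordinary smooth optimization, which is the usual appeal of alternating schemes for factor models. Note that U and V are identified only up to an invertible K × K transformation, so only the product UVᵀ is comparable across fits.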