A GLM-based zero-inflated generalized Poisson factor model for analyzing microbiome data.

Authors

Chi Jinling, Ye Jimin, Zhou Ying

Affiliations

School of Mathematics and Statistics, Xidian University, Xi'an, China.

School of Mathematical Sciences, Heilongjiang University, Harbin, China.

Publication information

Front Microbiol. 2024 May 30;15:1394204. doi: 10.3389/fmicb.2024.1394204. eCollection 2024.

Abstract

MOTIVATION

High-throughput sequencing technology facilitates the quantitative analysis of microbial communities, improving the capacity to investigate the associations between the human microbiome and diseases. Our primary motivating application is to explore the association between gut microbes and obesity. The complex characteristics of microbiome data, including high dimensionality, zero inflation, and over-dispersion, pose new statistical challenges for downstream analysis.

RESULTS

We propose a GLM-based zero-inflated generalized Poisson factor analysis (GZIGPFA) model to analyze microbiome data with complex characteristics. The GZIGPFA model is based on a zero-inflated generalized Poisson (ZIGP) distribution for modeling microbiome count data. A link function between the generalized Poisson rate and the probability of excess zeros is established within the generalized linear model (GLM) framework. The latent parameters of the GZIGPFA model constitute a low-rank matrix comprising a low-dimensional score matrix and a loading matrix. An alternating maximum likelihood algorithm is employed to estimate the unknown parameters, and cross-validation is utilized to determine the rank of the model in this study. The proposed GZIGPFA model demonstrates superior performance and advantages through comprehensive simulation studies and real data applications.
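To make the ZIGP building block concrete, the following is a minimal sketch of a zero-inflated generalized Poisson probability mass function. It assumes Consul's parameterization of the generalized Poisson distribution (rate `theta`, dispersion `lam`), which may differ from the parameterization used in the paper; the function names and the argument `pi` (probability of an excess zero) are hypothetical and chosen only for illustration. The GLM link and the low-rank factorization of the full GZIGPFA model are not reproduced here.

```python
import math


def gp_logpmf(y, theta, lam):
    """Log-pmf of a generalized Poisson count (assumed Consul form).

    P(Y = y) = theta * (theta + lam*y)**(y-1) * exp(-theta - lam*y) / y!
    with theta > 0 and 0 <= lam < 1; lam = 0 gives the ordinary Poisson.
    """
    return (math.log(theta)
            + (y - 1) * math.log(theta + lam * y)
            - theta - lam * y
            - math.lgamma(y + 1))


def zigp_pmf(y, theta, lam, pi):
    """Zero-inflated generalized Poisson pmf.

    With probability pi the count is an excess (structural) zero;
    otherwise it is drawn from the generalized Poisson above.
    """
    gp = math.exp(gp_logpmf(y, theta, lam))
    excess_zero = pi if y == 0 else 0.0
    return excess_zero + (1.0 - pi) * gp


def zigp_loglik(counts, theta, lam, pi):
    """Log-likelihood of a list of counts sharing one (theta, lam, pi)."""
    return sum(math.log(zigp_pmf(y, theta, lam, pi)) for y in counts)
```

In a factor model of this kind, `theta` and `pi` would vary per sample and taxon rather than being shared, with the per-cell values derived from a low-rank matrix; how exactly the paper ties the two together through its link function is not specified in the abstract, so the sketch leaves them as free arguments.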

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3ebe/11173601/8ef83da9ba4f/fmicb-15-1394204-g0006.jpg
