Center for Computational Mathematics, Flatiron Institute, Simons Foundation, New York, New York, USA.
Department of Statistics, LMU München, Munich, Germany.
Stat Med. 2022 Jul 10;41(15):2786-2803. doi: 10.1002/sim.9384. Epub 2022 Apr 24.
The human microbiome provides essential physiological functions and helps maintain host homeostasis via the formation of intricate ecological host-microbiome relationships. While it is well established that the lifestyle of the host, dietary preferences, demographic background, and health status can influence microbial community composition and dynamics, robust, generalizable associations between specific host-associated factors and specific microbial taxa have remained largely elusive. Here, we propose factor regression models that allow the estimation of structured, parsimonious associations between host-related features and amplicon-derived microbial taxa. To account for the overdispersed nature of amplicon sequencing count data, we propose negative binomial reduced rank regression (NB-RRR) and negative binomial co-sparse factor regression (NB-FAR). While NB-RRR encodes the underlying dependency between the microbial abundances (outcomes) and the host-associated features (predictors) through a rank-constrained coefficient matrix, NB-FAR uses a sparse singular value decomposition of the coefficient matrix. The latter approach avoids the notoriously difficult joint parameter estimation by extracting sparse unit-rank components of the coefficient matrix sequentially, effectively delivering interpretable bi-clusters of taxa and host-associated factors. To solve the nonconvex optimization problems associated with these factor regression models, we present a novel iterative block-wise majorization procedure. Extensive simulation studies and an application to the microbial abundance data from the American Gut Project (AGP) demonstrate the efficacy of the proposed procedure. In the AGP data, we identify several factors that strongly link dietary habits and host lifestyle to specific microbial families.
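To make the rank-constrained setup concrete, the following is a minimal sketch in Python (NumPy) of the kind of data-generating model the abstract describes: overdispersed counts whose log-means depend on host features through a low-rank coefficient matrix. It is illustrative only and is not the authors' estimation procedure (no block-wise majorization is implemented). The log link, the single shared dispersion parameter, the absence of offsets, and all function names (`low_rank_coefficients`, `simulate_nb_counts`, `truncate_rank`) are assumptions introduced here, not details taken from the paper.

```python
import numpy as np


def low_rank_coefficients(p, q, rank, rng):
    """Build a p x q coefficient matrix C = U diag(d) V^T of the given rank.

    Hypothetical helper: U and V have orthonormal columns, d holds the
    (arbitrarily chosen) singular values.
    """
    U, _ = np.linalg.qr(rng.standard_normal((p, rank)))
    V, _ = np.linalg.qr(rng.standard_normal((q, rank)))
    d = np.linspace(2.0, 1.0, rank)
    return U @ np.diag(d) @ V.T


def simulate_nb_counts(X, C, size, rng):
    """Draw negative binomial counts with mean exp(X C) (assumed log link).

    `size` is an assumed shared dispersion parameter: the variance is
    mu + mu**2 / size, so smaller values give stronger overdispersion.
    """
    mu = np.exp(X @ C)
    prob = size / (size + mu)  # NumPy's (n, p) parametrization has mean mu
    return rng.negative_binomial(size, prob)


def truncate_rank(C, rank):
    """Best rank-`rank` approximation of C (Frobenius norm) via truncated SVD."""
    U, d, Vt = np.linalg.svd(C, full_matrices=False)
    return (U[:, :rank] * d[:rank]) @ Vt[:rank]


rng = np.random.default_rng(0)
n, p, q, r = 200, 10, 15, 2          # samples, host features, taxa, rank
X = rng.standard_normal((n, p))      # synthetic host-associated features
C = low_rank_coefficients(p, q, r, rng)
Y = simulate_nb_counts(X, C, size=2.0, rng=rng)  # synthetic taxon counts
C1 = truncate_rank(C, 1)             # leading unit-rank component of C
```

The unit-rank matrix `C1` corresponds loosely to the first component that a sequential extraction scheme would target. In NB-FAR as described above, the left and right singular vectors would additionally be sparse, which this sketch does not enforce.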