Zhang Shuangjie, Shen Yuning, Chen Irene A, Lee Juhee
Department of Statistics, University of California Santa Cruz.
Department of Chemical and Biomolecular Engineering, University of California Los Angeles.
J Am Stat Assoc. 2025;120(550):723-726. doi: 10.1080/01621459.2025.2449721. Epub 2025 Feb 25.
Group factor models have been developed to infer relationships between multiple co-occurring multivariate continuous responses. Motivated by complex count data from multi-domain microbiome studies using next-generation sequencing, we develop a sparse Bayesian group factor model (Sp-BGFM) for multiple count table data that captures the interactions between microorganisms in different domains. Sp-BGFM models count vectors with a rounded kernel mixture model that places a Dirichlet process (DP) prior on the mixing distribution and uses log-normal mixture kernels. A group factor model is used for the covariance matrix of the mixture kernel, which describes microorganism interactions. We construct a Dirichlet-Horseshoe (Dir-HS) shrinkage prior and use it as a joint prior for the factor loading vectors. The joint sparsity induced by the Dir-HS prior greatly improves performance in high-dimensional applications. We further model the effects of covariates on microbial abundances using regression. The semiparametric model flexibly accommodates large variability in observed counts and excess zero counts, and provides a basis for robust estimation of the interactions and covariate effects. We evaluate Sp-BGFM using simulation studies and real data analysis, comparing it to popular alternatives. Our results highlight the necessity of the joint sparsity induced by the Dir-HS prior and the benefits of a flexible DP model for baseline abundances.
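The generative idea behind a rounded log-normal kernel with a factor-structured covariance can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the Sp-BGFM implementation: the function name, the floor-based rounding rule, the Bernoulli sparsity mask standing in for the Dir-HS prior, and the single Gaussian baseline standing in for the DP mixture are all hypothetical simplifications chosen for illustration.

```python
import numpy as np


def simulate_rounded_lognormal_counts(n=200, dims=(5, 4), n_factors=2,
                                      noise_sd=0.5, seed=0):
    """Simulate count tables from a rounded log-normal factor model.

    Hypothetical illustration only. `dims` gives the number of taxa in each
    domain; the columns of the returned count matrix are the domains
    concatenated side by side.
    """
    rng = np.random.default_rng(seed)
    n_taxa = sum(dims)

    # Sparse loading matrix: a Bernoulli mask zeroes most entries. This is a
    # crude stand-in for shrinkage, NOT the Dir-HS prior from the abstract.
    mask = rng.random((n_taxa, n_factors)) < 0.3
    loadings = rng.normal(size=(n_taxa, n_factors)) * mask

    # Baseline log-abundance per taxon. The abstract describes a DP mixture
    # here; a single Gaussian draw is assumed for simplicity.
    baseline = rng.normal(loc=1.0, scale=1.0, size=n_taxa)

    # Latent factors shared across domains induce cross-domain dependence.
    factors = rng.normal(size=(n, n_factors))
    log_latent = (baseline
                  + factors @ loadings.T
                  + noise_sd * rng.normal(size=(n, n_taxa)))

    # Rounding step (assumed rule): floor of the latent log-normal variable,
    # so any latent value below 1 becomes a zero count.
    counts = np.floor(np.exp(log_latent)).astype(np.int64)

    # Covariance of the latent log-scale vector implied by the factor model.
    cov = loadings @ loadings.T + noise_sd ** 2 * np.eye(n_taxa)
    return counts, cov, loadings


counts, cov, loadings = simulate_rounded_lognormal_counts()
# Off-diagonal block: covariance between taxa in domain 1 and domain 2.
cross_domain_cov = cov[:5, 5:]
```

In this sketch the off-diagonal block `cross_domain_cov` plays the role of the between-domain interaction structure, and zero counts arise naturally whenever the latent variable falls below 1, which is one way a rounded kernel can accommodate excess zeros.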