Osborne Nathan, Peterson Christine B, Vannucci Marina
Department of Statistics, Rice University, Houston, TX.
Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX.
J Comput Graph Stat. 2022;31(1):163-175. doi: 10.1080/10618600.2021.1935971. Epub 2021 Jul 19.
Network estimation and variable selection have been extensively studied in the statistical literature, but only recently have those two challenges been addressed simultaneously. In this article, we seek to develop a novel method to simultaneously estimate network interactions and associations to relevant covariates for count data, and specifically for compositional data, which have a fixed sum constraint. We use a hierarchical Bayesian model with latent layers and employ spike-and-slab priors for both edge and covariate selection. For posterior inference, we develop a novel variational inference scheme with an expectation-maximization step, to enable efficient estimation. Through simulation studies, we demonstrate that the proposed model outperforms existing methods in its accuracy of network recovery. We show the practical utility of our model via an application to microbiome data. The human microbiome has been shown to contribute to many of the functions of the human body, and also to be linked with a number of diseases. In our application, we seek to better understand the interaction between microbes and relevant covariates, as well as the interaction of microbes with each other. We call our algorithm simultaneous inference for networks and covariates and provide a Python implementation, which is available online.
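As a rough illustration of two ingredients named in the abstract, the sketch below simulates compositional count data whose covariate effects are drawn from a spike-and-slab prior. It is not the authors' model or their released implementation: the function name `sample_spike_and_slab`, the softmax link, the multinomial sampling step, and all numeric settings (inclusion probability, slab standard deviation, sequencing depth) are assumptions chosen only to make the ideas concrete.

```python
import numpy as np


def sample_spike_and_slab(shape, inclusion_prob, slab_sd, rng):
    """Draw coefficients that are exactly zero (spike) or Gaussian (slab).

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    included = rng.random(shape) < inclusion_prob   # binary selection indicators
    slab = rng.normal(0.0, slab_sd, size=shape)     # values used when selected
    return np.where(included, slab, 0.0), included


rng = np.random.default_rng(0)
n_samples, n_covariates, n_taxa = 50, 4, 6          # assumed toy dimensions

# Covariates and sparse covariate-to-taxon effects.
X = rng.normal(size=(n_samples, n_covariates))
B, included = sample_spike_and_slab((n_covariates, n_taxa), 0.3, 1.0, rng)

# Latent layer: covariate effects plus Gaussian noise.
latent = X @ B + rng.normal(scale=0.5, size=(n_samples, n_taxa))

# Softmax maps each latent row to proportions (assumed link function).
latent = latent - latent.max(axis=1, keepdims=True)
probs = np.exp(latent)
probs = probs / probs.sum(axis=1, keepdims=True)

# Counts with a fixed total per sample: the fixed sum constraint.
depth = 1000
counts = np.vstack([rng.multinomial(depth, p) for p in probs])
compositions = counts / counts.sum(axis=1, keepdims=True)
```

Every row of `counts` sums to `depth`, so only relative abundances carry information, and the zero entries of `B` mark covariates that the prior excludes for a given taxon.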