Niu Yabo, Ni Yang, Pati Debdeep, Mallick Bani K
Department of Mathematics, University of Houston.
Department of Statistics, Texas A&M University.
J Am Stat Assoc. 2024;119(547):1985-1999. doi: 10.1080/01621459.2023.2233744. Epub 2023 Sep 6.
In a traditional Gaussian graphical model, data homogeneity is routinely assumed with no extra variables affecting the conditional independence. In modern genomic datasets, there is an abundance of auxiliary information, which often gets under-utilized in determining the joint dependency structure. In this article, we consider a Bayesian approach to model undirected graphs underlying heterogeneous multivariate observations with additional assistance from covariates. Building on product partition models, we propose a novel covariate-dependent Gaussian graphical model that allows graphs to vary with covariates so that observations whose covariates are similar share a similar undirected graph. To efficiently embed Gaussian graphical models into our proposed framework, we explore both Gaussian likelihood and pseudo-likelihood functions. For Gaussian likelihood, a G-Wishart distribution is used as a natural conjugate prior, and for the pseudo-likelihood, a product of Gaussian conditionals is used. Moreover, the proposed model has large prior support and is flexible enough to approximate any ν-Hölder conditional variance-covariance matrices with ν ∈ (0, 1]. We further show that based on the theory of fractional likelihood, the rate of posterior contraction is minimax optimal assuming the true density to be a Gaussian mixture with a known number of components. The efficacy of the approach is demonstrated via simulation studies and an analysis of a protein network for a breast cancer dataset assisted by mRNA gene expression as covariates.
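For readers unfamiliar with the "product of Gaussian conditionals" mentioned above, the following is a minimal sketch of the standard Gaussian pseudo-likelihood, not the authors' implementation. It assumes zero-mean data and a symmetric positive-definite precision matrix; the function name `gaussian_log_pseudo_likelihood` and the arguments `X` and `Omega` are hypothetical and chosen only for illustration. Each variable is modeled by its conditional distribution given all the others, and a zero off-diagonal entry in `Omega` corresponds to a missing edge in the undirected graph.

```python
import numpy as np


def gaussian_log_pseudo_likelihood(X, Omega):
    """Sum over variables j of log p(x_j | x_{-j}) for zero-mean Gaussian data.

    X     : (n, p) data matrix, one observation per row (assumed centered).
    Omega : (p, p) symmetric positive-definite precision matrix.
    """
    X = np.asarray(X, dtype=float)
    Omega = np.asarray(Omega, dtype=float)

    # Diagonal entries omega_jj are the conditional precisions.
    d = np.diag(Omega)

    # Conditional mean of x_j given the rest:
    #   -sum_{k != j} (omega_jk / omega_jj) * x_k
    off_diag = Omega - np.diag(d)
    cond_mean = -(X @ off_diag) / d  # shape (n, p)

    resid = X - cond_mean

    # log N(x; m, 1/omega_jj)
    #   = 0.5*log(omega_jj) - 0.5*log(2*pi) - 0.5*omega_jj*(x - m)^2
    log_p = 0.5 * np.log(d) - 0.5 * np.log(2.0 * np.pi) - 0.5 * d * resid**2
    return float(log_p.sum())
```

When `Omega` is diagonal (an empty graph), the conditionals are independent and this quantity coincides with the full Gaussian log-likelihood. In general the two differ, which is the sense in which the pseudo-likelihood is an approximation that avoids evaluating the determinant of `Omega`.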