Cremaschi Andrea, Argiento Raffaele, Shoemaker Katherine, Peterson Christine, Vannucci Marina
Department of Cancer Immunology, Institute of Cancer Research, Oslo University Hospital, Oslo, Norway.
Oslo Centre for Biostatistics and Epidemiology (OCBE), University of Oslo, Oslo, Norway.
Bayesian Anal. 2019 Dec;14(4):1271-1301. doi: 10.1214/19-ba1153. Epub 2019 Mar 28.
Gaussian graphical models are useful tools for exploring network structures in multivariate normal data. In this paper we are interested in situations where data show departures from Gaussianity, therefore requiring alternative modeling distributions. The multivariate t-distribution, obtained by dividing each component of the data vector by a gamma random variable, is a straightforward generalization to accommodate deviations from normality such as heavy tails. Since different groups of variables may be contaminated to a different extent, Finegold and Drton (2014) introduced the Dirichlet t-distribution, where the divisors are clustered using a Dirichlet process. In this work, we consider a more general class of nonparametric distributions as the prior on the divisor terms, namely the class of normalized completely random measures (NormCRMs). To improve the effectiveness of the clustering, we propose modeling the dependence among the divisors through a nonparametric hierarchical structure, which allows for the sharing of parameters across the samples in the data set. This desirable feature enables us to cluster together different components of multivariate data in a parsimonious way. We demonstrate through simulations that this approach provides accurate graphical model inference, and apply it to a case study examining the dependence structure in radiomics data derived from The Cancer Imaging Atlas.
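The divisor construction described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' NormCRM model or code. It assumes the standard scale-mixture form of the Dirichlet t-distribution, in which each component of a Gaussian vector is divided by the square root of a Gamma(nu/2, rate nu/2) divisor, and the divisors within one observation are clustered by a Chinese restaurant process with concentration alpha. The function name `sample_dirichlet_t` and all parameter names are hypothetical.

```python
import numpy as np

def sample_dirichlet_t(n, mu, sigma, nu=3.0, alpha=1.0, rng=None):
    """Draw n vectors whose components share gamma divisors in clusters.

    Assumed construction (not taken from the paper's code):
    x_ij = mu_j + y_ij / sqrt(tau_ij), with y_i ~ N(0, sigma) and the
    tau_ij clustered within each observation by a Chinese restaurant
    process with base measure Gamma(shape=nu/2, rate=nu/2).
    """
    rng = np.random.default_rng(rng)
    mu = np.asarray(mu, dtype=float)
    p = mu.shape[0]
    y = rng.multivariate_normal(np.zeros(p), sigma, size=n)
    tau = np.empty((n, p))
    labels = np.empty((n, p), dtype=int)
    for i in range(n):
        values, counts = [], []  # divisor value and size of each cluster
        for j in range(p):
            # Join an existing cluster with probability proportional to
            # its size, or open a new one with weight alpha.
            probs = np.array(counts + [alpha], dtype=float)
            probs /= probs.sum()
            k = int(rng.choice(len(probs), p=probs))
            if k == len(values):
                # NumPy uses a scale parameter, so rate nu/2 is scale 2/nu.
                values.append(rng.gamma(shape=nu / 2.0, scale=2.0 / nu))
                counts.append(1)
            else:
                counts[k] += 1
            tau[i, j] = values[k]
            labels[i, j] = k
    x = mu + y / np.sqrt(tau)
    return x, tau, labels

# Example: 5 observations of a 4-dimensional vector with identity scale matrix.
x, tau, labels = sample_dirichlet_t(5, np.zeros(4), np.eye(4), rng=0)
```

As alpha grows, each component tends to receive its own divisor. As alpha approaches zero, all components of an observation share a single divisor, which recovers the classical multivariate t-distribution.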