Department of Statistics, Rice University, Houston, TX, USA.
Department of Biostatistics, UT MD Anderson Cancer Center, Houston, TX, USA.
Biostatistics. 2020 Jul 1;21(3):561-576. doi: 10.1093/biostatistics/kxy078.
In this article, we develop a graphical modeling framework for the inference of networks across multiple sample groups and data types. In medical studies, this setting arises whenever a set of subjects, which may be heterogeneous due to differing disease stage or subtype, is profiled across multiple platforms, such as metabolomics, proteomics, or transcriptomics data. Our proposed Bayesian hierarchical model first links the network structures within each platform using a Markov random field prior to relate edge selection across sample groups, and then links the network similarity parameters across platforms. This enables joint estimation in a flexible manner, as we make no assumptions on the directionality of influence across the data types or the extent of network similarity across the sample groups and platforms. In addition, our model formulation allows the number of variables and number of subjects to differ across the data types, and only requires that we have data for the same set of groups. We illustrate the proposed approach through both simulation studies and an application to gene expression levels and metabolite abundances on subjects with varying severity levels of chronic obstructive pulmonary disease.
Keywords: Bayesian inference; Chronic obstructive pulmonary disease (COPD); Data integration; Gaussian graphical model; Markov random field prior; Spike and slab prior.
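To make the idea of a Markov random field prior relating edge selection across sample groups more concrete, the following is a minimal sketch in Python. The abstract does not state the prior's parametrization, so the form used here is an assumption: for a single candidate edge, the binary inclusion indicators g = (g_1, ..., g_K) across K sample groups have probability proportional to exp(nu * sum_k g_k + sum_{a<b} theta_ab * g_a * g_b). The function name `mrf_edge_prior` and the parameters `nu` (edge sparsity) and `theta` (pairwise group similarity) are hypothetical names chosen for illustration, not the authors' notation or code.

```python
import itertools
import math


def mrf_edge_prior(nu, theta):
    """Prior over the inclusion pattern of one edge across K sample groups.

    Assumed form (not taken from the abstract):
        p(g) proportional to exp(nu * sum_k g_k + sum_{a<b} theta[a][b] * g_a * g_b)
    nu    : scalar; more negative values favor sparser graphs.
    theta : K x K symmetric list of lists; theta[a][b] > 0 encourages groups
            a and b to share the edge, theta[a][b] = 0 makes them independent.
    Returns a dict mapping each binary tuple g to its normalized probability.
    """
    k = len(theta)
    weights = {}
    # Enumerate all 2^K inclusion patterns; feasible only for a small number of groups.
    for g in itertools.product((0, 1), repeat=k):
        pairwise = sum(
            theta[a][b] * g[a] * g[b]
            for a in range(k)
            for b in range(a + 1, k)
        )
        weights[g] = math.exp(nu * sum(g) + pairwise)
    normalizer = sum(weights.values())
    return {g: w / normalizer for g, w in weights.items()}


# Two groups with no similarity: the four patterns are equally likely when nu = 0.
independent = mrf_edge_prior(0.0, [[0.0, 0.0], [0.0, 0.0]])

# Two related groups: the pattern (1, 1), an edge shared by both groups, gains mass.
related = mrf_edge_prior(0.0, [[0.0, 2.0], [2.0, 0.0]])
```

Under this assumed form, setting every entry of `theta` to zero reduces the prior to independent edge selection in each group, while larger positive entries pull the group-specific graphs toward a shared structure. In the hierarchical model described above, the similarity parameters would themselves be inferred and linked across platforms rather than fixed as they are in this sketch.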