High-Dimensional Gaussian Graphical Regression Models with Covariates

Author information

Zhang Jingfei, Li Yi

Affiliations

Department of Management Science, University of Miami, Coral Gables, FL 33146.

Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109.

Publication information

J Am Stat Assoc. 2023;118(543):2088-2100. doi: 10.1080/01621459.2022.2034632. Epub 2022 Mar 14.

DOI:10.1080/01621459.2022.2034632

PMID:38143787

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10746132/

Abstract

Though Gaussian graphical models have been widely used in many scientific fields, relatively limited progress has been made to link graph structures to external covariates. We propose a Gaussian graphical regression model, which regresses both the mean and the precision matrix of a Gaussian graphical model on covariates. In the context of co-expression quantitative trait locus (QTL) studies, our method can determine how genetic variants and clinical conditions modulate the subject-level network structures, and recover both the population-level and subject-level gene networks. Our framework encourages sparsity of covariate effects on both the mean and the precision matrix. In particular, for the precision matrix, we stipulate simultaneous sparsity, i.e., group sparsity and element-wise sparsity, on effective covariates and their effects on network edges, respectively. We establish variable selection consistency first under the case with known mean parameters and then a more challenging case with unknown means depending on external covariates, and establish in both cases the convergence rates and the selection consistency of the estimated precision parameters. The utility and efficacy of our proposed method are demonstrated through simulation studies and an application to a co-expression QTL study with brain cancer patients.
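As a rough illustration of the idea described in the abstract, the Python sketch below builds a subject-level precision matrix from covariates and evaluates a penalty that combines group sparsity (whole covariates switched off) with element-wise sparsity (individual edge effects switched off). The linear form of the precision matrix, the function names `subject_precision` and `sparse_group_penalty`, and the tuning parameters `lam` and `lam_g` are assumptions made for illustration only; they are not taken from the paper and should not be read as the authors' estimator.

```python
import numpy as np

def subject_precision(B0, B, u):
    """Hypothetical linear model: Omega(u) = B0 + sum_h u[h] * B[h].

    B0 : (p, p) population-level precision matrix
    B  : (q, p, p) covariate effects on each network edge
    u  : (q,) covariate vector for one subject
    """
    return B0 + np.tensordot(u, B, axes=1)

def sparse_group_penalty(B, lam, lam_g):
    """Element-wise L1 penalty plus one Frobenius-norm group penalty per covariate."""
    l1 = np.abs(B).sum()
    group = sum(np.linalg.norm(B[h]) for h in range(B.shape[0]))
    return lam * l1 + lam_g * group

p, q = 3, 2                      # 3 genes, 2 covariates (toy sizes)
B0 = np.eye(p)                   # baseline network with no edges
B = np.zeros((q, p, p))
B[0, 0, 1] = B[0, 1, 0] = 0.3    # covariate 0 modulates the edge between genes 0 and 1
# covariate 1 has no effect at all: its whole group of coefficients is zero

u = np.array([1.0, 5.0])         # one subject's covariates
omega = subject_precision(B0, B, u)
penalty = sparse_group_penalty(B, lam=0.1, lam_g=0.2)
```

In this toy setting the second covariate leaves the network unchanged however large its value is, which is the kind of structure a group penalty is meant to recover, while the L1 term keeps the effects of an active covariate confined to a few edges.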