Song Zeyuan, Gunn Sophia, Monti Stefano, Peloso Gina Marie, Liu Ching-Ti, Lunetta Kathryn, Sebastiani Paola
Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, MA, USA.
Tufts University School of Medicine, Boston, MA, USA.
Front Syst Biol. 2025;5. doi: 10.3389/fsysb.2025.1589079. Epub 2025 Jul 3.
Gaussian Graphical Models (GGMs) are network models that use partial correlations, rather than marginal correlations, to represent complex relationships among multiple variables. A partial correlation describes the relation between two variables after "adjusting" for the effects of the other variables, which leads to more parsimonious and interpretable models. There are well-established procedures for building GGMs from a sample of independent and identically distributed observations. However, many studies include clustered and longitudinal data that yield correlated observations, and ignoring this correlation among observations can inflate the Type I error. In this paper, we propose a cluster-based bootstrap algorithm to infer GGMs from correlated data. We use extensive simulations of correlated data from family-based studies to show that, when there is a sufficient number of clusters, the proposed bootstrap method does not inflate the Type I error while retaining statistical power compared to alternative solutions. We apply our method to learn the GGM that represents complex relations among 47 Polygenic Risk Scores generated from genome-wide genotype data in the Long Life Family Study. By comparing it to conventional methods that ignore within-cluster correlation, we show that our method controls the Type I error well without loss of power.
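To make the idea of a cluster-based bootstrap concrete, the following is a minimal sketch rather than the authors' algorithm, whose details the abstract does not give. It assumes that whole clusters (for example, families) are resampled with replacement, that partial correlations are obtained from the inverse of the sample covariance matrix, and that an edge is kept when its percentile bootstrap interval excludes zero. The function names `partial_correlations` and `cluster_bootstrap_ggm`, the percentile rule, and the simulated data are all illustrative assumptions.

```python
import numpy as np


def partial_correlations(data):
    """Partial correlation matrix from the inverse sample covariance.

    Assumes more observations than variables so the covariance is invertible.
    """
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    precision = (precision + precision.T) / 2.0  # remove round-off asymmetry
    scale = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def cluster_bootstrap_ggm(data, cluster_ids, n_boot=500, alpha=0.05, seed=0):
    """Illustrative cluster bootstrap for GGM edge selection (an assumption,
    not the published procedure).

    Whole clusters are drawn with replacement so that within-cluster
    correlation is preserved in every bootstrap replicate.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    labels = np.unique(cluster_ids)
    rows_by_cluster = [np.flatnonzero(cluster_ids == c) for c in labels]

    n_vars = data.shape[1]
    boot = np.empty((n_boot, n_vars, n_vars))
    for b in range(n_boot):
        # Resample cluster indices, then keep all rows of each drawn cluster.
        picked = rng.integers(0, len(labels), size=len(labels))
        rows = np.concatenate([rows_by_cluster[i] for i in picked])
        boot[b] = partial_correlations(data[rows])

    # Percentile interval for each partial correlation.
    lower = np.quantile(boot, alpha / 2.0, axis=0)
    upper = np.quantile(boot, 1.0 - alpha / 2.0, axis=0)

    # Keep an edge when the interval excludes zero; no self-loops.
    edges = (lower > 0) | (upper < 0)
    np.fill_diagonal(edges, False)
    return edges, lower, upper


# Simulated example (hypothetical data): 40 families of 3 members each.
demo_rng = np.random.default_rng(1)
family = np.repeat(np.arange(40), 3)
shared = np.repeat(demo_rng.normal(size=40), 3)      # family-level effect
x0 = shared + demo_rng.normal(size=family.size)
x1 = x0 + 0.3 * demo_rng.normal(size=family.size)    # depends directly on x0
x2 = demo_rng.normal(size=family.size)               # unrelated variable
demo_data = np.column_stack([x0, x1, x2])

demo_edges, demo_lower, demo_upper = cluster_bootstrap_ggm(
    demo_data, family, n_boot=200
)
print(demo_edges.astype(int))
```

Resampling clusters rather than individual rows is the key design choice: drawing rows independently would treat relatives as unrelated observations, which is the situation the abstract identifies as a source of inflated Type I error.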