Department of Marketing Information Consulting, Mokwon University, Daejeon, Korea.
Department of Preventive Medicine, School of Medicine, Kyungpook National University, Daegu, Korea.
PLoS One. 2019 Sep 12;14(9):e0217189. doi: 10.1371/journal.pone.0217189. eCollection 2019.
Genome-wide association studies (GWAS) have been successful in identifying genetic variants associated with complex diseases. However, association analyses between genotypes and phenotypes are not straightforward because of the complex relationships between genetic and environmental factors, and multiple correlated phenotypes complicate such analyses further. To address this complexity, we present an analysis based on structural equation modeling (SEM). Unlike current methods, which focus only on identifying direct associations between diseases and genetic variants such as single-nucleotide polymorphisms (SNPs), our method brings the effects of intermediate phenotypes (related phenotypes distinct from the target) into the systematic genetic study of diseases. We also consider multiple diseases simultaneously in a single model. The procedure can be summarized in four steps: 1) selection of informative SNPs, 2) extraction of latent variables from the selected SNPs, 3) investigation of the relationships among intermediate phenotypes and diseases, and 4) construction of an SEM. The result is a quantitative map that simultaneously shows the relationships among multiple SNPs, phenotypes, and diseases. In this study, we considered two correlated diseases, hypertension and type 2 diabetes (T2D), which are known to overlap substantially in their disease mechanisms and have significant public health implications. As intermediate phenotypes for these diseases, we considered three obesity-related phenotypes (subscapular skinfold thickness, body mass index, and waist circumference) as traits representing subcutaneous adiposity, overall adiposity, and abdominal adiposity, respectively. We applied the proposed SEM process to GWAS data collected from the Korea Association Resource (KARE) project. Among 327,872 SNPs, 24 informative SNPs were selected in the first step (p < 1.0 × 10⁻⁵). Ten latent variables were generated in step 2. After an exploratory analysis, we established a path diagram among phenotypes and diseases in step 3. Finally, in step 4, we produced a quantitative map with paths running from specific SNPs to hypertension through intermediate phenotypes and T2D. The resulting model had high goodness-of-fit measures (χ² = 536.52, NFI = 0.997, CFI = 0.998, GFI = 0.995, AGFI = 0.993, RMSEA = 0.012).
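As a rough illustration of steps 1 and 2 of the procedure, the following Python sketch screens SNPs against a p-value threshold and then condenses the selected SNPs into a single latent score. It is not the authors' implementation. The abstract does not say which association test or which latent-variable extraction method was used, so the correlation test with a large-sample normal approximation and the first principal component used here are stand-in assumptions, as are all function names and the synthetic data (monomorphic SNPs are assumed to have been filtered out beforehand).

```python
import math
import numpy as np


def snp_pvalues(genotypes, phenotype):
    """Two-sided p-value per SNP from a Pearson correlation test.

    Assumption: a large-sample normal approximation to the t statistic,
    used here only as a stand-in for whatever test the study applied.
    """
    n = genotypes.shape[0]
    g = genotypes - genotypes.mean(axis=0)
    y = phenotype - phenotype.mean()
    r = (g * y[:, None]).sum(axis=0) / np.sqrt((g ** 2).sum(axis=0) * (y ** 2).sum())
    t = r * np.sqrt((n - 2) / (1.0 - r ** 2))
    return np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in t])


def select_snps(pvals, threshold=1.0e-5):
    """Step 1: indices of SNPs whose p-value falls below the threshold."""
    return np.flatnonzero(pvals < threshold)


def latent_score(genotypes):
    """Step 2 (assumed method): first principal component of standardized SNPs."""
    z = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, 0] * s[0]


# Synthetic data for illustration only: 2000 subjects, 50 SNPs coded 0/1/2,
# with SNPs 3 and 17 given a real effect on a continuous phenotype.
rng = np.random.default_rng(0)
n_subjects, n_snps = 2000, 50
genotypes = rng.binomial(2, 0.3, size=(n_subjects, n_snps)).astype(float)
phenotype = 0.5 * genotypes[:, 3] + 0.5 * genotypes[:, 17] + rng.normal(size=n_subjects)

selected = select_snps(snp_pvalues(genotypes, phenotype))
score = latent_score(genotypes[:, selected])
print(selected, score.shape)
```

In a full analysis, scores like this one would enter the path model of steps 3 and 4 as predictors of the intermediate phenotypes and diseases. Fitting that model requires a dedicated SEM package and is outside the scope of this sketch.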