Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Korea.
Center for Precision Medicine, Seoul National University Hospital, Seoul, 03080, Korea.
BMC Med Genomics. 2020 Feb 24;13(Suppl 3):26. doi: 10.1186/s12920-019-0650-0.
Genome-wide association studies (GWAS) have been widely used to identify phenotype-related genetic variants using statistical methods such as logistic and linear regression. However, the SNPs that GWAS identify at stringent significance thresholds explain only a small portion of the overall estimated genetic heritability. To address this 'missing heritability' issue, many GWAS have used gene-based and pathway-based analyses that draw on biological mechanisms. However, many of these methods neglect the correlations between genes and between pathways.
We constructed a hierarchical component model that considers correlations both between genes and between pathways. Based on this model, we propose a novel pathway analysis method for GWAS datasets, Hierarchical structural Component Model for Pathway analysis of Common vAriants (HisCoM-PCA). HisCoM-PCA first summarizes the common variants of each gene at the gene level, and then analyzes all pathways simultaneously, applying ridge-type penalization to both the gene and pathway effects on the phenotype. The statistical significance of the gene and pathway coefficients can be examined by permutation tests.
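The three steps described above (gene-level summary, ridge-penalized gene and pathway effects, permutation testing) can be illustrated with a minimal sketch. This is not the authors' estimation algorithm: it assumes a continuous phenotype, uses the first principal component as the gene-level summary, and fits the two levels one after the other with closed-form ridge regression rather than jointly. All function names, the penalty values `lam_gene` and `lam_path`, and the simulated data are hypothetical.

```python
import numpy as np


def first_pc(genotypes):
    """First principal component score of one gene's SNP matrix (samples x SNPs)."""
    centered = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]


def standardize(x):
    """Center and scale each column (or a single vector) to unit variance."""
    return (x - x.mean(axis=0)) / x.std(axis=0)


def ridge_fit(X, y, lam):
    """Closed-form ridge coefficients for a centered phenotype."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ (y - y.mean()))


def pathway_coefs(gene_scores, pathways, y, lam_gene, lam_path):
    """Two-level fit: genes -> pathway components -> phenotype (simplified)."""
    components = []
    for genes in pathways.values():
        G = standardize(np.column_stack([gene_scores[g] for g in genes]))
        w = ridge_fit(G, y, lam_gene)      # ridge-penalized gene weights
        components.append(G @ w)           # pathway component score
    F = standardize(np.column_stack(components))
    return ridge_fit(F, y, lam_path)       # ridge-penalized pathway effects


def permutation_pvalues(gene_scores, pathways, y,
                        lam_gene=1.0, lam_path=1.0, n_perm=200, seed=0):
    """Permutation p-value per pathway: refit both levels on shuffled phenotypes."""
    rng = np.random.default_rng(seed)
    observed = np.abs(pathway_coefs(gene_scores, pathways, y, lam_gene, lam_path))
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        permuted = pathway_coefs(gene_scores, pathways, rng.permutation(y),
                                 lam_gene, lam_path)
        exceed += np.abs(permuted) >= observed
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Simulated data (hypothetical): 4 genes with 5 SNPs each, 2 pathways.
    rng = np.random.default_rng(1)
    n = 300
    snps = {g: rng.integers(0, 3, size=(n, 5)).astype(float)
            for g in ["g1", "g2", "g3", "g4"]}
    scores = {g: first_pc(m) for g, m in snps.items()}
    pathways = {"A": ["g1", "g2"], "B": ["g3", "g4"]}
    y = 2.0 * standardize(scores["g1"]) + rng.normal(size=n)
    print(dict(zip(pathways, permutation_pvalues(scores, pathways, y))))
```

Refitting both levels inside every permutation matters here, because the gene weights are themselves estimated from the phenotype; permuting only the final pathway-level regression would understate the null variability. A binary phenotype would need a logistic-type model in place of the least-squares fits used in this sketch.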
Using the simulation data set of Genetic Analysis Workshop 17 (GAW17), for both binary and continuous phenotypes, we showed that HisCoM-PCA controlled type I error well and had higher empirical power than several other methods. In addition, we applied our method to a SNP chip dataset of KARE for four human physiologic traits: (1) type 2 diabetes; (2) hypertension; (3) systolic blood pressure; and (4) diastolic blood pressure. These results showed that HisCoM-PCA could successfully identify signal pathways with superior statistical and biological significance.
Our approach has the advantage of providing an intuitive biological interpretation for associations between common variants and phenotypes, via pathway information, potentially addressing the missing heritability conundrum.