Wu Chong, Pan Wei
Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, United States of America.
Genet Epidemiol. 2018 Apr;42(3):303-316. doi: 10.1002/gepi.22110. Epub 2018 Feb 7.
Many genetic variants affect complex traits through gene expression, which can be exploited to boost statistical power and enhance interpretation in genome-wide association studies (GWASs), as demonstrated by the transcriptome-wide association study (TWAS) approach. Furthermore, due to polygenic inheritance, a complex trait is often affected by multiple genes with similar functions, as annotated in gene pathways. Here, we extend TWAS from gene-based analysis to pathway-based analysis: we integrate public pathway collections, expression quantitative trait locus (eQTL) data and GWAS summary association statistics (or GWAS individual-level data) to identify gene pathways associated with complex traits. The basic idea is to weight the SNPs of the genes in a pathway by their estimated cis-effects on gene expression, then adaptively test for association of the pathway with a GWAS trait by effectively aggregating possibly weak association signals across the genes in the pathway. The P values can be calculated analytically and are therefore fast to compute. We applied our proposed test with the KEGG and GO pathways to two schizophrenia (SCZ) GWAS summary association data sets, denoted SCZ1 and SCZ2, with about 20,000 and 150,000 subjects, respectively. Most of the significant pathways identified in the SCZ1 data were replicated in the SCZ2 data. Importantly, we identified 15 novel pathways associated with SCZ, such as GABA receptor complex (GO:1902710), which could not be uncovered by the standard single SNP-based analysis or gene-based TWAS. The newly identified pathways may help us gain insight into the biological mechanisms underlying SCZ. Our results showcase the power of incorporating gene expression information and gene functional annotations into pathway-based association testing for GWAS.
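To make the gene-to-pathway aggregation concrete, here is a minimal sketch under stated assumptions. It assumes the widely used summary-statistic TWAS gene z-score, Z_g = w_g^T z / sqrt(w_g^T R w_g), where w_g holds a gene's estimated cis-eQTL weights, z the GWAS SNP z-scores and R the SNP LD (correlation) matrix. The abstract describes an adaptive pathway test; this sketch does not reproduce it and instead uses a single fixed sum of gene z-scores, standardized by its null variance, so that the analytic P value is simply a normal tail probability. The function names (`twas_gene_z`, `pathway_sum_test`) and the toy data are hypothetical and for illustration only.

```python
import math
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def quad_form(a: Sequence[float], m: Matrix, b: Sequence[float]) -> float:
    """Return a^T M b for vectors a, b and a square matrix M."""
    return sum(a[i] * m[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))


def twas_gene_z(weights: Sequence[float], snp_z: Sequence[float], ld: Matrix) -> float:
    """Gene-level TWAS z-score from GWAS summary statistics (assumed form).

    weights: estimated cis-eQTL weights w_g for the SNPs (hypothetical input)
    snp_z:   GWAS z-scores for the same SNPs
    ld:      SNP-SNP correlation (LD) matrix R for those SNPs
    Returns w^T z / sqrt(w^T R w); w must not be all zeros.
    """
    score = sum(w * z for w, z in zip(weights, snp_z))
    return score / math.sqrt(quad_form(weights, ld, weights))


def pathway_sum_test(
    gene_weights: List[Sequence[float]], snp_z: Sequence[float], ld: Matrix
) -> Tuple[float, float]:
    """Fixed (non-adaptive) sum test over a pathway's gene z-scores.

    Every weight vector is indexed on the same SNP list, with zeros outside
    the gene's cis region, so genes that share SNPs or sit in LD with each
    other get correlated z-scores under the null.
    Returns (standardized statistic, two-sided normal P value).
    """
    sds = [math.sqrt(quad_form(w, ld, w)) for w in gene_weights]
    gene_z = [twas_gene_z(w, snp_z, ld) for w in gene_weights]
    # Null variance of sum_g Z_g = sum over all gene pairs of Corr(Z_g, Z_h),
    # where Corr(Z_g, Z_h) = w_g^T R w_h / (sd_g * sd_h) when z ~ N(0, R).
    var_sum = sum(
        quad_form(wg, ld, wh) / (sg * sh)
        for wg, sg in zip(gene_weights, sds)
        for wh, sh in zip(gene_weights, sds)
    )
    t = sum(gene_z) / math.sqrt(var_sum)
    # Two-sided tail probability: P(|N(0,1)| > |t|) = erfc(|t| / sqrt(2))
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p


if __name__ == "__main__":
    # Toy data: 3 SNPs in linkage equilibrium (identity LD) and two genes.
    ld = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    snp_z = [2.0, 1.0, 0.0]
    genes = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
    t, p = pathway_sum_test(genes, snp_z, ld)
    print(f"T = {t:.3f}, P = {p:.3g}")
```

The gene-gene covariance term is the key design choice: if two genes had identical weights, their z-scores would be perfectly correlated, and the standardized sum would equal the single-gene z-score instead of being inflated. An adaptive test, as described in the abstract, would combine several such statistics rather than relying on this one fixed sum.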