Zhang Han, Wheeler William, Hyland Paula L, Yang Yifan, Shi Jianxin, Chatterjee Nilanjan, Yu Kai
Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America.
Information Management Services Inc., Calverton, Maryland, United States of America.
PLoS Genet. 2016 Jun 30;12(6):e1006122. doi: 10.1371/journal.pgen.1006122. eCollection 2016 Jun.
Meta-analysis of multiple genome-wide association studies (GWAS) has become an effective approach for detecting single nucleotide polymorphism (SNP) associations with complex traits. However, it is difficult to integrate the readily accessible SNP-level summary statistics from a meta-analysis into more powerful multi-marker testing procedures, which generally require individual-level genetic data. We developed a general procedure called Summary based Adaptive Rank Truncated Product (sARTP) for conducting gene and pathway meta-analysis that uses only SNP-level summary statistics in combination with genotype correlation estimated from a panel of individual-level genetic data. We demonstrated the validity and power advantage of sARTP using empirical and simulated data. We conducted a comprehensive pathway-based meta-analysis of type 2 diabetes (T2D) with sARTP by integrating SNP-level summary statistics from two large studies comprising 19,809 T2D cases and 111,181 controls of European ancestry. Among 4,713 candidate pathways, from which genes in the neighborhoods of 170 GWAS-established T2D loci were excluded, we detected 43 globally significant T2D pathways (Bonferroni-corrected p-values < 0.05), including the insulin signaling pathway and the T2D pathway defined by KEGG, as well as pathways defined by specific gene expression patterns in pancreatic adenocarcinoma, hepatocellular carcinoma, and bladder carcinoma. Using summary data from 8 East Asian T2D GWAS with 6,952 cases and 11,865 controls, we showed that 7 of the 43 pathways identified in European populations remained significant in East Asians at a false discovery rate of 0.1. We created an R package and a web-based tool for sARTP that can analyze pathways with thousands of genes and tens of thousands of SNPs.
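The core idea of combining SNP-level summary statistics with a genotype correlation matrix can be illustrated with a minimal single-gene sketch in Python. This is not the authors' R package or its API. The function name `artp_from_summary`, its parameters, and the choice to simulate null z-scores from a multivariate normal distribution whose covariance is the reference-panel correlation matrix are all assumptions made for illustration. The sketch computes rank truncated product statistics at several truncation points, converts each to an empirical p-value, and then adjusts the minimum p-value using the same set of simulated null draws.

```python
import math
import numpy as np

# Vectorized complementary error function, used for two-sided normal p-values.
_erfc = np.vectorize(math.erfc)


def artp_from_summary(z_obs, ld, trunc_points=(1, 2, 3), n_null=2000, seed=0):
    """Hypothetical single-gene adaptive rank truncated product test.

    z_obs        : observed SNP z-scores from a meta-analysis (length m)
    ld           : m x m SNP correlation matrix from a reference panel
    trunc_points : candidate numbers of top SNPs to combine
    n_null       : number of simulated null z-score vectors (assumption)
    """
    rng = np.random.default_rng(seed)
    z_obs = np.asarray(z_obs, dtype=float)
    ld = np.asarray(ld, dtype=float)
    m = z_obs.size
    ks = [min(int(k), m) for k in trunc_points]

    # Assumed null model: z ~ N(0, ld). Row 0 of z_all holds the observed data.
    z_null = rng.multivariate_normal(np.zeros(m), ld, size=n_null)
    z_all = np.vstack([z_obs, z_null])

    # Two-sided p-values, floored to avoid log(0) for extreme z-scores.
    p = np.maximum(_erfc(np.abs(z_all) / math.sqrt(2.0)), 1e-300)

    # Sort -log(p) in decreasing order so the first k entries are the top k SNPs.
    neg_log_p = np.sort(-np.log(p), axis=1)[:, ::-1]
    cum = np.cumsum(neg_log_p, axis=1)
    stats = cum[:, [k - 1 for k in ks]]  # one column per truncation point

    # Empirical p-value of every row (observed and null) at each truncation
    # point: the fraction of the pooled rows whose statistic is at least as large.
    n_total = n_null + 1
    p_k = np.empty_like(stats)
    for j in range(stats.shape[1]):
        sorted_pool = np.sort(stats[:, j])
        n_ge = n_total - np.searchsorted(sorted_pool, stats[:, j], side="left")
        p_k[:, j] = n_ge / n_total

    # Adaptive step: take the best truncation point per row, then compare the
    # observed minimum p-value against the distribution of minima in the pool.
    min_p = p_k.min(axis=1)
    return float(np.mean(min_p <= min_p[0]))
```

A call such as `artp_from_summary(z, ld)` returns one gene-level p-value whose resolution is limited to 1 / (`n_null` + 1). A pathway-level test as described in the abstract would need a further combination step across genes, which this sketch does not attempt.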