Zhao Sihai Dave, Cai T Tony, Cappola Thomas P, Margulies Kenneth B, Li Hongzhe
Department of Statistics, University of Illinois at Urbana-Champaign.
Department of Statistics, The Wharton School, University of Pennsylvania.
J Am Stat Assoc. 2017;112(519):1032-1046. doi: 10.1080/01621459.2016.1270825. Epub 2017 Jan 5.
Genome-wide association studies (GWAS) and differential expression analyses have had limited success in finding genes that cause complex diseases such as heart failure (HF), a leading cause of death in the United States. This paper proposes a new statistical approach that integrates GWAS and expression quantitative trait loci (eQTL) data to identify important HF genes. For such genes, genetic variations that perturb their expression are also likely to influence disease risk. The proposed method thus tests for the presence of simultaneous signals: SNPs that are associated with the gene's expression as well as with disease. An analytic expression for the p-value is obtained, and the method is shown to be asymptotically adaptively optimal under certain conditions. It also allows the GWAS and eQTL data to be collected from different groups of subjects, enabling investigators to integrate public resources with their own data. Simulation experiments show that it can be more powerful than standard approaches and that it is robust to linkage disequilibrium between variants. The method is applied to an extensive analysis of HF genomics and identifies several genes with biological evidence for being functionally relevant in the etiology of HF. It is implemented in the R package ssa.
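To make the idea of a simultaneous signal concrete, the following is a minimal Python sketch, not the interface of the ssa package and not necessarily the exact statistic of the paper. It assumes each SNP in a gene has one GWAS z-score and one eQTL z-score, scores a gene by the largest value of min(|z_gwas|, |z_eqtl|) over its SNPs, and computes a p-value under the simplifying assumption that all z-scores are independent standard normal in both studies. The function names `max_min_statistic` and `complete_null_p_value` are hypothetical and chosen only for illustration.

```python
import math


def max_min_statistic(z_gwas, z_eqtl):
    """Largest per-SNP value of min(|z_gwas|, |z_eqtl|).

    The statistic is large only if at least one SNP has a large z-score
    in BOTH studies, which is what "simultaneous signal" means here.
    """
    if len(z_gwas) != len(z_eqtl) or len(z_gwas) == 0:
        raise ValueError("need two non-empty z-score sequences of equal length")
    return max(min(abs(t), abs(u)) for t, u in zip(z_gwas, z_eqtl))


def complete_null_p_value(stat, n_snps):
    """P(statistic >= stat) when every z-score is independent N(0, 1).

    Assumption (for illustration only): no signal in either study and no
    dependence between SNPs or between studies.
    """
    # Two-sided tail probability P(|Z| >= stat) for a standard normal Z.
    two_sided_tail = math.erfc(stat / math.sqrt(2.0))
    # Probability that one SNP exceeds the threshold in both studies.
    per_snp = two_sided_tail ** 2
    if per_snp >= 1.0:
        return 1.0
    # 1 - (1 - per_snp)^n_snps, computed in a numerically stable way.
    return -math.expm1(n_snps * math.log1p(-per_snp))


# Toy z-scores for three SNPs in one gene (made-up numbers).
z_gwas = [0.5, 3.0, -0.2]
z_eqtl = [2.5, -2.8, 0.1]
stat = max_min_statistic(z_gwas, z_eqtl)
p_value = complete_null_p_value(stat, len(z_gwas))
print(stat, p_value)
```

Only the second SNP is strong in both toy sequences, so it alone drives the statistic; a SNP that is strong in just one study contributes little. Because the two z-score sequences enter the statistic separately, they can come from different groups of subjects, as the abstract notes.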