Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.
Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.
Nucleic Acids Res. 2018 Sep 28;46(17):8740-8753. doi: 10.1093/nar/gky686.
The majority of variants identified by genome-wide association studies (GWAS) reside in the noncoding genome, where they affect regulatory elements such as transcriptional enhancers. However, characterizing their effects requires integrating GWAS results with context-specific regulatory activity and linkage disequilibrium annotations, in order to identify the causal variants underlying noncoding association signals and the regulatory elements, tissue contexts, and target genes they affect. We propose INFERNO, a novel method that integrates GWAS summary statistics with hundreds of functional genomics datasets spanning enhancer activity, transcription factor binding sites, and expression quantitative trait loci. INFERNO includes novel statistical methods to quantify empirical enrichments of tissue-specific enhancer overlap and to identify co-regulatory networks of dysregulated long noncoding RNAs (lncRNAs). We applied INFERNO to two large GWAS. For schizophrenia (36,989 cases, 113,075 controls), INFERNO identified putatively causal variants affecting brain enhancers for known schizophrenia-related genes. For inflammatory bowel disease (IBD) (12,882 cases, 21,770 controls), INFERNO found enrichments of immune and digestive enhancers and lncRNAs involved in regulation of the adaptive immune response. In summary, INFERNO comprehensively infers the molecular mechanisms of causal noncoding variants, providing a sensitive hypothesis generation method for post-GWAS analysis. The software is available as an open-source pipeline and a web server.
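The idea of an empirical enrichment of enhancer overlap can be illustrated with a minimal sampling sketch. The abstract does not describe INFERNO's sampling scheme, so everything here is an assumption made for illustration: the function name `empirical_enrichment_pvalue`, its parameters, the use of a single unmatched background pool, and the toy numbers in the usage lines. It shows only the general technique of comparing an observed overlap count with counts from randomly drawn variant sets, not the published implementation.

```python
import random


def empirical_enrichment_pvalue(observed, background_flags, n_variants,
                                n_samples=10000, seed=0):
    """Estimate an empirical p-value for enhancer-overlap enrichment.

    observed         -- number of GWAS variants overlapping enhancers in a tissue
    background_flags -- list of 0/1 flags, one per background variant,
                        1 if that variant overlaps an enhancer in the tissue
                        (hypothetical input; no matching on variant properties)
    n_variants       -- size of the GWAS variant set being tested
    """
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(n_samples):
        # Draw a random variant set of the same size (with replacement).
        sample = rng.choices(background_flags, k=n_variants)
        if sum(sample) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_samples + 1)


# Toy usage: 5% of a hypothetical background pool overlaps enhancers,
# and 12 of 40 tested variants were observed to overlap.
background = [1] * 50 + [0] * 950
p = empirical_enrichment_pvalue(observed=12, background_flags=background,
                                n_variants=40)
print(p)
```

The add-one correction keeps the estimate bounded below by 1 / (n_samples + 1), so the number of samples sets the smallest p-value that can be reported.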