Comprehensive Multiple eQTL Detection and Its Application to GWAS Interpretation.
Author affiliations
School of Biological Sciences and Center for Integrative Genomics, Georgia Institute of Technology, Atlanta, Georgia 30332.
Institute for Molecular Biosciences, University of Queensland, Brisbane, QLD 4072, Australia.
Publication information
Genetics. 2019 Jul;212(3):905-918. doi: 10.1534/genetics.119.302091. Epub 2019 May 22.
Expression QTL (eQTL) detection has emerged as an important tool for unraveling the relationship between genetic risk factors and disease or clinical phenotypes. Most studies are predicated on the assumption that only a single causal variant explains the association signal in each interval. This greatly simplifies the statistical modeling, but is liable to bias in scenarios where multiple local causal variants are responsible. Here, our primary goal was to address the prevalence of secondary cis-eQTL signals regulating peripheral blood gene expression locally, utilizing two large human cohort studies, each with >2500 samples and accompanying whole-genome genotypes. The CAGE (Consortium for the Architecture of Gene Expression) dataset is a compendium of Illumina microarray studies, and the Framingham Heart Study is a two-generation Affymetrix dataset. We also describe Bayesian colocalization analysis of the extent of sharing of cis-eQTL detected in both studies as well as with the BIOS RNAseq dataset. Stepwise conditional modeling demonstrates that multiple eQTL signals are present for ∼40% of over 3500 eGenes in both microarray datasets, and that the number of loci with additional signals reduces by approximately two-thirds with each conditioning step. Although <20% of the peak signals across platforms fine map to the same credible interval, the colocalization analysis finds that as many as 50-60% of the primary eQTL are actually shared. Subsequently, colocalization of eQTL signals with GWAS hits detected 1349 genes whose expression in peripheral blood is associated with 591 human phenotype traits or diseases, including enrichment for genes with regulatory functions. At least 10%, and possibly as many as 40%, of eQTL-trait colocalized signals are due to nonprimary cis-eQTL peaks, but just one-quarter of these colocalization signals replicated across the gene expression datasets. Our results are provided as a web-based resource for visualization of multi-site regulation of gene expression and its association with human complex traits and disease states.
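The stepwise conditional modeling mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `stepwise_conditional`, the significance threshold, the cap on the number of signals, and the use of a normal approximation for p-values are all assumptions made for illustration. At each step the expression trait and every candidate SNP are residualized on the SNPs already selected, the strongest remaining association is tested, and the search stops once no SNP passes the threshold.

```python
import math
import numpy as np

def stepwise_conditional(genotypes, expression, p_threshold=1e-5, max_signals=5):
    """Forward stepwise conditional scan for one gene (illustrative only).

    genotypes  : (n_samples, n_snps) array of allele dosages
    expression : (n_samples,) array of expression values for the gene
    Returns a list of (snp_index, approximate_p_value) in order of selection.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    expression = np.asarray(expression, dtype=float)
    n, m = genotypes.shape
    selected, results = [], []

    while len(selected) < max_signals:
        # Covariates: intercept plus every SNP selected in earlier steps.
        covars = np.column_stack([np.ones(n)] + [genotypes[:, j] for j in selected])

        # Residualize the trait and all SNPs on the covariates.
        coef_y, *_ = np.linalg.lstsq(covars, expression, rcond=None)
        y_res = expression - covars @ coef_y
        coef_g, *_ = np.linalg.lstsq(covars, genotypes, rcond=None)
        g_res = genotypes - covars @ coef_g

        # Per-SNP regression of residual trait on residual genotype.
        ss_g = np.einsum("ij,ij->j", g_res, g_res)
        usable = ss_g > 1e-8 * n  # skips selected or collinear SNPs
        df = n - covars.shape[1] - 1
        slope = np.zeros(m)
        slope[usable] = (g_res[:, usable].T @ y_res) / ss_g[usable]
        rss = y_res @ y_res - slope**2 * ss_g
        z = np.zeros(m)
        z[usable] = slope[usable] / np.sqrt(rss[usable] / df / ss_g[usable])

        best = int(np.argmax(np.abs(z)))
        # Assumption: two-sided normal approximation, reasonable for large n.
        p_value = math.erfc(abs(z[best]) / math.sqrt(2.0))
        if p_value >= p_threshold:
            break
        selected.append(best)
        results.append((best, p_value))

    return results

# Usage on simulated data (hypothetical effect sizes, independent SNPs).
rng = np.random.default_rng(1)
dosages = rng.binomial(2, 0.3, size=(2000, 20)).astype(float)
trait = 0.5 * dosages[:, 3] + 0.3 * dosages[:, 11] + rng.normal(size=2000)
for snp, p in stepwise_conditional(dosages, trait):
    print(f"SNP {snp}: conditional p ~ {p:.2e}")
```

In real data, SNPs in the same interval are in linkage disequilibrium, so a conditional scan of this kind is what separates a genuinely independent secondary signal from one that merely tags the primary peak.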