Zhao Jing, Akinsanmi Idowu, Arafat Dalia, Cradick T J, Lee Ciaran M, Banskota Samridhi, Marigorta Urko M, Bao Gang, Gibson Greg
School of Biology and Center for Integrative Genomics, Georgia Institute of Technology, Atlanta, GA 30332, USA.
Wallace H. Coulter Department of Biomedical Engineering, Laboratory of Biomolecular Engineering and Nanomedicine, Georgia Institute of Technology, Atlanta, GA 30332, USA.
Am J Hum Genet. 2016 Feb 4;98(2):299-309. doi: 10.1016/j.ajhg.2015.12.023.
To evaluate whether rare regulatory variants in the vicinity of promoters are likely to impact gene expression, we conducted a novel burden test for enrichment of rare variants at the extremes of expression. After sequencing 2-kb promoter regions of 472 genes in 410 healthy adults, we performed a quadratic regression of rare variant count on bins of peripheral blood transcript abundance from microarrays, summing over ranks of all genes. After adjusting for common eQTLs and the major axes of gene expression covariance, we observed a highly significant excess of variants with minor allele frequency (MAF) less than 0.05 at both the high and low extremes of expression across individuals. Further enrichment was seen in sites annotated as potentially regulatory by RegulomeDB, but a deficit of effects was associated with known metabolic disease genes. The main result replicates in an independent sample of 75 individuals with RNA-seq and whole-genome sequence information. Three of four predicted large-effect sites were validated by CRISPR/Cas9 knockdown in K562 cells, but simulations indicate that effect sizes need not be unusually large to produce the observed burden. Unusually divergent low-frequency promoter haplotypes were observed at 31 loci, at least 9 of which appear to be derived from Neandertal admixture, but these were not associated with divergent gene expression in blood. The overall burden test results are consistent with rare and private regulatory variants driving high or low transcription at specific loci, potentially contributing to disease.
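The rank-bin burden test described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes expression has already been residualized for common eQTLs and principal components, ranks individuals separately for each gene into ten equal bins, sums rare-variant counts per bin over all genes, and takes the leading coefficient of a quadratic fit (`numpy.polyfit`) as the U-shape statistic. The within-gene permutation null, the toy simulation, the function names, the bin count, the allele frequencies and the effect size are all hypothetical choices made for illustration.

```python
import numpy as np


def rank_bins(expr, n_bins=10):
    """Assign each individual to an expression-rank bin, separately per gene.

    expr: (n_genes, n_individuals) values, ASSUMED to be residuals after
    removing common-eQTL and principal-component effects.
    """
    n_ind = expr.shape[1]
    ranks = np.argsort(np.argsort(expr, axis=1), axis=1)  # 0 = lowest expression
    return ranks * n_bins // n_ind                         # integer bin 0..n_bins-1


def binned_burden(bins, rare_counts, n_bins=10):
    """Total rare-variant count in each rank bin, summed over all genes."""
    return np.bincount(bins.ravel(), weights=rare_counts.ravel(), minlength=n_bins)


def quadratic_coef(totals):
    """Leading coefficient of a quadratic fit of bin totals on centred bin index.

    A positive value indicates a U-shape: excess rare variants at both extremes.
    """
    x = np.arange(len(totals)) - (len(totals) - 1) / 2
    return np.polyfit(x, totals, 2)[0]


def burden_test(expr, rare_counts, n_bins=10, n_perm=200, seed=0):
    """Quadratic burden statistic with a (hypothetical) within-gene permutation p-value."""
    rng = np.random.default_rng(seed)
    bins = rank_bins(expr, n_bins)
    observed = quadratic_coef(binned_burden(bins, rare_counts, n_bins))
    # Shuffle rare-variant counts across individuals independently within each gene.
    null = np.array([
        quadratic_coef(binned_burden(bins, rng.permuted(rare_counts, axis=1), n_bins))
        for _ in range(n_perm)
    ])
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value


def simulate(n_genes=472, n_ind=410, effect=0.5, seed=1):
    """Toy data: each rare allele shifts expression by `effect` SDs, up or down per gene."""
    rng = np.random.default_rng(seed)
    rare = rng.binomial(4, 0.02, size=(n_genes, n_ind))  # hypothetical rare-allele counts
    sign = rng.choice([-1.0, 1.0], size=(n_genes, 1))     # direction of effect per gene
    expr = rng.normal(size=(n_genes, n_ind)) + effect * sign * rare
    return expr, rare


if __name__ == "__main__":
    expr, rare = simulate(effect=0.5)
    coef, p = burden_test(expr, rare)
    print(f"quadratic coefficient = {coef:.1f}, permutation p = {p:.3f}")
```

Because the direction of effect varies between genes, carriers are pushed toward the top bin for some genes and the bottom bin for others; summing over genes turns this into an excess at both ends, which a linear term would miss but a positive quadratic term captures. In this toy setting a shift of half a standard deviation per allele is already enough to produce a clear U-shape, in the spirit of the abstract's point that effect sizes need not be unusually large.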