Nuclear Dynamics Programme, Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT, UK.
Cambridge Institute of Therapeutic Immunology & Infectious Disease (CITIID), University of Cambridge, Cambridge Biomedical Campus, Cambridge CB2 0AW, UK.
Nucleic Acids Res. 2020 Apr 6;48(6):2866-2879. doi: 10.1093/nar/gkaa123.
Identifying DNA cis-regulatory modules (CRMs) that control the expression of specific genes is crucial for deciphering the logic of transcriptional control. Natural genetic variation can point to the possible gene regulatory function of specific sequences through their allelic associations with gene expression. However, comprehensive identification of causal regulatory sequences in brute-force association testing without incorporating prior knowledge is challenging due to limited statistical power and effects of linkage disequilibrium. Sequence variants affecting transcription factor (TF) binding at CRMs have a strong potential to influence gene regulatory function, which provides a motivation for prioritizing such variants in association testing. Here, we generate an atlas of CRMs showing predicted allelic variation in TF binding affinity in human lymphoblastoid cell lines and test their association with the expression of their putative target genes inferred from Promoter Capture Hi-C and immediate linear proximity. We reveal >1300 CRM TF-binding variants associated with target gene expression, the majority of them undetected with standard association testing. A large proportion of CRMs showing associations with the expression of genes they contact in 3D localize to the promoter regions of other genes, supporting the notion of 'epromoters': dual-action CRMs with promoter and distal enhancer activity.
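As an illustration of what "predicted allelic variation in TF binding affinity" can mean in practice, the sketch below scores the reference and alternative alleles of a short sequence against a position weight matrix and reports the difference. This is a generic log-odds scan, not the method used in the paper, which the abstract does not describe. The motif matrix, the uniform background frequency, and the function names (`log_odds_score`, `best_score`, `allelic_delta`) are all hypothetical and chosen only for the example.

```python
import math

# Hypothetical position frequency matrix for a 4-bp motif (one dict per position).
# The values are illustrative only and do not describe any real transcription factor.
PFM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform base composition


def log_odds_score(site):
    """Sum of per-position log2(motif frequency / background) for one site."""
    if len(site) != len(PFM):
        raise ValueError("site length must equal motif width")
    return sum(math.log2(PFM[i][base] / BACKGROUND) for i, base in enumerate(site))


def best_score(seq):
    """Best log-odds score over all forward-strand windows of the sequence."""
    width = len(PFM)
    if len(seq) < width:
        raise ValueError("sequence shorter than motif")
    return max(log_odds_score(seq[i:i + width]) for i in range(len(seq) - width + 1))


def allelic_delta(ref_seq, alt_seq):
    """Predicted change in binding score (alt minus ref) for two allelic sequences."""
    return best_score(alt_seq) - best_score(ref_seq)


# Example: a G>T substitution at the second motif position weakens the best match.
delta = allelic_delta("TTAGCTTT", "TTATCTTT")
print(f"predicted score change: {delta:.3f}")
```

A negative value indicates that the alternative allele is predicted to bind less strongly than the reference. A real analysis would also scan the reverse strand and use curated motif models, and variants with a large predicted change would then be the ones prioritized for association testing against target gene expression.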