Dryden Nicola H, Broome Laura R, Dudbridge Frank, Johnson Nichola, Orr Nick, Schoenfelder Stefan, Nagano Takashi, Andrews Simon, Wingett Steven, Kozarewa Iwanka, Assiotis Ioannis, Fenwick Kerry, Maguire Sarah L, Campbell James, Natrajan Rachael, Lambros Maryou, Perrakis Eleni, Ashworth Alan, Fraser Peter, Fletcher Olivia
Breakthrough Breast Cancer Research Centre, Institute of Cancer Research, London SW3 6JB, United Kingdom;
Department of Non-communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, London WC1E 7HT, United Kingdom.
Genome Res. 2014 Nov;24(11):1854-68. doi: 10.1101/gr.175034.114. Epub 2014 Aug 13.
Genome-wide association studies have identified more than 70 common variants that are associated with breast cancer risk. Most of these variants map to non-protein-coding regions and several map to gene deserts, regions of several hundred kilobases lacking protein-coding genes. We hypothesized that gene deserts harbor long-range regulatory elements that can physically interact with target genes to influence their expression. To test this, we developed Capture Hi-C (CHi-C), which, by incorporating a sequence capture step into a Hi-C protocol, allows high-resolution analysis of targeted regions of the genome. We used CHi-C to investigate long-range interactions at three breast cancer gene deserts mapping to 2q35, 8q24.21, and 9q31.2. We identified interaction peaks between putative regulatory elements ("bait fragments") within the captured regions and "targets" that included both protein-coding genes and long noncoding (lnc) RNAs over distances of 6.6 kb to 2.6 Mb. Target protein-coding genes were IGFBP5, KLF4, NSMCE2, and MYC; and target lncRNAs included DIRC3, PVT1, and CCDC26. For one gene desert, we were able to define two SNPs (rs12613955 and rs4442975) that were highly correlated with the published risk variant and that mapped within the bait end of an interaction peak. In vivo ChIP-qPCR data show that one of these, rs4442975, affects the binding of FOXA1 and implicate this SNP as a putative functional variant.