Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, 70803, USA.
Department of Cell Biology, National Research Centre, Giza, 12622, Egypt.
BMC Genom Data. 2022 Feb 17;23(1):13. doi: 10.1186/s12863-021-01021-x.
Numerous genome-wide association studies (GWAS) conducted to date have revealed genetic variants associated with various diseases, including breast and prostate cancers. Despite the availability of these large-scale data, relatively few variants have been functionally characterized, mainly because the majority of single-nucleotide polymorphisms (SNPs) map to non-coding regions of the human genome. The functional characterization of these non-coding variants and the identification of their target genes remain challenging.
In this communication, we explore the potential functional mechanisms of non-coding SNPs by integrating GWAS data with high-resolution chromosome conformation capture (Hi-C) data for breast and prostate cancers. We show that more genetic variants map to regulatory elements when the 3D genome structure is taken into account than when only the 1D linear genome, which lacks physical chromatin interactions, is used. Importantly, the association of enhancers, transcription factors, and their target genes with breast and prostate cancers tends to be higher when these regulatory elements are mapped to high-risk SNPs through spatial interactions than when they are mapped by linear proximity alone. Finally, we demonstrate that topologically associating domains (TADs) carrying high-risk SNPs also contain gene regulatory elements whose association with cancer is generally higher than that of elements in control TADs containing no high-risk variants.
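The abstract does not describe the pipeline itself. The following is only a minimal Python sketch, under stated assumptions, of the two mapping strategies being contrasted (linear proximity versus Hi-C spatial contacts) and of the split into high-risk and control TADs. Assumptions, all hypothetical: SNPs, regulatory elements, Hi-C contacts (given as pairs of anchor intervals), and TADs are plain half-open intervals. The ±10 kb linear window, every function name, and the toy coordinates are illustrative. The contact-based mapping is kept separate from the linear one rather than combined, and the step that scores each element's association with cancer is omitted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    """Half-open genomic interval [start, end) on one chromosome."""
    chrom: str
    start: int
    end: int

    def contains(self, chrom: str, pos: int) -> bool:
        return self.chrom == chrom and self.start <= pos < self.end

    def overlaps(self, other: "Interval") -> bool:
        return (self.chrom == other.chrom
                and self.start < other.end and other.start < self.end)

def map_linear(snps, elements, window=10_000):
    """1D mapping (assumed): elements within +/- `window` bp of each SNP."""
    hits = {}
    for snp_id, (chrom, pos) in snps.items():
        query = Interval(chrom, max(0, pos - window), pos + window + 1)
        hits[snp_id] = {name for name, el in elements.items() if query.overlaps(el)}
    return hits

def map_3d(snps, elements, loops):
    """3D mapping (assumed): SNP lies in one Hi-C contact anchor, element overlaps the other."""
    hits = {snp_id: set() for snp_id in snps}
    for snp_id, (chrom, pos) in snps.items():
        for a, b in loops:
            # A contact is symmetric, so try both anchor orientations.
            for here, there in ((a, b), (b, a)):
                if here.contains(chrom, pos):
                    hits[snp_id] |= {name for name, el in elements.items() if there.overlaps(el)}
    return hits

def n_mapped(hits):
    """Number of SNPs mapped to at least one regulatory element."""
    return sum(1 for names in hits.values() if names)

def split_tads(tads, risk_snps):
    """Separate TADs carrying at least one high-risk SNP from control TADs."""
    risk, control = [], []
    for tad in tads:
        carries = any(tad.contains(chrom, pos) for chrom, pos in risk_snps.values())
        (risk if carries else control).append(tad)
    return risk, control

def elements_in(tads, elements):
    """Names of regulatory elements overlapping any of the given TADs."""
    return sorted(name for name, el in elements.items()
                  if any(tad.overlaps(el) for tad in tads))

# Toy, made-up coordinates (not data from the study).
snps = {"snp1": ("chr1", 1_000_000), "snp2": ("chr1", 5_000_000)}
elements = {"enh1": Interval("chr1", 1_005_000, 1_006_000),
            "enh2": Interval("chr1", 2_000_000, 2_001_000)}
loops = [(Interval("chr1", 995_000, 1_010_000), Interval("chr1", 1_995_000, 2_010_000)),
         (Interval("chr1", 4_990_000, 5_010_000), Interval("chr1", 1_995_000, 2_010_000))]
tads = [Interval("chr1", 0, 1_500_000), Interval("chr1", 1_500_000, 3_000_000),
        Interval("chr1", 3_000_000, 6_000_000)]

print(n_mapped(map_linear(snps, elements)), n_mapped(map_3d(snps, elements, loops)))  # 1 2
risk, control = split_tads(tads, snps)
print(elements_in(risk, elements), elements_in(control, elements))  # ['enh1'] ['enh2']
```

In this toy case, snp2 has no element within the linear window, but it reaches enh2 through a long-range contact. This mirrors, purely illustratively, how spatial mapping can link more variants to regulatory elements. Per-element cancer-association scores from an external source would then be compared between the risk and control groups.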
Our results suggest that many SNPs may contribute to cancer development by affecting the expression of certain tumor-related genes through long-range chromatin interactions with gene regulatory elements. Integrating large-scale genetic datasets with the 3D genome structure offers an attractive and unique approach to systematically investigating the functional mechanisms of genetic variants in disease risk and progression.