The Department of Biochemistry and Molecular Medicine, The George Washington University Medical Center, Washington, DC, USA.
McCormick Genomic and Proteomic Center, The George Washington University, Washington, DC, USA.
Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa203.
Genomics has benefited from an explosion in affordable high-throughput technology for whole-genome sequencing. The regulatory and functional aspects of non-coding regions may be an important contributor to oncogenesis. Whole-genome tumor-normal paired alignments were used to examine the non-coding regions in five cancer types and two races. Both a sliding window and a binning strategy were introduced to uncover areas of higher-than-expected variation for additional study. We show that the majority of cancer-associated mutations in 154 whole-genome sequences, covering breast invasive carcinoma, colon adenocarcinoma, kidney renal papillary cell carcinoma, lung adenocarcinoma and uterine corpus endometrial carcinoma across two races, are found outside of the coding region (4 432 885 in non-gene regions versus 1 412 731 in gene regions). A pan-cancer analysis found significantly mutated windows (292 to 3881 in count), demonstrating that there are significant numbers of large mutated regions in the non-coding genome. Of these, 59 significantly mutated windows were found in all studied races and cancers. These offer 16 regions ripe for additional study within 12 different chromosomes: 2, 4, 5, 7, 10, 11, 16, 18, 20, 21 and X. Many of these regions were found in centromeric locations. The X chromosome had the largest set of universal windows, which cluster almost exclusively in Xq11.1, an area linked to chromosomal instability and oncogenesis. Large consecutive clusters (super windows) were found (19 to 114 in count), providing further evidence that large mutated regions in the genome are influencing cancer development. We show remarkable similarity in highly mutated non-coding regions across both cancer and race.
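The abstract names a sliding-window strategy for finding areas of higher-than-expected variation but gives no parameters. The following is a minimal sketch of one way such a scan could work, not the authors' method: the window size, step, Poisson null model, Bonferroni correction and all function names are assumptions made for illustration.

```python
import math
from bisect import bisect_left


def sliding_window_counts(positions, chrom_length, window=1000, step=500):
    """Count mutations in overlapping windows along one chromosome.

    positions: 0-based mutation coordinates (hypothetical input format).
    Returns a list of (window_start, mutation_count) tuples.
    """
    positions = sorted(positions)
    last_start = max(chrom_length - window, 0)
    counts = []
    for start in range(0, last_start + 1, step):
        lo = bisect_left(positions, start)           # first position >= start
        hi = bisect_left(positions, start + window)  # first position >= window end
        counts.append((start, hi - lo))
    return counts


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed from the CDF."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def flag_windows(positions, chrom_length, window=1000, step=500, alpha=0.01):
    """Flag windows whose count exceeds a uniform-rate expectation.

    Assumption: mutations fall uniformly along the chromosome under the
    null, so the expected count per window is n * window / chrom_length.
    P-values are Bonferroni-corrected by the number of windows tested.
    """
    counts = sliding_window_counts(positions, chrom_length, window, step)
    lam = len(positions) * window / chrom_length
    n_tests = len(counts)
    return [
        (start, c)
        for start, c in counts
        if poisson_upper_tail(c, lam) * n_tests < alpha
    ]


if __name__ == "__main__":
    # Toy data: a dense cluster near position 2000 plus scattered mutations.
    toy = list(range(2000, 2100, 5)) + [100, 4500, 7000, 9500]
    print(flag_windows(toy, chrom_length=10_000))
```

On the toy data, only the two overlapping windows that contain the dense cluster are flagged. A real analysis would likely need a null model that accounts for regional mutation-rate differences and sequencing coverage, which a uniform rate ignores.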