Brázda Václav, Bohálová Natália, Bowater Richard P
Institute of Biophysics of the Czech Academy of Sciences, Královopolská 135, Brno 612 65, Czech Republic.
Institute of Biophysics of the Czech Academy of Sciences, Královopolská 135, Brno 612 65, Czech Republic; Department of Experimental Biology, Faculty of Science, Masaryk University, Kamenice 5, Brno 62500, Czech Republic.
Gene. 2022 Feb 5;810:146058. doi: 10.1016/j.gene.2021.146058. Epub 2021 Nov 1.
Owing to continued improvements in sequencing methods, human chromosome 8 is now available as a gapless, end-to-end assembly. Thanks to advances in long-read sequencing technologies, its centromere, telomeres, duplicated gene families and repeat-rich regions are now fully sequenced. We assessed whether the new assembly alters our understanding of the potential impact of non-B DNA structures within this completed chromosome sequence. Non-B secondary structures, such as G-quadruplexes, hairpins and cruciforms, have been shown to have important regulatory functions and potential as targeted therapeutics. We therefore analysed the presence of putative G-quadruplex forming sequences and inverted repeats in the current human reference genome (GRCh38) and in the new end-to-end assembly of chromosome 8. The comparison revealed that the new assembly contains significantly more inverted repeats and G-quadruplex forming sequences than the current reference sequence. This observation can be explained by the improved accuracy of the new sequencing methods, particularly in regions containing extensive base repeats, which favour the formation of many non-B DNA structures. These results show that previous assembly versions of the human genome significantly underestimated the prevalence of non-B DNA secondary structures, and suggest that their importance has not been fully appreciated. We anticipate that similar observations will be made as improved sequencing technologies fill in gaps across the genomes of humans and other organisms.
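To make the two sequence features in the abstract concrete, the following Python sketch shows one simple way to scan a DNA string for putative G-quadruplex forming sequences and inverted repeats. It is illustrative only: the abstract does not state which tools or thresholds the authors used, so the widely used motif of four runs of at least three guanines separated by loops of 1 to 7 nucleotides, the exact-match stem search, and all function names and default parameters here are assumptions.

```python
import re

# Assumed motif (not taken from the abstract): four G-runs of >= 3 bases
# separated by loops of 1-7 nucleotides. The C-rich pattern detects the
# same motif on the complementary strand.
G4_PLUS = re.compile(r"(?:G{3,}[ACGT]{1,7}){3}G{3,}")
G4_MINUS = re.compile(r"(?:C{3,}[ACGT]{1,7}){3}C{3,}")

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_putative_g4(seq):
    """Return sorted (start, end, strand) tuples for non-overlapping motif matches."""
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", G4_PLUS), ("-", G4_MINUS)):
        for match in pattern.finditer(seq):
            hits.append((match.start(), match.end(), strand))
    return sorted(hits)


def find_inverted_repeats(seq, stem=6, max_spacer=10):
    """Return (start, stem, spacer) tuples for exact inverted repeats.

    An inverted repeat here is a stem of `stem` bases followed, after a
    spacer of 0..max_spacer bases, by the stem's reverse complement.
    Both parameters are hypothetical defaults chosen for illustration.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem + 1):
        left = seq[i:i + stem]
        if set(left) - set("ACGT"):
            continue  # skip windows containing N or other ambiguity codes
        target = reverse_complement(left)
        for spacer in range(max_spacer + 1):
            j = i + stem + spacer
            if j + stem > len(seq):
                break
            if seq[j:j + stem] == target:
                hits.append((i, stem, spacer))
    return hits
```

Running both functions over the chromosome 8 sequence from each assembly and comparing the hit counts would give a rough version of the comparison described above. A genome-scale analysis would need a more efficient search and tolerance for mismatches in the stems.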