Smeds Linnéa, Kamali Kaivan, Kejnovská Iva, Kejnovský Eduard, Chiaromonte Francesca, Makova Kateryna D
Department of Biology, Penn State University, University Park, PA 16802, United States.
Department of Biophysics of Nucleic Acids, Institute of Biophysics of the Czech Academy of Sciences, Královopolská 135, 612 65 Brno, Czech Republic.
Nucleic Acids Res. 2025 Apr 10;53(7). doi: 10.1093/nar/gkaf298.
Non-canonical (non-B) DNA structures such as bent DNA, hairpins, G-quadruplexes (G4s), and Z-DNA form at certain sequence motifs (e.g. A-phased repeats and inverted repeats) and have emerged as important regulators of cellular processes and drivers of genome evolution. Yet they have been understudied because of their repetitive nature and because short-read technologies can generate inaccurate sequences at such motifs. Here we comprehensively characterize these motifs in the long-read telomere-to-telomere (T2T) genomes of human, bonobo, chimpanzee, gorilla, Bornean orangutan, Sumatran orangutan, and siamang. Non-B DNA motifs are enriched in the genomic regions newly added to the T2T assemblies and occupy 9%-15% of autosomes, 9%-11% of chromosome X, and 12%-38% of chromosome Y. G4s and Z-DNA are enriched at promoters and enhancers, as well as at origins of replication. Repetitive sequences harbor more non-B DNA motifs than non-repetitive sequences, especially in the short arms of acrocentric chromosomes. Most centromeres and/or their flanking regions are enriched in at least one non-B DNA motif type, consistent with a potential role of non-B structures in determining centromeres. Our results highlight the uneven distribution of predicted non-B DNA structures across ape genomes and suggest novel functions for these structures in previously inaccessible genomic regions.
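The occupancy figures in the abstract (e.g. 9%-15% of autosomes) are fractions of bases covered by predicted motifs. The sketch below shows, under stated assumptions, how such a fraction could be computed for a single motif type. It assumes a simple regular-expression definition of G4 motifs (four runs of at least three G separated by 1-7 nt loops, a widely used approximation, searched on both strands via the C-rich complement pattern). This is not the annotation pipeline used in the study, and `find_g4_intervals` and `covered_fraction` are hypothetical helper names. Overlapping hits are merged before counting so that no base is counted twice.

```python
import re
from typing import Iterable, List, Tuple

# ASSUMPTION: a simplified G4 motif definition (4 runs of >= 3 G, loops of
# 1-7 nt); this is an illustrative approximation, not the study's criteria.
# The zero-width lookahead lets finditer report overlapping motifs.
G4_PLUS = re.compile(r"(?=(G{3,}[ACGTN]{1,7}G{3,}[ACGTN]{1,7}G{3,}[ACGTN]{1,7}G{3,}))")
# C-rich runs on the given strand correspond to G4 motifs on the other strand.
G4_MINUS = re.compile(r"(?=(C{3,}[ACGTN]{1,7}C{3,}[ACGTN]{1,7}C{3,}[ACGTN]{1,7}C{3,}))")


def find_g4_intervals(seq: str) -> List[Tuple[int, int]]:
    """Hypothetical helper: half-open (start, end) spans of putative G4 motifs."""
    seq = seq.upper()
    intervals = []
    for pattern in (G4_PLUS, G4_MINUS):
        for m in pattern.finditer(seq):
            intervals.append((m.start(1), m.end(1)))
    return intervals


def covered_fraction(intervals: Iterable[Tuple[int, int]], length: int) -> float:
    """Hypothetical helper: fraction of `length` bases covered by the union of intervals."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current merged block: bank it and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / length if length else 0.0


if __name__ == "__main__":
    # Toy sequence: AT-rich flanks around one telomere-like G4 motif.
    seq = "AT" * 10 + "GGGTTAGGGTTAGGGTTAGGG" + "AT" * 10
    hits = find_g4_intervals(seq)
    print(hits, covered_fraction(hits, len(seq)))
```

For real assemblies, the same union-of-intervals step would be repeated per chromosome and per motif type; a regular expression this strict will also miss motifs that more permissive definitions (e.g. longer loops) would include.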