Mohanty Saswat K, Chiaromonte Francesca, Makova Kateryna D
Molecular, Cellular, and Integrative Biosciences, Huck Institutes of the Life Sciences, Penn State University, University Park, PA 16802, USA.
Department of Biology, Penn State University, University Park, PA 16802, USA.
bioRxiv. 2024 Nov 6:2024.11.05.621973. doi: 10.1101/2024.11.05.621973.
G-quadruplexes (G4s) are non-canonical DNA structures that can form at approximately 1% of the human genome. G4s contribute to point mutations and structural variation and thus facilitate genomic instability. They play important roles in regulating replication, transcription, and telomere maintenance, and some of them evolve under purifying selection. Nevertheless, the evolutionary dynamics of G4s have remained underexplored. Here we conducted a comprehensive analysis of predicted G4s (pG4s) in the recently released telomere-to-telomere (T2T) genomes of human and other great apes (bonobo, chimpanzee, gorilla, Bornean orangutan, and Sumatran orangutan). Compared with previous ape genome assemblies, we annotated tens of thousands of new pG4s in the T2T assemblies, including 41,236 in the human genome. Analyzing species alignments, we found that approximately one-third of pG4s are shared by all apes studied, and we identified thousands of species- and genus-specific pG4s. pG4s accumulated and diverged at rates consistent with the divergence times between the studied species. pG4s shared across species were significantly enriched and hypomethylated at regulatory regions, including promoters, 5' and 3' UTRs, and origins of replication, strongly suggesting that they form and play functional roles in these regions. pG4s shared among great apes displayed lower methylation levels than species-specific pG4s, suggesting evolutionary conservation of the functional roles of the former. Many species-specific pG4s were located in the repetitive and satellite regions newly resolved in the T2T genomes. Our findings illuminate the evolutionary dynamics of G4s, their role in gene regulation, and their potential contribution to species-specific adaptations in great apes, emphasizing the utility of high-resolution T2T genomes in uncovering previously elusive genomic features.
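The abstract does not state how pG4s were predicted. As an illustration only, one widely used definition of a canonical G4 motif is four runs of at least three guanines separated by loops of 1-7 nucleotides. The Python sketch below scans a sequence for that pattern on both strands: runs of C on the given strand indicate a G-rich motif on the opposite strand. The function name find_pg4s and the exact pattern are assumptions for illustration and are not necessarily the authors' pipeline; the sketch also reports only non-overlapping matches.

```python
import re

# Assumed canonical G4 definition (illustrative, not necessarily the authors'):
# four runs of >=3 G separated by loops of 1-7 nucleotides.
LOOP = "[ACGTN]{1,7}"
PLUS = re.compile(LOOP.join(["G{3,}"] * 4))
# A G4 motif on the minus strand appears as four C-runs on the plus strand.
MINUS = re.compile(LOOP.join(["C{3,}"] * 4))


def find_pg4s(seq: str) -> list[tuple[int, int, str]]:
    """Hypothetical helper: return (start, end, strand) for non-overlapping
    canonical motifs; coordinates are 0-based and end-exclusive."""
    seq = seq.upper()  # treat soft-masked (lowercase) bases like any others
    hits = [(m.start(), m.end(), "+") for m in PLUS.finditer(seq)]
    hits += [(m.start(), m.end(), "-") for m in MINUS.finditer(seq)]
    return sorted(hits)
```

For example, find_pg4s("TTGGGAGGGAGGGAGGGTT") returns [(2, 17, "+")], while a sequence with only three G-runs yields no hits.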