UofL Health Brown Cancer Center, University of Louisville, 505 S. Hancock St., Louisville, Kentucky 40202, United States.
Department of Medicine, University of Louisville, 505 S. Hancock St., Louisville, Kentucky 40202, United States.
Acc Chem Res. 2022 Nov 15;55(22):3242-3252. doi: 10.1021/acs.accounts.2c00519. Epub 2022 Oct 25.
G-quadruplexes (G4s) are distinctive four-stranded DNA or RNA structures found within cells that are thought to play functional roles in gene regulation and transcription, translation, recombination, and DNA damage/repair. While G4 structures can be uni-, bi-, or tetramolecular with respect to strands, folded unimolecular conformations are most significant. Unimolecular G4 can potentially form in sequences with runs of guanines interspersed with what will become loops in the folded structure: 5'GxLyGxLyGxLyGx, where x is typically 2-4 and y is highly variable. Such sequences are highly conserved and specifically located in genomes. In the folded structure, guanines from each run combine to form planar tetrads with four hydrogen-bonded guanine bases; these tetrads stack on one another to produce four strand segments aligned in specific parallel or antiparallel orientations, connected by the loop sequences. Three types of loops (lateral, diagonal, or "propeller") have been identified. The stacked tetrads form a central cavity that features strong coordination sites for monovalent cations that stabilize the G4 structure, with potassium or sodium preferred. A single monomeric G4 typically forms from a sequence containing roughly 20-30 nucleotides. Such short sequences have been the primary focus of X-ray crystallographic or NMR studies that have produced high-resolution structures of a variety of monomeric G4 conformations. These structures are often used as the basis for drug design efforts to modulate G4 function. We believe that the focus on monomeric G4 structures formed by such short sequences is perhaps myopic. Such short sequences for structural studies are often arbitrarily selected and removed from their native genomic sequence context, and then are often changed from their native sequences by base substitutions or deletions intended to optimize the formation of a homogeneous G4 conformation.
We believe instead that G-quadruplexes prefer company and that in a longer natural sequence context multiple adjacent G4 units can form to combine into more complex multimeric G4 structures with richer topographies than simple monomeric forms. Bioinformatic searches of the human genome show that longer sequences with the potential for forming multiple G4 units are common. Telomeric DNA, for example, has a single-stranded overhang of hundreds of nucleotides with the requisite repetitive sequence with the potential for formation of multiple G4s. Numerous extended promoter sequences have similar potentials for multimeric G4 formation. X-ray crystallography and NMR methods are challenged by these longer sequences (>30 nt), so other tools are needed to explore the possible multimeric G4 landscape. We have implemented an integrated structural biology approach to address this challenge. This approach integrates experimental biophysical results with atomic-level molecular modeling and molecular dynamics simulations that provide quantitatively testable model structures. In every long sequence we have studied so far, we found that multimeric G4 structures readily form, with a surprising diversity of structures dependent on the exact native sequence used. In some cases, stable hairpin duplexes form along with G4 units to provide an even richer landscape. This Account provides an overview of our approach and recent progress and provides a new perspective on the G-quadruplex folding landscape.
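As an illustration of the sequence motif described above (four guanine runs separated by three loops), the following is a minimal Python sketch of a regular-expression scan for putative G4-forming segments. It is not the search method used in this Account. The function names `g4_pattern` and `find_putative_g4` are hypothetical, and the loop length bounds of 1-7 nucleotides are an assumption chosen for illustration, since the text states only that loop length is highly variable.

```python
import re


def g4_pattern(min_g=2, max_g=4, min_loop=1, max_loop=7):
    """Compile a regex for four G-runs separated by three loops.

    min_g/max_g follow the "typically 2-4" run length in the text.
    min_loop/max_loop are ASSUMED bounds; the text gives no loop limits.
    """
    g_run = f"G{{{min_g},{max_g}}}"
    loop = f"[ACGT]{{{min_loop},{max_loop}}}"
    # The lookahead lets matches that start at different positions overlap.
    return re.compile(f"(?=({g_run}(?:{loop}{g_run}){{3}}))")


def find_putative_g4(seq, **bounds):
    """Return (start, matched_segment) pairs for every putative G4 motif."""
    pattern = g4_pattern(**bounds)
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    # Four copies of the human telomeric repeat TTAGGG.
    telomeric = "TTAGGG" * 4
    for start, segment in find_putative_g4(telomeric, min_g=3):
        print(start, segment)
```

A scan of this kind only flags sequences with the potential to fold. Whether a G4 actually forms, and in which topology, has to be established experimentally or by modeling, as the Account emphasizes.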