Kim Minji, Kreig Alex, Lee Chun-Ying, Rube H Tomas, Calvert Jacob, Song Jun S, Myong Sua
Department of Electrical and Computer Engineering, University of Illinois; 306 N. Wright St., Urbana, IL 61801, USA.
Institute for Genomic Biology; 1206 Gregory Drive, Urbana, IL 61801, USA.
Department of Bioengineering, University of Illinois; 1304 W. Springfield Ave., Urbana, IL 61801, USA.
Nucleic Acids Res. 2016 Jun 2;44(10):4807-17. doi: 10.1093/nar/gkw272. Epub 2016 Apr 19.
G-quadruplex (GQ) is a four-stranded DNA structure that can be formed in guanine-rich sequences. GQ structures have been proposed to regulate diverse biological processes including transcription, replication, translation and telomere maintenance. Recent studies have demonstrated the existence of GQ DNA in live mammalian cells and a significant number of potential GQ forming sequences in the human genome. We present a systematic and quantitative analysis of GQ folding propensity on a large set of 438 GQ forming sequences in double-stranded DNA by integrating fluorescence measurement, single-molecule imaging and computational modeling. We find that short minimum loop length and the thymine base are two main factors that lead to high GQ folding propensity. Linear and Gaussian process regression models further validate that the GQ folding potential can be predicted with high accuracy based on the loop length distribution and the nucleotide content of the loop sequences. Our study provides important new parameters that can inform the evaluation and classification of putative GQ sequences in the human genome.
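As a rough illustration of the two predictors named in the abstract (loop length distribution and loop nucleotide content), the sketch below extracts such features from a single putative GQ sequence. It is not the authors' pipeline: the motif definition (four runs of exactly three guanines separated by three loops of 1 to 12 nt), the function name `loop_features`, and the returned field names are all assumptions made for this example.

```python
import re
from collections import Counter

# Assumed motif: GGG-loop1-GGG-loop2-GGG-loop3-GGG.
# The 1-12 nt loop bound is an illustrative choice, not a value from the study.
# Lazy quantifiers prefer the shortest loops when a sequence can be parsed
# in more than one way.
GQ_PATTERN = re.compile(
    r"^GGG([ACGT]{1,12}?)GGG([ACGT]{1,12}?)GGG([ACGT]{1,12}?)GGG$"
)


def loop_features(seq):
    """Return loop-length and base-composition features, or None if the
    sequence does not match the assumed motif (hypothetical helper)."""
    match = GQ_PATTERN.match(seq.upper())
    if match is None:
        return None

    loops = match.groups()
    lengths = [len(loop) for loop in loops]
    total = sum(lengths)
    counts = Counter("".join(loops))

    features = {
        "loop_lengths": lengths,
        "min_loop": min(lengths),
        "total_loop": total,
    }
    # Fraction of each base across all loop positions.
    for base in "ACGT":
        features["frac_" + base] = counts[base] / total
    return features


if __name__ == "__main__":
    # Human telomeric repeat used only as a familiar example input.
    print(loop_features("GGGTTAGGGTTAGGGTTAGGG"))
```

A feature dictionary of this kind could serve as one input row for a regression model of folding propensity; which features and model settings the study actually used are not specified in the abstract.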