Kaplan Oktay I, Berber Burak, Hekim Nezih, Doluca Osman
Berlin Institute for Medical Systems Biology, Max Delbrück Center, 13125 Berlin, Germany.
School of Medicine, Istanbul Medeniyet University, 34000 Istanbul, Turkey.
Nucleic Acids Res. 2016 Nov 2;44(19):9083-9095. doi: 10.1093/nar/gkw769. Epub 2016 Sep 4.
Many studies show that short non-coding sequences are widely conserved among regulatory elements. Since the development of next-generation sequencing technology, more and more conserved sequences are being discovered. A common approach to identifying conserved sequences with regulatory roles relies on topological changes such as hairpin formation at the DNA or RNA level. G-quadruplexes, non-canonical nucleic acid topologies with few established biological roles, are increasingly considered for conserved regulatory element discovery. Since the tertiary structure of G-quadruplexes depends strongly on the loop sequence, which is disregarded by the generally accepted algorithm, we hypothesized that G-quadruplexes with similar topology and, indirectly, similar interaction patterns can be determined using phylogenetic clustering based on differences in the loop sequences. Phylogenetic analysis of 52 G-quadruplex-forming sequences in the Escherichia coli genome revealed two conserved G-quadruplex motifs with a potential regulatory role. Further analysis showed that both motifs tend to form hairpins and G-quadruplexes, as supported by circular dichroism studies. The phylogenetic analysis described in this work can greatly improve the discovery of functional G-quadruplex structures and may explain unknown regulatory patterns.
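The idea of comparing G-quadruplex candidates by their loops can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes the widely used pattern of four runs of at least three guanines separated by loops of 1 to 7 nucleotides as a stand-in for the "generally accepted algorithm", and it swaps a simple per-loop edit distance in for a real phylogenetic method. The names `find_g4_loops`, `edit_distance` and `loop_distance` are hypothetical.

```python
import re

# Assumed motif (not stated in the abstract): G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+.
# Loops are matched lazily so the shortest valid loops are preferred.
G4_PATTERN = re.compile(
    r"(G{3,})([ACGT]{1,7}?)(G{3,})([ACGT]{1,7}?)(G{3,})([ACGT]{1,7}?)(G{3,})"
)


def find_g4_loops(seq):
    """Return (start, matched sequence, (loop1, loop2, loop3)) for each
    non-overlapping hit on the given strand. The complementary strand
    (C-runs) is not scanned in this sketch."""
    hits = []
    for m in G4_PATTERN.finditer(seq.upper()):
        loops = (m.group(2), m.group(4), m.group(6))
        hits.append((m.start(), m.group(0), loops))
    return hits


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def loop_distance(loops_a, loops_b):
    """Illustrative dissimilarity: sum of edit distances between
    corresponding loops. The G-runs themselves are ignored."""
    return sum(edit_distance(x, y) for x, y in zip(loops_a, loops_b))


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    seq_a = "TTGGGAGGGTCGGGAAGGGTT"
    seq_b = "CCGGGAGGGTTGGGAAAGGGCC"
    loops_a = find_g4_loops(seq_a)[0][2]
    loops_b = find_g4_loops(seq_b)[0][2]
    print(loops_a, loops_b, loop_distance(loops_a, loops_b))
```

A matrix of such pairwise distances over all candidate sequences could then be passed to a tree-building or clustering method of choice. Which distance measure and clustering method best reflect topology is exactly the kind of decision the abstract leaves to the full paper.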