Institute of Cell Biophysics of RAS, 3 Institutskaya str., Poushchino, 142290, Russia.
J Bioinform Comput Biol. 2020 Apr;18(2):2040001. doi: 10.1142/S0219720020400016.
RNA polymerase/promoter recognition is a basic problem of molecular biology. Decades of effort have gone into the area, yet certain challenges persist. Choosing the most suitable model system is pivotal for such research. The T7 bacteriophage RNA polymerase/native T7 promoter system is an exceptional example: it is the most thoroughly studied and has been successfully applied in biotechnology and bioengineering, owing to both the structural simplicity and the high specificity of this molecular pair. Despite the highly similar sequences of distinct native T7 promoters, T7 RNA polymerase binds each promoter in a highly specific and adjustable manner. One explanation is that the process relies primarily on the physical properties of DNA rather than on nucleotide sequence. Here, we address the issue by analyzing the large dataset recently published by Komura and colleagues. That original study employed next-generation sequencing (NGS) to quantify the activity of promoter variants, including ones with multiple substitutions. Our analysis revealed a substantial bias in the co-occurrence of single-nucleotide substitutions: the highest rate of co-occurrence was found within the specificity loop of the binding region, and the lowest in the initiation region of the promoter. When both the location and the kind of nucleotides involved in a replacement (initial and resulting) are taken into account, N-to-A substitutions are the most preferred across the whole 19-bp sequence. By contrast, N-to-C substitutions are tolerated only at a crucial position in the recognition loop of the binding region, and N-to-G substitutions are uniformly the least tolerated. The complete set of variants was then split into groups with mutations (1) exclusively in the binding region; (2) exclusively in the melting region; (3) in both regions. 
Of these three groups, the second comprises very few variants, far fewer than either of the other two groups (46 versus over one thousand and over six thousand). Yet all of them are promoters with substantial to high activity. Group two proved heterogeneous in primary sequence; indeed, upon further subdivision into above-average and below-average activity subgroups, the first was found to comprise promoters with negligible conservation at the 2 position of the melting region, while the second was hardly conserved in this region at all. When attention is restricted to the perfect consensus sequence of the class III T7 promoter with the 2 nucleotide randomized (all four variants are present in one to several copies in the previously published source dataset), the picture becomes even more pronounced. We therefore suggest that mutations at this position do not cause significant changes in promoter activity. At the same time, such modifications dramatically change the DNA physical properties calculated in our study (namely, electrostatic potential and propensity to bend). One possible suggestion is that the 2 nucleotide functions as a generic switch; if so, the 2A-to-2T substitution has important regulatory consequences. This is consistent with the fact that the 2 base pair is the most evident difference between class II and class III promoters of the T7 genome, and that it also distinguishes the class III promoters of the T7 genome from the promoters of the related but reproductively isolated bacteriophage T3. In other words, it appears feasible that a mutation at the 2 nucleotide does not impede promoter activity yet alters the promoter's physical properties, thereby affecting differential RNA polymerase/promoter interaction.
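The grouping and substitution tallies described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The 19-nt reference string (the commonly cited T7 consensus), the index at which the binding region is taken to end, and all function names are assumptions introduced only for illustration; the abstract does not give region boundaries.

```python
from collections import Counter

# Assumption: 19-nt reference based on the commonly cited T7 promoter consensus.
REFERENCE = "TAATACGACTCACTATAGG"
# Assumption: hypothetical boundary; 0-based indices below this value are treated
# as "binding region", the rest as "melting region". Not taken from the source.
BINDING_END = 13


def substitutions(variant, reference=REFERENCE):
    """Return (index, reference_base, variant_base) for every mismatch."""
    if len(variant) != len(reference):
        raise ValueError("variant and reference must have the same length")
    return [(i, r, v) for i, (r, v) in enumerate(zip(reference, variant)) if r != v]


def classify(variant, reference=REFERENCE, binding_end=BINDING_END):
    """Assign a variant to one of the three groups, or 'consensus' if unmutated."""
    subs = substitutions(variant, reference)
    if not subs:
        return "consensus"
    in_binding = any(i < binding_end for i, _, _ in subs)
    in_melting = any(i >= binding_end for i, _, _ in subs)
    if in_binding and in_melting:
        return "both"
    return "binding_only" if in_binding else "melting_only"


def count_resulting_bases(variants, reference=REFERENCE):
    """Tally how often each base appears as the result of a substitution (N to X)."""
    counts = Counter()
    for variant in variants:
        for _, _, new_base in substitutions(variant, reference):
            counts[new_base] += 1
    return counts
```

Applied to a real variant table, `classify` would yield the three group sizes, and `count_resulting_bases` a crude view of which resulting nucleotides dominate; weighting by measured activity, as the study requires, is left out of this sketch.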