d'Aubenton Carafa Y, Brody E, Thermes C
C.N.R.S. Centre de Génétique Moléculaire, Laboratoire-associé à l'Université Pierre et Marie Curie, Gif-sur-Yvette, France.
J Mol Biol. 1990 Dec 20;216(4):835-58. doi: 10.1016/s0022-2836(99)80005-9.
Escherichia coli rho-independent transcription terminators are characterized by an RNA structure having a G+C-rich stem-loop followed by a series of uridine residues, but they can be only partially predicted by the stability of this structure or by its primary sequence. A large number of such terminators have been identified or proposed in the literature, and we have compiled a list of them (148 found in 1021 × 10³ base-pairs of E. coli DNA sequences) in order to analyze the corresponding RNA hairpins statistically. We show that the size of the loops presents a narrow distribution, that their sequences are not random, and that most loops are closed by a C.G base-pair. In particular, 55% of the loops are tetranucleotides and the most abundant loop sequences are UUCG and GAAA. These loops are abundant in prokaryotic and eukaryotic RNAs, and are known to enhance the stability of RNA hairpins. We propose that these tetraloops play an important role in the nucleation of the nascent RNA structures, as does the presence of a C.G base-pair closing a hairpin loop. This analysis allows us to propose a model for the formation of an RNA hairpin during the termination process and to construct an algorithm for predicting terminators in a given DNA sequence. For the E. coli sequences, it clearly distinguishes inter- from intracistronic terminator-like structures, and selects 141 of the 148 rho-independent terminators given in the literature, with a very low background. It also predicts with reasonable accuracy the in vitro termination efficiency of known rho-independent terminators, and predicts the existence of 35 as yet uncharacterized terminators.
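The abstract does not give the scoring scheme of the prediction algorithm, so the following is only a minimal sketch of the general idea it describes: scan a DNA sequence for a G+C-rich inverted repeat (the stem), a short loop, and a T-rich stretch immediately downstream (the uridine run in the RNA). The function name `find_candidates`, the `Candidate` record, and all thresholds (stem and loop lengths, `min_gc`, `min_t`, `tail_length`) are hypothetical choices made for illustration, not values from the paper. The sketch also assumes perfect Watson-Crick pairing in the stem, with no G.U pairs, bulges, or mismatches.

```python
from typing import NamedTuple

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


class Candidate(NamedTuple):
    start: int   # 0-based index of the first stem base
    stem: str    # 5' arm of the stem (DNA alphabet)
    loop: str    # loop sequence between the two arms
    tail: str    # window immediately 3' of the hairpin


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_candidates(dna, stem_lengths=range(4, 9), loop_lengths=range(3, 9),
                    tail_length=8, min_gc=0.6, min_t=5):
    """List perfect stem-loops that are G+C-rich and followed by a T-rich tail.

    All thresholds are illustrative assumptions, not the published parameters.
    """
    dna = dna.upper()
    hits = []
    for start in range(len(dna)):
        for s in stem_lengths:
            for l in loop_lengths:
                end = start + 2 * s + l + tail_length
                if end > len(dna):
                    continue
                left = dna[start:start + s]
                loop = dna[start + s:start + s + l]
                right = dna[start + s + l:start + 2 * s + l]
                tail = dna[start + 2 * s + l:end]
                # The 3' arm must pair perfectly with the 5' arm.
                if right != reverse_complement(left):
                    continue
                gc_fraction = (left.count("G") + left.count("C")) / s
                if gc_fraction < min_gc or tail.count("T") < min_t:
                    continue
                hits.append(Candidate(start, left, loop, tail))
    return hits
```

On a sequence such as `"AAAA" + "GCCGCC" + "TTCG" + "GGCGGC" + "TTTTTTTT" + "AAAA"`, the reported candidates include the six-base-pair stem with the TTCG loop (UUCG in the RNA), closed by a C.G pair. Shorter stems nested inside the same hairpin are reported as well; a real predictor would need to rank or merge overlapping hits, for instance by hairpin stability.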