Artem'ev I V, Vasil'ev G V, Gurevich A I
Bioorg Khim. 1983 Nov;9(11):1544-57.
Nucleotide sequences of 188 promoter-containing DNA regions were studied by computer-based statistical analysis. A "generalized" promoter structure is characterized by an undecanucleotide NTT(G/C)TTGACA(A/T) or (G/C) X TT(G/C)A(G/C)A(A/T)TT(G/T) (the recognition site) and a heptanucleotide RTATATR or TATAATR (the initiation site), separated by 12-19 base pairs. A promoter can function once its recognition and initiation sites reach a minimal level of correspondence to the generalized structure: the correspondence function value for the whole structure must be at least 0.61, and for the most effective promoters it may reach 1. Transcription starts 3-9 base pairs downstream of the initiation site, with a distance of 4-7 base pairs being the most effective. Transcription can start at any nucleotide, but preferably at A or G; a start at A is most effective when that A lies within a CAC or CAT trinucleotide. Promoter efficiency is enhanced by several additional structural factors: an extended A-T-rich region immediately upstream of the recognition site, and the presence of complete promoter structures or of several RNA polymerase binding sites in the preceding nucleotide sequence. A characteristic feature of the promoter is either the presence of dyad-symmetry elements in the initiation and recognition sites and in the intermediate region, or an A-T-rich area in the intermediate region.
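The abstract does not define the correspondence function, so the following is only a minimal sketch of how a sequence might be scored against the generalized structure. It assumes that the score is the fraction of matched positions over both sites combined (11 + 7 = 18 positions), that the 12-19 bp separation is the gap between the end of the recognition site and the start of the initiation site, and that only the first recognition-site variant is used. The names `correspondence`, `scan`, `RECOGNITION` and `INITIATION` are hypothetical; only the consensus strings, the spacer range and the 0.61 threshold come from the abstract.

```python
# Hypothetical sketch: score DNA windows against the "generalized" promoter.
# Assumption: correspondence = matched positions / total consensus positions.

# IUPAC codes used below: R = A/G, S = G/C, W = A/T, N = any base.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "S": "GC", "W": "AT", "N": "ACGT",
}

RECOGNITION = "NTTSTTGACAW"          # NTT(G/C)TTGACA(A/T), first variant only
INITIATION = ("TATAATR", "RTATATR")  # alternative initiation-site consensuses
SPACERS = range(12, 20)              # 12-19 bp gap between the two sites (assumed)
THRESHOLD = 0.61                     # minimal correspondence from the abstract


def hits(site, consensus):
    """Count positions where `site` fits the IUPAC `consensus`."""
    return sum(base in IUPAC[code] for base, code in zip(site, consensus))


def correspondence(seq, start):
    """Best (score, spacer, initiation_start) for a recognition site at `start`.

    Returns None if the sequence is too short for any allowed spacer.
    """
    rec = seq[start:start + len(RECOGNITION)]
    if len(rec) < len(RECOGNITION):
        return None
    rec_hits = hits(rec, RECOGNITION)
    best = None
    for spacer in SPACERS:
        init_start = start + len(RECOGNITION) + spacer
        init = seq[init_start:init_start + 7]
        if len(init) < 7:
            break  # larger spacers would run off the end as well
        for cons in INITIATION:
            score = (rec_hits + hits(init, cons)) / (len(RECOGNITION) + len(cons))
            if best is None or score > best[0]:
                best = (score, spacer, init_start)
    return best


def scan(seq, threshold=THRESHOLD):
    """List (start, score, spacer, initiation_start) for windows >= threshold."""
    seq = seq.upper()
    found = []
    for start in range(len(seq)):
        result = correspondence(seq, start)
        if result is not None and result[0] >= threshold:
            found.append((start, *result))
    return found


if __name__ == "__main__":
    # A perfect consensus match with a 17 bp spacer made of C residues.
    demo = "ATTGTTGACAA" + "C" * 17 + "TATAATG" + "C" * 10
    print(scan(demo)[0])
```

For the demonstration sequence, the first hit is at position 0 with a score of 1.0, a 17 bp spacer, and the initiation site starting at position 28. Taking the maximum over spacer lengths and both initiation consensuses mirrors the abstract's statement that either heptanucleotide and any separation within 12-19 bp is acceptable. A faithful reimplementation would need the paper's actual definition of the correspondence function and position weights.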