Pedersen A G, Baldi P, Chauvin Y, Brunak S
Center for Biological Sequence Analysis, The Technical University of Denmark, Building 208, Lyngby, DK-2800, Denmark.
J Mol Biol. 1998 Aug 28;281(4):663-73. doi: 10.1006/jmbi.1998.1972.
The fact that DNA three-dimensional structure is important for transcriptional regulation raises the question of whether eukaryotic promoters contain general structural features independent of the genes they control. We present an analysis of a large set of human RNA polymerase II promoters with a very low level of sequence similarity. The sequences, which include both TATA-containing and TATA-less promoters, are aligned by hidden Markov models. Using three different models of sequence-derived DNA bendability, the aligned promoters display a common structural profile with bendability being low in a region upstream of the transcriptional start point and significantly higher downstream. Investigation of the sequence composition in the two regions shows that the bendability profile originates from the sequential structure of the DNA, rather than the general nucleotide composition. Several trinucleotides known to have high propensity for major groove compression are found much more frequently in the regions downstream of the transcriptional start point, while the upstream regions contain more low-bendability triplets. Within the region downstream of the start point, we observe a periodic pattern in sequence and bendability, which is in phase with the DNA helical pitch. The periodic bendability profile shows bending peaks roughly at every 10 bp with stronger bending at 20 bp intervals. These observations suggest that DNA in the region downstream of the transcriptional start point is able to wrap around protein in a manner reminiscent of DNA in a nucleosome. This notion is further supported by the finding that the periodic bendability is caused mainly by the complementary triplet pairs CAG/CTG and GGC/GCC, which previously have been found to correlate with nucleosome positioning.
We present models where the high-bendability regions position nucleosomes at the downstream end of the transcriptional start point, and consider the possibility of interaction between histone-like TAFs and this area. We also propose the use of this structural signature in computational promoter-finding algorithms.
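The kind of analysis summarized above (a trinucleotide-based bendability profile averaged over aligned promoters, then checked for a roughly 10 bp periodicity) might be sketched as follows. This is a minimal illustration, not the authors' pipeline: the three bendability models used in the paper are not reproduced here, the scale is passed in by the caller, triplets missing from the scale are assumed to score 0.0, and the function names (`revcomp`, `triplet_profile`, `mean_profile`, `autocorrelation`) are hypothetical.

```python
from typing import Dict, List

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(triplet: str) -> str:
    """Reverse complement of a short DNA string; unknown bases become 'N'."""
    return "".join(_COMPLEMENT.get(base, "N") for base in reversed(triplet))


def triplet_profile(seq: str, scale: Dict[str, float]) -> List[float]:
    """Score every overlapping trinucleotide in seq.

    A triplet and its reverse complement share one score, so the scale only
    needs one member of each complementary pair (e.g. CAG for CAG/CTG).
    Assumption: triplets absent from the scale score 0.0.
    """
    seq = seq.upper()
    profile = []
    for i in range(len(seq) - 2):
        triplet = seq[i:i + 3]
        if triplet in scale:
            profile.append(scale[triplet])
        else:
            profile.append(scale.get(revcomp(triplet), 0.0))
    return profile


def mean_profile(aligned_seqs: List[str], scale: Dict[str, float]) -> List[float]:
    """Position-wise mean profile over equal-length, gap-free aligned sequences."""
    profiles = [triplet_profile(s, scale) for s in aligned_seqs]
    length = min(len(p) for p in profiles)
    return [sum(p[i] for p in profiles) / len(profiles) for i in range(length)]


def autocorrelation(profile: List[float], lag: int) -> float:
    """Mean-centered autocorrelation at the given lag, normalized by total variance."""
    n = len(profile)
    mean = sum(profile) / n
    variance = sum((x - mean) ** 2 for x in profile)
    if variance == 0 or lag >= n:
        return 0.0
    covariance = sum(
        (profile[i] - mean) * (profile[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance


# Toy usage. The scores below are placeholders, NOT published bendability values.
toy_scale = {"CAG": 1.0, "GGC": 1.0}
toy_seq = "CAGAAAAAAA" * 5  # synthetic sequence with a CAG every 10 bp
toy = triplet_profile(toy_seq, toy_scale)
print(autocorrelation(toy, 10) > autocorrelation(toy, 5))
```

With a real bendability scale and real aligned promoter sequences, a peak in `autocorrelation` near lag 10 relative to neighbouring lags would be one simple indicator of the helical-pitch periodicity described in the abstract; a Fourier-based test would be a reasonable alternative.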