Holberton D V, Marshall J
Department of Life Science, Nottingham University, UK.
Nucleic Acids Res. 1995 Aug 11;23(15):2945-53. doi: 10.1093/nar/23.15.2945.
Protein-coding genes in the ancient eukaryote Giardia lamblia lack typical promoter consensus elements. We have analysed the immediate 5' flanking sequences of seven genes of related function (structural cytoskeleton proteins) to identify shared DNA motifs that might have a role in transcription initiation. Transcription start sites for five of these genes have been determined previously. Genomic mapping and mRNA primer extension experiments demonstrate additionally that the genes for beta-giardin and median body protein are (i) present as single copies in the genome and (ii) transcribed with very short 5' leader sequences. Two search algorithms, designed to extract conserved motifs from aligned and from non-aligned sequences respectively, independently discovered three sites constituting a common pattern in all seven promoters. Sites were optimally aligned using weight matrix building trials to achieve the maximum 'information content'. Profiling the information content of the best alignments defines the extent of the homologies as a 9 bp box (the initiator) at the start site, plus 18 bp and 6 bp boxes upstream. The initiator is the most highly conserved element and contains a universal Py-A-Pu motif at which transcription starts. We show that the best matrices can be combined in a search pattern that correctly locates transcription start sites in genomic DNA sequences.
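The abstract does not give the formulas behind the weight matrices or the 'information content' measure, so the following is only a minimal sketch of the general technique, not the authors' method. It assumes the common formulation in which the information at an alignment column is 2 bits minus the column's Shannon entropy (uniform base background, no small-sample correction), and a log-odds weight matrix with a pseudocount. The function names, the pseudocount value, and the 5 bp toy sites are all hypothetical and are not taken from the paper.

```python
import math

BASES = "ACGT"

def column_counts(sites):
    """Count each base at every column of equal-length aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("aligned sites must all have the same length")
    return [{b: sum(s[i] == b for s in sites) for b in BASES}
            for i in range(length)]

def information_content(sites):
    """Per-column information in bits: 2 - entropy (uniform background assumed)."""
    n = len(sites)
    profile = []
    for counts in column_counts(sites):
        entropy = -sum((c / n) * math.log2(c / n)
                       for c in counts.values() if c)
        profile.append(2.0 - entropy)
    return profile

def weight_matrix(sites, pseudocount=0.5, background=0.25):
    """Log-odds weight matrix; the pseudocount avoids log(0) for unseen bases."""
    total = len(sites) + 4 * pseudocount
    return [{b: math.log2((counts[b] + pseudocount) / total / background)
             for b in BASES}
            for counts in column_counts(sites)]

def score_at(matrix, sequence, pos):
    """Sum the matrix weights for the window starting at pos."""
    return sum(col[sequence[pos + i]] for i, col in enumerate(matrix))

def best_hit(matrix, sequence):
    """Return (position, score) of the highest-scoring window in sequence."""
    positions = range(len(sequence) - len(matrix) + 1)
    best = max(positions, key=lambda p: score_at(matrix, sequence, p))
    return best, score_at(matrix, sequence, best)

# Hypothetical toy data, invented for illustration only.
toy_sites = ["TCAGT", "TCAGT", "CCAGT", "TTAGA"]
toy_sequence = "GGGGTCAGTGGGG"

profile = information_content(toy_sites)   # one value per alignment column
matrix = weight_matrix(toy_sites)
hit_position, hit_score = best_hit(matrix, toy_sequence)
```

Columns where every site carries the same base reach the 2-bit maximum, so plotting `profile` along a longer alignment is one way to see where a conserved box begins and ends. Combining several boxes into one search pattern, as the abstract describes, would additionally need spacing constraints between the matrices, which this sketch leaves out.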