Uwe Ohler, Guo-chun Liao, Heinrich Niemann, Gerald M. Rubin
Department of Molecular and Cell Biology, University of California at Berkeley, Berkeley, CA 94720-3200, USA.
Genome Biol. 2002;3(12):RESEARCH0087. doi: 10.1186/gb-2002-3-12-research0087. Epub 2002 Dec 20.
The core promoter, a region of about 100 base pairs flanking the transcription start site (TSS), serves as the recognition site for the basal transcription apparatus. Drosophila TSSs have generally been mapped by individual experiments; the small number of accurately mapped TSSs has limited the analysis of promoter sequence motifs and the training of computational prediction tools.
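Because motif analysis works on fixed windows around each TSS, it helps to see how such a window is cut out of a genomic sequence on either strand. The sketch below is illustrative only. The -60/+40 boundaries, the function names `reverse_complement` and `core_promoter`, and the 0-based coordinate convention are all hypothetical choices for "about 100 base pairs flanking the TSS", not the paper's definition.

```python
# Illustrative sketch: window boundaries and names are hypothetical,
# not the definition used in the paper.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def core_promoter(genome: str, tss: int, strand: str,
                  upstream: int = 60, downstream: int = 40) -> str:
    """Return positions -upstream..-1 and +1..+downstream around a TSS.

    `tss` is the 0-based forward-strand index of the first transcribed
    base (+1). The result is written 5'->3' on the transcribed strand,
    so result[upstream] is always the TSS base.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # On the minus strand, upstream lies at higher forward coordinates.
        start, end = tss - downstream + 1, tss + upstream + 1
    else:
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    if start < 0 or end > len(genome):
        raise ValueError("window extends past the end of the sequence")
    window = genome[start:end]
    return window if strand == "+" else reverse_complement(window)


genome = "ACGT" * 50
plus = core_promoter(genome, 100, "+")
minus = core_promoter(genome, 100, "-")
print(len(plus), plus[60], len(minus), minus[60])
```

With these defaults each window is 100 bp long. Index 60 is the +1 base on both strands, so a motif scan can use one set of relative offsets regardless of strand.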
We identified TSS candidates for about 2,000 Drosophila genes by aligning 5' expressed sequence tags (ESTs) from cap-trapped cDNA libraries to the genome, while applying stringent criteria concerning coverage and 5'-end distribution. Examination of the sequences flanking these TSSs revealed the presence of well-known core promoter motifs such as the TATA box, the initiator and the downstream promoter element (DPE). We also define, and assess the distribution of, several new motifs prevalent in core promoters, including what appears to be a variant DPE motif. Among the prevalent motifs is the DNA-replication-related element DRE, recently shown to be part of the recognition site for the TBP-related factor TRF2. Our TSS set was then used to retrain the computational promoter predictor McPromoter, allowing us to improve the recognition performance to over 50% sensitivity and 40% specificity. We compare these computational results to promoter prediction in vertebrates.
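The sensitivity and specificity figures quoted above can be illustrated with a small evaluation routine. This is a sketch under stated assumptions. It uses the convention common in promoter-prediction evaluation, where sensitivity is the fraction of annotated TSSs hit by at least one prediction and "specificity" is the fraction of predictions that lie near an annotated TSS (i.e. precision). The `evaluate` function, the single-sequence integer coordinates, and the 500 bp tolerance are hypothetical and not taken from the paper.

```python
# Hypothetical evaluation sketch: the tolerance and the matching rule are
# illustrative assumptions, not the paper's evaluation protocol.


def evaluate(predicted: list[int], annotated: list[int],
             tolerance: int = 500) -> tuple[float, float]:
    """Return (sensitivity, specificity) for TSS predictions.

    A prediction counts as correct if it lies within `tolerance` bp of an
    annotated TSS. Sensitivity = annotated TSSs found / all annotated TSSs.
    Specificity (precision) = correct predictions / all predictions.
    """
    found: set[int] = set()  # indices of annotated TSSs hit by a prediction
    correct = 0              # predictions near at least one annotated TSS
    for p in predicted:
        hits = {i for i, t in enumerate(annotated) if abs(p - t) <= tolerance}
        if hits:
            correct += 1
            found |= hits
    sensitivity = len(found) / len(annotated) if annotated else 0.0
    specificity = correct / len(predicted) if predicted else 0.0
    return sensitivity, specificity


annotated_tss = [1000, 5000, 9000, 20000]
predicted_tss = [1100, 4800, 12000, 30000, 9050]
print(evaluate(predicted_tss, annotated_tss))
```

Here three of four annotated sites are found (sensitivity 0.75), and three of five predictions are correct (specificity 0.6). Under this convention, raising the prediction threshold typically trades sensitivity for specificity.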
There are relatively few recognizable binding sites for previously known general transcription factors in Drosophila core promoters. However, we identified several new motifs enriched in promoter regions. We were also able to significantly improve the performance of computational TSS prediction in Drosophila.