Pedersen A G, Baldi P, Chauvin Y, Brunak S
Department of Biotechnology, Technical University of Denmark, Lyngby, Denmark.
Comput Chem. 1999 Jun 15;23(3-4):191-207. doi: 10.1016/s0097-8485(99)00015-7.
Computational prediction of eukaryotic promoters from the nucleotide sequence is one of the most attractive problems in sequence analysis today, but it is also a very difficult one. For example, current methods predict on the order of one promoter per kilobase in human DNA, whereas the average distance between functional promoters has been estimated at 30-40 kilobases. Although some of these predicted promoters may correspond to cryptic initiation sites that are used in vivo, most are likely to be false positives. This suggests that it is important to carefully reconsider the biological data that form the basis of current algorithms, and here we present a review of data that may be useful in this regard. The review covers the following topics: (1) basal transcription and core promoters, (2) activated transcription and transcription factor binding sites, (3) CpG islands and DNA methylation, (4) chromosomal structure and nucleosome modification, and (5) chromosomal domains and domain boundaries. We discuss the lessons that may be learned, especially with respect to the wealth of information about epigenetic regulation of transcription that has appeared in recent years.
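The false-positive argument above rests on simple arithmetic, which the following back-of-envelope sketch makes explicit. It is an illustration, not a calculation from the paper. It assumes the best case for the predictor: every functional promoter is among the predictions. Under that assumption, true hits occur at a rate of 1/spacing per kilobase, and the result is a lower bound on the fraction of predictions that are false positives. The function name `implied_false_positive_fraction` is hypothetical.

```python
def implied_false_positive_fraction(predicted_per_kb: float, spacing_kb: float) -> float:
    """Lower bound on the fraction of predicted promoters that are false positives.

    Hypothetical helper. Assumes (best case) that every functional promoter is
    predicted, so true predictions per kb = 1 / spacing_kb.
    """
    true_per_kb = 1.0 / spacing_kb
    if true_per_kb > predicted_per_kb:
        raise ValueError("more functional promoters than predictions; bound undefined")
    # Everything predicted beyond the true rate must be a false positive.
    return 1.0 - true_per_kb / predicted_per_kb


# One prediction per kb against the 30-40 kb spacing estimate quoted above.
for spacing in (30.0, 40.0):
    frac = implied_false_positive_fraction(1.0, spacing)
    print(f"{spacing:.0f} kb spacing: at least {frac:.1%} of predictions are false positives")
```

Even under this generous assumption, roughly 97% of predictions would be false positives. Any missed functional promoters would only push that fraction higher.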