Conserved short sequences in promoter regions of human genome.

Author information

Department of Biochemistry, University of Hyderabad, Hyderabad - 500 046, India.

Publication information

J Biomol Struct Dyn. 2010 Apr;27(5):599-610. doi: 10.1080/07391102.2010.10508574.

PMID:20085377

Abstract

Recognition of promoter elements by transcription factors is one of the earliest and most crucial steps in gene expression and regulation. In prokaryotes, clear signals identify promoter regions, such as TATAAT at around -10 and TTGACA at -35 relative to the transcription start site (TSS). In eukaryotes, promoter regions are structurally more complex, and there are no conserved or consensus sequences comparable to those found in prokaryotic promoters. We have located a set of GC-rich short sequences (< 8 nt) that are relatively common in human promoter sequences around the TSS (±100 relative to the TSS). These sequences were sorted by their frequency of occurrence in the database, and the 50 most common were used for further study. The sigmoidal behavior of the high end of the frequency distribution of these sequences suggests the presence of some internal cooperativity. These short sequences are distributed on both sides of the TSS, suggesting that transcription factors probably recognize them both upstream and downstream of the TSS. Because eukaryotic promoters lack any conserved sequences, we expect that these short sequences may help relevant transcription factors recognize promoter regions before transcription is initiated. We postulate that a cluster of genes sharing common short sequences in the promoter region can be recognized by a particular transcription factor. We also found that most of these short sequences are fairly common within miRNAs (both mature and stem-loop sequences). Our studies indicate that eukaryotic transcription is more complex than currently believed.
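The counting and ranking step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function names (`tss_window`, `rank_kmers`, `gc_fraction`), the 0-based TSS index, and the lower length bound of 4 nt are assumptions made here, since the abstract states only that the sequences are shorter than 8 nt and lie within ±100 of the TSS. Strand handling is omitted.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def tss_window(sequence, tss_index, flank=100):
    """Return the slice covering +/- `flank` positions around a 0-based TSS index."""
    start = max(0, tss_index - flank)
    end = min(len(sequence), tss_index + flank + 1)
    return sequence[start:end]


def gc_fraction(kmer):
    """Fraction of G or C bases in a short sequence."""
    return sum(base in "GC" for base in kmer) / len(kmer)


def rank_kmers(windows, k_min=4, k_max=7, top_n=50):
    """Count every k-mer (k_min..k_max nt) across the windows and return the
    top_n most frequent as (kmer, count) pairs.

    k_min=4 is an assumed lower bound; k_max=7 follows the "< 8 nt" limit.
    """
    counts = Counter()
    for window in windows:
        window = window.upper()
        for k in range(k_min, k_max + 1):
            for i in range(len(window) - k + 1):
                kmer = window[i:i + k]
                # Skip k-mers containing ambiguous bases such as N.
                if set(kmer) <= VALID_BASES:
                    counts[kmer] += 1
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]
```

Given one window per promoter, `rank_kmers(windows)` would yield a frequency-sorted list comparable in spirit to the top-50 set, and `gc_fraction` can be applied to each entry to check how GC-rich the most common sequences are.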
