Research Institute for Genetics and Selection of Industrial Microorganisms, Genetika, 1st Dorozhny proezd, 1, Moscow, 117545, Russia.
BMC Genomics. 2010 Jan 19;11:48. doi: 10.1186/1471-2164-11-48.
Recently, it has been discovered that the human genome contains many transcription start sites for non-coding RNAs. The regulatory regions involved in transcription of these non-coding RNAs are poorly studied. Some of these regulatory regions may be associated with CpG islands located far from the transcription start sites of any protein-coding gene. The human genome contains many such CpG islands; however, their properties have not yet been systematically studied.
We studied CpG islands located in different regions of the human genome using bioinformatic and comparative genomic methods. We observed that CpG islands preferentially overlap with exons, including exons located far from the transcription start site, but usually extend well into the adjacent introns. The synonymous substitution rate of CpG-containing codons is substantially reduced in regions where CpG islands overlap protein-coding exons, even when these regions lie far downstream of the transcription start site. CAGE tag analysis revealed frequent transcription start sites in all CpG islands, including those found far from the transcription start sites of protein-coding genes. Computational prediction and analysis of published ChIP-chip data showed that CpG islands contain an increased number of sites recognized by the Sp1 protein. CpG islands containing more CAGE tags usually also contain more Sp1 binding sites; this is especially pronounced for CpG islands located in 3' gene regions. Various examples of transcription confirmed by mRNAs or ESTs, but with no evidence of a protein-coding gene, were found in CAGE-enriched CpG islands located far from the transcription start site of any known protein-coding gene.
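The relationship between CAGE tags and Sp1 sites described above can be illustrated with a minimal sketch. It is not the authors' pipeline, which the abstract does not specify. Assumptions: islands are 0-based half-open intervals on one chromosome, CAGE tags are reduced to sorted 5'-end positions, and Sp1 sites are approximated by exact matches to the GC-box core `GGGCGG` on either strand (a real prediction would more likely use a position weight matrix). The Spearman rank correlation is a choice made here for illustration, and all function names are hypothetical.

```python
import bisect
import re

# Simplified Sp1 GC-box core and its reverse complement (assumption, not a PWM).
SP1_CORE = "GGGCGG"
SP1_CORE_RC = "CCGCCC"


def count_motif(seq: str, motif: str) -> int:
    """Count possibly overlapping exact matches of motif in seq."""
    return len(re.findall(f"(?={motif})", seq.upper()))


def count_sp1_sites(seq: str) -> int:
    """Count GC-box core matches on both strands of seq."""
    return count_motif(seq, SP1_CORE) + count_motif(seq, SP1_CORE_RC)


def count_tags_in_interval(sorted_positions, start: int, end: int) -> int:
    """Number of tag positions p with start <= p < end (list must be sorted)."""
    return (bisect.bisect_left(sorted_positions, end)
            - bisect.bisect_left(sorted_positions, start))


def island_stats(chrom_seq: str, islands, sorted_tag_positions):
    """Per-island CAGE tag counts and simplified Sp1 site counts."""
    tags, sp1 = [], []
    for start, end in islands:
        tags.append(count_tags_in_interval(sorted_tag_positions, start, end))
        sp1.append(count_sp1_sites(chrom_seq[start:end]))
    return tags, sp1


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y) -> float:
    """Spearman correlation as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    # Toy data, purely illustrative (not from the study).
    chrom = "ATATGGGCGGCCGCCCATAT" + "ATATGGGCGGATAT" + "ATATATATAT"
    islands = [(0, 20), (20, 34), (34, 44)]
    cage_tags = sorted([2, 5, 9, 12, 25, 28, 40])
    tags, sp1 = island_stats(chrom, islands, cage_tags)
    print(tags, sp1, spearman(tags, sp1))  # [4, 2, 1] [2, 1, 0] 1.0
```

In practice the same per-island counts would be grouped by island location (promoter, gene body, 3' region, intergenic) before comparing them.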
CpG islands located far from the transcription start sites of protein-coding genes have transcription initiation activity and display Sp1 binding properties. In exons overlapping these islands, the synonymous substitution rate of CpG-containing codons is decreased. This suggests that these CpG islands are involved in transcription initiation, possibly of some non-coding RNAs.