Sharma Vineet K, Kumar Naveen, Brahmachari Samir K, Ramachandran Srinivasan
G. N. Ramachandran Knowledge Centre for Genome Informatics, Institute of Genomics and Integrative Biology, Delhi, India.
Physiol Genomics. 2007 Sep 19;31(1):96-103. doi: 10.1152/physiolgenomics.00183.2006. Epub 2007 Jun 5.
High and broad transcription of eukaryotic genes is facilitated by cost minimization, clustered localization in the genome, elevated G+C content, and low nucleosome formation potential. In this context, the correlation of transcriptional levels with the abundance of (TG/CA)(n≥12) repeats, which are negative cis modulators of transcription, and of other commonly occurring dinucleotide repeats needs to be examined. Three independent microarray datasets were used to examine the correlation of (TG/CA)(n≥12) and other dinucleotide repeats with gene expression. Compared with the equal distribution expected under a neutral model, highly transcribed genes were poor in repeats and, conversely, weakly transcribed genes were rich in repeats. Furthermore, the inverse correlation between repeat abundance and transcriptional level appears to be a global phenomenon encompassing all genes, regardless of their breadth of transcription. This selective exclusion of (TG/CA)(n≥12) and (AT)(n≥12) repeats from highly transcribed genes is an additional factor, along with cost minimization and elevated GC content; multiple factors therefore govern high transcription of genes. We observed that even after controlling for the effects of GC content and average intron length, the effect of repeats, albeit somewhat weaker, was persistent and definite. In the ribosomal protein coding genes, sequence analysis of orthologs suggests that negative selection against repeats perhaps occurred early in evolution. These observations suggest that negative selection of (TG/CA)(n≥12) microsatellites during the evolution of highly expressed genes was controlled by gene function in addition to intron length.
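The abstract does not state how repeats were detected or which correlation statistic was used. As a hedged illustration only, the following Python sketch counts perfect (uninterrupted) (TG/CA)(n≥12) and (AT)(n≥12) tracts with regular expressions and computes a Spearman rank correlation between per-gene repeat counts and expression levels. The function names (`repeat_pattern`, `count_tracts`, `average_ranks`, `spearman`), the restriction to perfect repeats, and the choice of Spearman's rho are assumptions made for this example, not the authors' method.

```python
import re

def repeat_pattern(unit: str, min_units: int = 12) -> re.Pattern:
    """Regex matching perfect tandem copies of a dinucleotide `unit` or its
    rotation (e.g. TG or GT), so the reading phase of the tract does not matter.
    Assumption: only uninterrupted repeats are counted."""
    rot = unit[1] + unit[0]
    return re.compile(f"(?:{unit}){{{min_units},}}|(?:{rot}){{{min_units},}}")

# A (TG/CA) microsatellite appears on one strand as TG/GT or as CA/AC,
# so scanning a single strand for both classes covers both strands.
TG_CA = [repeat_pattern("TG"), repeat_pattern("CA")]
AT = [repeat_pattern("AT")]  # AT/TA is its own reverse complement

def count_tracts(seq: str, patterns) -> int:
    """Number of maximal repeat tracts (greedy, non-overlapping) in `seq`."""
    seq = seq.upper()
    return sum(len(p.findall(seq)) for p in patterns)

def average_ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y) -> float:
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("constant input: correlation is undefined")
    return cov / (vx * vy) ** 0.5
```

Given per-gene sequences `seqs` and expression values `expr`, `spearman([count_tracts(s, TG_CA) for s in seqs], expr)` would be negative if repeat-rich genes tend to be weakly expressed. A real analysis would also need to control for GC content and intron length, as the abstract notes, and production repeat finders usually tolerate interruptions that this sketch ignores.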