Biosystems Group, Institute of Automatic Control, Silesian University of Technology, 44-100 Gliwice, Poland.
Gene. 2012 Jan 25;492(2):375-81. doi: 10.1016/j.gene.2011.10.050. Epub 2011 Nov 11.
The genomes of warm-blooded vertebrates are mosaics of isochores: alternating fragments with low and high GC contents in which genes are embedded. The evolutionary mechanisms leading to such structures are not fully understood. We compared the distributions of GC base pairs in coding sequences and in sequences spanning 5 kb upstream and downstream of genes, both in human and other species annotated in the RefSeq database and across different isochores of the human genome. Using our computer application NucleoSeq (available at www.bioinformatics.aei.polsl.pl), we also compared the average distributions of AT-rich regulatory motifs and of transcription factor binding sites (TFBS) for single transcription factors with those in randomized sequences of the human genome. Some TFBS have a lower average frequency in gene promoters than in randomized sequences, whereas for other transcription factors the opposite is observed. TFBS for some transcription factors are more frequent in coding sequences than in regulatory or randomized sequences, suggesting that they accumulated during evolution and may have functional roles. On the basis of the GC content of genes and their adjacent sequences, which was similar in all species studied here, and the distribution of regulatory motifs, we hypothesize that the first step in the evolution of many genes existing today was the joining of a GC-rich coding sequence to a region with a lower GC content and the potential to create regulatory motifs.
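The two comparisons described in the abstract (GC content of a coding sequence versus its flanking regions, and motif frequency in a real sequence versus a randomized one) can be illustrated with a minimal Python sketch. This is not NucleoSeq and does not reproduce its method: the function names are hypothetical, the randomization is assumed here to be a simple base shuffle that preserves composition (the abstract does not say how the randomized sequences were generated), only the forward strand is scanned, and the sequences in the usage example are toy strings rather than RefSeq data.

```python
import random


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous A/C/G/T bases of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def count_motif(seq: str, motif: str) -> int:
    """Count occurrences of motif on the forward strand, overlaps included."""
    seq, motif = seq.upper(), motif.upper()
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def motif_per_kb(seq: str, motif: str) -> float:
    """Motif occurrences per 1000 bases of seq."""
    if not seq:
        return 0.0
    return 1000.0 * count_motif(seq, motif) / len(seq)


def shuffled_motif_per_kb(seq: str, motif: str,
                          n_shuffles: int = 100, seed: int = 0) -> float:
    """Mean motif frequency over base-shuffled copies of seq.

    Assumption: shuffling bases is used as the randomized baseline; it keeps
    base composition but destroys order.
    """
    rng = random.Random(seed)
    bases = list(seq.upper())
    total = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(bases)
        total += motif_per_kb("".join(bases), motif)
    return total / n_shuffles


if __name__ == "__main__":
    # Toy sequences for illustration only, not real genomic data.
    upstream = "TATAAATTTATATAAGCATTTAAATATA"
    coding = "ATGGCCGGCGCGCTGGCCGAGGGCTGA"
    print("GC upstream:", round(gc_content(upstream), 3))
    print("GC coding:  ", round(gc_content(coding), 3))
    print("TATA per kb, observed:", round(motif_per_kb(upstream, "TATA"), 1))
    print("TATA per kb, shuffled:", round(shuffled_motif_per_kb(upstream, "TATA"), 1))
```

A motif whose observed frequency is well above the shuffled mean is over-represented relative to base composition alone, and one well below it is depleted. A real analysis would also scan the reverse strand and use position weight matrices rather than exact string matches for TFBS.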