Ruvinsky A, Ward W
The Institute for Genetics and Bioinformatics, University of New England, Armidale, 2351, NSW, Australia.
J Mol Evol. 2006 Jul;63(1):136-41. doi: 10.1007/s00239-005-0261-6. Epub 2006 May 25.
The majority of eukaryotic genes consist of exons and introns. Introns can be inserted either between codons (phase 0) or within codons, after the first nucleotide (phase 1) or after the second (phase 2). We report here that the frequency of phase 0 introns increases and that of phase 1 introns declines from the 5' region to the 3' end of genes. This trend is particularly noticeable in the genomes of Homo sapiens and Arabidopsis thaliana, in which gains of novel introns in the 3' portion of genes were probably a dominant process. Similar but more moderate gradients exist in the Drosophila melanogaster and Caenorhabditis elegans genomes, where the accumulation of novel introns was not a prevailing factor. There are nine types of exons, classified by the phases of their flanking introns: three symmetric (0,0; 1,1; 2,2) and six asymmetric (0,1; 1,0; 1,2; 2,1; 2,0; 0,2). Assuming a random distribution of the different types of introns along genes, one can expect the frequencies of paired asymmetric exons such as 0,1 and 1,0, or 1,2 and 2,1, to be approximately equal, allowing for some variation caused by randomness. The gradient in intron distribution leads to a small but consistent and statistically significant bias: phase 1 introns are more likely at the 5' ends and phase 0 introns are more likely at the 3' ends of asymmetric exons. For the same reason, the frequency of 0,0 exons increases and the frequency of 1,1 exons decreases in the 3' direction, at least in H. sapiens and A. thaliana. The number of introns per gene also affects the distribution and frequency of phase 0 and phase 1 introns. The gradient provides insight into the evolution of the intron-exon structures of eukaryotic genes.
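The phase and exon-type definitions used in the abstract can be illustrated with a minimal Python sketch. It assumes that each intron position is given as the number of coding nucleotides preceding the intron; the function names (`intron_phase`, `exon_types`, `is_symmetric`) and the example offsets are hypothetical and are not taken from the paper.

```python
from itertools import product


def intron_phase(cds_offset: int) -> int:
    """Return the phase of an intron preceded by `cds_offset` coding nucleotides.

    Phase 0: between codons; phase 1: after the first nucleotide of a codon;
    phase 2: after the second nucleotide of a codon.
    """
    if cds_offset < 0:
        raise ValueError("cds_offset must be non-negative")
    return cds_offset % 3


def exon_types(intron_offsets):
    """Return (5' intron phase, 3' intron phase) for each exon flanked by two introns."""
    phases = [intron_phase(offset) for offset in sorted(intron_offsets)]
    return list(zip(phases, phases[1:]))


def is_symmetric(exon_type) -> bool:
    """An exon is symmetric when both flanking introns have the same phase."""
    return exon_type[0] == exon_type[1]


# Enumerate the nine possible exon types and split them by symmetry.
ALL_EXON_TYPES = list(product(range(3), repeat=2))
SYMMETRIC = [t for t in ALL_EXON_TYPES if is_symmetric(t)]
ASYMMETRIC = [t for t in ALL_EXON_TYPES if not is_symmetric(t)]

if __name__ == "__main__":
    # Hypothetical gene with introns after 30, 100 and 152 coding nucleotides.
    print(exon_types([30, 100, 152]))
    print(len(SYMMETRIC), len(ASYMMETRIC))
```

Under these assumptions, tallying the tuples returned by `exon_types` across many genes would give the exon-type frequencies whose pairwise balance (for example 0,1 versus 1,0) the abstract discusses.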