Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA.
BMC Evol Biol. 2013 Feb 28;13:57. doi: 10.1186/1471-2148-13-57.
Most eukaryotic genes are interrupted by spliceosomal introns. Despite rapid advances in genome sequencing technology, the evolution of exon-intron structure remains poorly understood. In this work, we take a novel approach based on two assumptions. The first is that the evolution of exon-intron structure is a stochastic process. The second is that the characteristics of this process can be inferred from its historical outcome, namely the present-day size distribution of internal translated exons (hereafter simply "exons"). By combining simulation with modeling of the exon size distributions of different species, we propose a general random fragmentation process (GRFP) to characterize the evolutionary dynamics of exon-intron structure. This model accurately predicts both the probability that an exon will be split by a new intron and the distribution of new insertion sites along the length of the exon.
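One way to write the process described above compactly is shown below. This is a sketch under stated assumptions, not the paper's own notation. Suppose the coding sequence currently consists of n exons with lengths s_1, ..., s_n. A new intron lands in exon i with a probability governed by a size exponent alpha. It then splits that exon at a relative position x drawn from a location density f on (0, 1). The labels s_i, alpha and f are introduced here for illustration. In this reading, fitting the model amounts to estimating alpha and f from observed exon size distributions.

```latex
P(\text{intron lands in exon } i) = \frac{s_i^{\alpha}}{\sum_{j=1}^{n} s_j^{\alpha}},
\qquad
s_i \;\mapsto\; \bigl(x\, s_i,\; (1 - x)\, s_i\bigr), \quad x \sim f(x), \; 0 < x < 1
```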
First, the model shows that the probability of an exon gaining an intron is proportional to the third power of its size. This size dependence is nearly constant along the gene, except for exons adjacent to the 5' UTR. Second, intron insertion sites follow a normal distribution with a mean of 0.5 (the center of the exon) and a standard deviation of 0.11, expressed as a fraction of exon length. Finally, intron insertions within a gene are independent of each other in vertebrates but negatively correlated in non-vertebrates. Using simulation, we show that this negative correlation might result from substantial intron loss during evolution. That loss could in turn be explained by selection against multi-intron genes in these organisms.
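The two findings above can be illustrated with a small simulation. This is a hypothetical sketch, not the authors' code. Each new intron picks an exon with probability proportional to length**alpha (alpha = 3) and splits it at a relative position drawn from a normal distribution with mean 0.5 and standard deviation 0.11. Several details are assumptions made for illustration and are not stated in the abstract: the function names, starting from a single uninterrupted coding sequence, truncating the normal to (0, 1) by rejection, and ignoring intron loss.

```python
import random
import statistics


def sample_insertion_fraction(rng, mu=0.5, sigma=0.11):
    """Draw a relative insertion point in (0, 1) from N(mu, sigma^2).

    Draws outside (0, 1) are rejected and redrawn; this tail handling is an
    assumption of the sketch, not something stated in the abstract.
    """
    while True:
        x = rng.gauss(mu, sigma)
        if 0.0 < x < 1.0:
            return x


def fragment(total_length, n_introns, alpha=3.0, mu=0.5, sigma=0.11, seed=0):
    """Hypothetical GRFP-style simulation (not the authors' implementation).

    Starts from one uninterrupted coding sequence of total_length nt and
    inserts n_introns introns one at a time. Each insertion chooses an exon
    with probability proportional to length**alpha, then splits it at a
    relative position from sample_insertion_fraction. Returns exon lengths
    in order along the sequence.
    """
    rng = random.Random(seed)
    exons = [float(total_length)]
    for _ in range(n_introns):
        # Size-dependent choice of which exon receives the new intron.
        weights = [s ** alpha for s in exons]
        i = rng.choices(range(len(exons)), weights=weights, k=1)[0]
        x = sample_insertion_fraction(rng, mu, sigma)
        s = exons[i]
        # Replace exon i by its two fragments, preserving order along the gene.
        exons[i:i + 1] = [x * s, (1.0 - x) * s]
    return exons


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.mean(values)


if __name__ == "__main__":
    sized = fragment(10_000, 78, alpha=3.0)
    size_blind = fragment(10_000, 78, alpha=0.0)
    print(len(sized), round(sum(sized)))
    print("CV with alpha=3:", round(coefficient_of_variation(sized), 2))
    print("CV with alpha=0:", round(coefficient_of_variation(size_blind), 2))
```

Setting alpha=0 gives a size-blind process for comparison. Because a steep size dependence keeps splitting whichever exon is currently longest, the alpha=3 run tends to produce much more uniform exon sizes (a lower coefficient of variation) than the size-blind run.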
The GRFP model suggests that intron gain is a dynamic process that favors longer exons. It also suggests that introns are inserted into exons randomly, with the highest probability at the center of the exon. GRFP estimates that vertebrate genomes contain 78 introns per 10 kb of coding sequence, in agreement with empirical observations. GRFP also estimates substantial intron loss during the evolution of non-vertebrate genomes. Extreme cases include around 57% intron loss in Drosophila melanogaster, 28% in Caenorhabditis elegans, and 24% in Oryza sativa.
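The density and loss figures above lend themselves to simple arithmetic. The sketch below is illustrative only. It assumes introns are spread at a uniform density along coding sequence. It also reads "57% intron loss" as "57% of the introns the model predicts without loss are gone", which is an assumption rather than a definition given in the abstract. The function names and the example count of 100 predicted introns are hypothetical.

```python
# GRFP estimate for vertebrates quoted in the abstract.
VERTEBRATE_INTRONS_PER_10KB = 78

# GRFP intron-loss estimates for non-vertebrates quoted in the abstract.
ESTIMATED_LOSS = {
    "Drosophila melanogaster": 0.57,
    "Caenorhabditis elegans": 0.28,
    "Oryza sativa": 0.24,
}


def expected_introns(cds_length_nt, introns_per_10kb=VERTEBRATE_INTRONS_PER_10KB):
    """Expected intron count in a CDS, assuming a uniform intron density."""
    return cds_length_nt * introns_per_10kb / 10_000


def mean_intron_spacing(introns_per_10kb=VERTEBRATE_INTRONS_PER_10KB):
    """Average coding-sequence length (nt) per intron at the given density."""
    return 10_000 / introns_per_10kb


def retained_introns(predicted, loss_fraction):
    """Introns remaining if loss_fraction of the predicted introns were lost.

    Treating the quoted loss as a fraction of the no-loss prediction is an
    assumption made for this illustration.
    """
    return predicted * (1.0 - loss_fraction)


if __name__ == "__main__":
    print(f"{expected_introns(1_500):.1f} introns expected in a 1.5 kb vertebrate CDS")
    print(f"{mean_intron_spacing():.0f} nt of coding sequence per intron on average")
    for species, loss in ESTIMATED_LOSS.items():
        kept = retained_introns(100, loss)
        print(f"{species}: {kept:.0f} of every 100 predicted introns retained")
```

For example, at 78 introns per 10 kb, a 1.5 kb vertebrate coding sequence would carry about 11.7 introns on average, or roughly one intron per 128 nt of coding sequence.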