Carmel Liran, Rogozin Igor B, Wolf Yuri I, Koonin Eugene V
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA.
Genome Res. 2007 Jul;17(7):1045-50. doi: 10.1101/gr.5978207. Epub 2007 May 10.
Introns that interrupt eukaryotic protein-coding sequences are generally thought to be nonfunctional. However, for reasons still poorly understood, positions of many introns are highly conserved in evolution. Previous reconstructions of intron gain and loss events during eukaryotic evolution used a variety of simplified evolutionary models that yielded contradictory conclusions and are poorly suited to revealing some of the key underlying processes. We combine a comprehensive probabilistic model and an extended data set, including 391 conserved genes from 19 eukaryotes, to uncover previously unnoticed aspects of intron evolution, in particular to assign intron gain and loss rates to individual genes. The rates of intron gain and loss in a gene show a moderate positive correlation. A gene's intron gain rate shows a highly significant negative correlation with the coding-sequence evolution rate; the intron loss rate also correlates significantly with the sequence evolution rate, but positively. Correlations of the opposite signs, albeit less significant ones, are observed between intron gain and loss rates and gene expression level. It is proposed that intron evolution includes a neutral component, which is manifest in the positive correlation between the gain and loss rates, and a selection-driven component, which is reflected in the links between intron gain and loss and sequence evolution. The increased intron gain and decreased intron loss in evolutionarily conserved genes indicate that intron insertion might often be adaptive, whereas some of the intron losses might be deleterious. This apparent functional importance of introns is likely to be due, at least in part, to their multiple effects on gene expression.
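The per-gene correlations reported in the abstract (for example, between a gene's intron gain rate and its coding-sequence evolution rate) can be illustrated with a minimal sketch. The abstract does not say which correlation statistic was used; Spearman rank correlation is assumed here purely for illustration. The function names and the numeric values are hypothetical and are not data from the study.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up per-gene values for illustration only (NOT from the paper).
gain_rate = [0.9, 0.7, 0.6, 0.4, 0.2]  # hypothetical intron gain rates
seq_rate = [0.1, 0.3, 0.2, 0.6, 0.9]   # hypothetical sequence evolution rates

rho = spearman(gain_rate, seq_rate)
print(f"Spearman rho (gain rate vs. sequence evolution rate): {rho:.2f}")
```

A negative coefficient corresponds to the pattern described for intron gain (more gain in slowly evolving genes), and a positive one to the pattern described for intron loss. Judging significance, as the abstract does, would additionally require a test on the full set of genes.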