Ruvinsky A, Eskesen S T, Eskesen F N, Hurst L D
Institute for Genetics and Bioinformatics, University of New England, Armidale 2351, NSW, Australia.
J Mol Evol. 2005 Jan;60(1):99-104. doi: 10.1007/s00239-004-0032-9.
More introns exist between codons (phase 0) than between the first and the second bases (phase 1) or between the second and the third bases (phase 2) within the codon. Many explanations have been suggested for this excess of phase 0. It has, for example, been argued to reflect an ancient utility for introns in separating exons that code for separate protein modules. There may, however, be a simple, alternative explanation. Introns typically require, for correct splicing, particular nucleotides immediately 5' in exons (typically a G) and immediately 3' in the following exon (also often a G). Introns therefore tend to be found between particular nucleotide pairs (e.g., G|G pairs) in the coding sequence. If, owing to bias in usage of different codons, these pairs are especially common at phase 0, then intron phase biases may have a trivial explanation. Here we take codon usage frequencies for a variety of eukaryotes and use these to generate random sequences. We then ask about the phase of putative intron insertion sites. Importantly, in all simulated data sets intron phase distribution is biased in favor of phase 0. In many cases the bias is of the magnitude observed in real data and can be attributed to codon usage bias. It is also known that exons may carry either the same phase (symmetric) or different phases (asymmetric) at the opposite ends. We simulated a distribution of different types of exons using frequencies of introns observed in real genes, assuming random combination of intron phases at the opposite sides of exons. Surprisingly, the simulated pattern was quite similar to that observed. In the simulants we typically observe a prevalence of symmetric exons carrying phase 0 at both ends, which is common for eukaryotic genes. However, at least in some species, the extent of the bias in favor of symmetric (0,0) exons is not as great in simulants as in real genes. These results emphasize the need to construct a biologically relevant null model of successful intron insertion.
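The null model described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the restriction of putative insertion sites to exact G|G pairs, and the toy codon frequencies in the usage note are all assumptions made for illustration. The first function draws codons at random according to a codon usage table and tallies the phase of every G|G site; the second computes the exon class proportions expected if the phases at the two ends of an exon combine at random.

```python
import random
from collections import Counter
from itertools import product


def simulate_phase_counts(codon_freqs, n_codons=100_000, site=("G", "G"), seed=0):
    """Tally putative intron insertion sites by phase in a random coding sequence.

    codon_freqs: dict mapping codon -> relative usage (hypothetical input format).
    site: the nucleotide pair treated as an insertion site (assumed to be G|G here).
    """
    rng = random.Random(seed)
    codons = list(codon_freqs)
    weights = [codon_freqs[c] for c in codons]
    # Concatenate independently drawn codons into one long reading frame.
    seq = "".join(rng.choices(codons, weights=weights, k=n_codons))

    counts = Counter()
    for i in range(len(seq) - 1):
        if seq[i] == site[0] and seq[i + 1] == site[1]:
            # The site lies between 0-based positions i and i+1.
            # (i + 1) % 3 == 0 -> between codons (phase 0),
            # == 1 -> after the first base (phase 1),
            # == 2 -> after the second base (phase 2).
            counts[(i + 1) % 3] += 1
    return counts


def phase_proportions(counts):
    """Convert phase counts into proportions for phases 0, 1, and 2."""
    total = sum(counts[p] for p in range(3))
    return {p: counts[p] / total for p in range(3)}


def expected_exon_classes(phase_props):
    """Expected proportion of each exon class (5' phase, 3' phase),
    assuming the phases at the two ends combine independently."""
    return {
        (a, b): phase_props[a] * phase_props[b]
        for a, b in product(range(3), repeat=2)
    }
```

As a usage example with a made-up two-codon table, `simulate_phase_counts({"AAG": 0.5, "GAA": 0.5}, n_codons=1000)` can only produce G|G pairs where an `AAG` codon is followed by `GAA`, so every site it counts falls in phase 0. Passing real phase proportions to `expected_exon_classes` gives the random-combination expectation against which observed symmetric (0,0) exon frequencies can be compared.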