Bioinformatics Unit, Clinical Research Coordination, Instituto Nacional de Câncer (INCA), Rua André Cavalcanti, 37 - CEP 20231-050, Rio de Janeiro, RJ, Brazil.
Comput Biol Chem. 2012 Feb;36:55-61. doi: 10.1016/j.compbiolchem.2012.01.002. Epub 2012 Jan 12.
Intron splicing is one of the most important steps in the maturation of a pre-mRNA. Although the sequence profiles around the splice sites have been studied extensively, the levels of sequence identity between the exonic sequences preceding the donor sites and the intronic sequences preceding the acceptor sites have not been examined as thoroughly. In this study we investigated identity patterns between the last 15 nucleotides of the exonic sequence preceding the 5' splice site and the last 15 nucleotides of the intronic sequence preceding the 3' splice site in a set of human protein-coding genes that do not exhibit intron retention. We found that almost 60% of consecutive exons and introns in human protein-coding genes share at least two identical nucleotides at their 3' ends and that, on average, the sequence identity length is 2.47 nucleotides. Based on our findings we conclude that the 3' ends of exons and introns tend to share longer identical sequences when they come from the same gene than when they are taken from different genes. Our results hold even if the pairs are non-consecutive in the transcription order.
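The comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact definition of "sequence identity length", so the code below assumes it is the number of contiguous matching nucleotides counted leftward from the 3' end of each sequence, within the last 15 nucleotides. The function names `shared_suffix_length` and `summarize` and the toy sequences are hypothetical and are not taken from the study.

```python
def shared_suffix_length(exon_tail: str, intron_tail: str, window: int = 15) -> int:
    """Count contiguous identical nucleotides at the 3' ends of two sequences.

    Assumption: 'identity length' is the length of the shared suffix,
    limited to the last `window` nucleotides of each sequence.
    """
    a = exon_tail.upper()[-window:]
    b = intron_tail.upper()[-window:]
    length = 0
    # Walk from the 3' end toward the 5' end until the first mismatch.
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        length += 1
    return length


def summarize(pairs, window: int = 15):
    """Return (mean identity length, fraction of pairs sharing >= 2 nucleotides)."""
    if not pairs:
        raise ValueError("at least one exon/intron pair is required")
    lengths = [shared_suffix_length(e, i, window) for e, i in pairs]
    mean_len = sum(lengths) / len(lengths)
    frac_ge2 = sum(1 for n in lengths if n >= 2) / len(lengths)
    return mean_len, frac_ge2


# Toy exon/intron 3' ends, invented for illustration only.
toy_pairs = [
    ("ACGTTCAG", "TTTTCCAG"),  # shares "CAG"
    ("GGATCAAG", "CTTTGCAG"),  # shares "AG"
    ("ACGTACGT", "CTTTTTAG"),  # shares nothing at the 3' end
]
mean_len, frac_ge2 = summarize(toy_pairs)
print(mean_len, frac_ge2)
```

Under this assumed definition, applying `summarize` to real exon/intron pairs from the same gene and to pairs drawn from different genes would give the two quantities the abstract contrasts: the mean identity length and the share of pairs with at least two identical 3' nucleotides.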