Neverov A D, Artamonova I I, Nurtdinov R N, Frishman D, Gelfand M S, Mironov A A
State Scientific Center GosNIIGenetika, 1st Dorozhny proezd 1, Moscow, 117545, Russia.
BMC Bioinformatics. 2005 Nov 7;6:266. doi: 10.1186/1471-2105-6-266.
Alternative splicing is a major mechanism of generating protein diversity in higher eukaryotes. Although at least half, and probably more, of mammalian genes are alternatively spliced, it was not clear whether the frequency of alternative splicing is the same in different functional categories. The problem is obscured by uneven coverage of genes by ESTs and a large number of artifacts in the EST data.
We have developed a method that generates possible mRNA isoforms for human genes contained in the EDAS database, taking into account the effects of nonsense-mediated decay and translation initiation rules, and a procedure for offsetting the effects of uneven EST coverage. Then we computed the number of mRNA isoforms for genes from different functional categories. Genes encoding ribosomal proteins and genes in the category "Small GTPase-mediated signal transduction" tend to have fewer isoforms than the average, whereas the genes in the category "DNA replication and chromosome cycle" have more isoforms than the average. Genes encoding proteins involved in protein-protein interactions tend to be alternatively spliced more often than genes encoding non-interacting proteins, although there is no significant difference in the number of isoforms of alternatively spliced genes.
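The filtering step for nonsense-mediated decay (NMD) can be illustrated with a minimal sketch. It assumes the commonly cited "50-nucleotide rule", under which an mRNA is treated as an NMD target when its stop codon lies more than about 50 nt upstream of the last exon-exon junction. The threshold, the isoform representation, and all function names here are illustrative assumptions, not the procedure actually implemented for EDAS.

```python
from typing import List, Tuple

# Assumed threshold: the widely used "50-nt rule" for NMD (not taken from the abstract).
NMD_DISTANCE_NT = 50

# Hypothetical isoform record: (name, exon lengths in mRNA order, mRNA coordinate of stop codon end).
Isoform = Tuple[str, List[int], int]


def exon_junctions(exon_lengths: List[int]) -> List[int]:
    """Return mRNA coordinates of exon-exon junctions (cumulative ends of all but the last exon)."""
    junctions = []
    position = 0
    for length in exon_lengths[:-1]:
        position += length
        junctions.append(position)
    return junctions


def is_nmd_candidate(exon_lengths: List[int], stop_codon_end: int,
                     threshold: int = NMD_DISTANCE_NT) -> bool:
    """True if the stop codon ends more than `threshold` nt upstream of the last junction."""
    junctions = exon_junctions(exon_lengths)
    if not junctions:
        # Single-exon transcripts have no junction, so the rule does not apply.
        return False
    return junctions[-1] - stop_codon_end > threshold


def filter_isoforms(isoforms: List[Isoform]) -> List[Isoform]:
    """Keep only isoforms that are not predicted NMD targets under the assumed rule."""
    return [iso for iso in isoforms if not is_nmd_candidate(iso[1], iso[2])]


candidates = [
    ("iso_a", [100, 200, 150], 200),  # stop 100 nt upstream of last junction: NMD candidate
    ("iso_b", [100, 200, 150], 280),  # stop 20 nt upstream of last junction: kept
    ("iso_c", [100, 200, 150], 400),  # stop in the last exon: kept
]
kept_names = [name for name, _, _ in filter_isoforms(candidates)]
```

In this toy example only `iso_a` is discarded. A filter of this kind would be applied before counting isoforms per gene, so that transcripts unlikely to yield a protein do not inflate the counts.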
Filtering for functional isoforms satisfying biological constraints and accounting for uneven EST coverage allowed us to describe differences in alternative splicing of genes from different functional categories. The observations seem to be consistent with expectations based on current biological knowledge: fewer isoforms for ribosomal and signal transduction proteins, and more alternative splicing of interacting and cell cycle proteins.