Structural Biology and BioComputing Program, Spanish National Cancer Research Centre (CNIO), 28029 Madrid, Spain.
Bioinformatics. 2012 Jun 15;28(12):i67-74. doi: 10.1093/bioinformatics/bts216.
Chimeric RNA transcripts are generated by different mechanisms, including pre-mRNA trans-splicing, chromosomal translocations and/or gene fusions. It was shown recently that at least some chimeric transcripts can be translated into functional chimeric proteins.
To gain a better understanding of the design principles underlying chimeric proteins, we analyzed 7,424 chimeric RNAs from humans. We focused on the specific domains present in these proteins, comparing their permutations with those of known human proteins. Our method uses genomic alignments of the chimeras, identification of the gene-gene junction sites and prediction of the protein domains. We found that chimeras contain complete protein domains significantly more often than random data sets do. Specifically, we show that eight different types of domains are over-represented among all chimeras as well as among those chimeras confirmed by RNA-seq experiments. Moreover, we discovered that some chimeras potentially encode proteins with novel and unique domain combinations. Given the observed prevalence of entire protein domains in chimeras, we predict that certain putative chimeras that lack activation domains may actively compete with their parental proteins, thereby exerting dominant negative effects. More generally, the production of chimeric transcripts enables a combinatorial increase in the number of protein products available, which may disturb the function of parental genes and influence their protein-protein interaction networks.
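The comparison between junction sites and domain boundaries can be illustrated with a minimal sketch. This is not the authors' pipeline, which is available only on request. It assumes domains are given as 1-based inclusive residue intervals on a parental protein, that the junction is expressed as the last parental residue retained by the 5' partner, and that the random background draws junctions uniformly along the protein. The function names, the domain labels and all coordinates are hypothetical.

```python
import random
from typing import List, Tuple

# (name, start, end) in 1-based inclusive residue coordinates (assumed layout)
Domain = Tuple[str, int, int]


def complete_domains(domains: List[Domain], junction: int, side: str) -> List[str]:
    """Return names of domains kept entirely on one side of the junction.

    side="5p": the chimera keeps residues 1..junction of the parent.
    side="3p": the chimera keeps residues junction+1..end of the parent.
    """
    if side == "5p":
        return [name for name, start, end in domains if end <= junction]
    if side == "3p":
        return [name for name, start, end in domains if start > junction]
    raise ValueError("side must be '5p' or '3p'")


def fraction_with_complete_domain(domains: List[Domain], length: int, side: str,
                                  n_random: int = 10000, seed: int = 0) -> float:
    """Fraction of uniformly random junctions that retain at least one complete domain.

    Uniform sampling is an illustrative assumption; the random data sets in the
    abstract may have been built differently.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_random):
        junction = rng.randint(1, length - 1)
        if complete_domains(domains, junction, side):
            hits += 1
    return hits / n_random


# Hypothetical parental protein of 300 residues with two made-up domains.
parent_domains = [("kinase", 20, 150), ("SH2", 180, 260)]
observed = complete_domains(parent_domains, junction=170, side="5p")
background = fraction_with_complete_domain(parent_domains, 300, "5p")
print(observed, background)
```

In this toy case the junction at residue 170 falls between the two domains, so the 5' fragment keeps the first domain whole and the 3' fragment keeps the second. Repeating the check over many observed chimeras and comparing against the background fraction gives the kind of enrichment statement made in the abstract.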
Our scripts are available upon request.