Sohail Muhammad, Cao Wenguang, Mahmood Niaz, Myschyshyn Mike, Hong Say Pham, Xie Jiuyong
Department of Physiology, University of Manitoba, 440 BMSB, 745 Bannatyne Avenue, Winnipeg, MB R3E 0J9, Canada.
BMC Genomics. 2014 Dec 19;15(1):1143. doi: 10.1186/1471-2164-15-1143.
The 3' splice site (SS) at the end of pre-mRNA introns has the consensus sequence (Y)nNYAG for constitutive splicing of mammalian genes. Deviation from this consensus can change or disrupt usage of the splice site, leading to alternative or aberrant splicing that may impair normal cell function or even contribute to the development of disease. We have previously shown that the "N" position can be replaced by a CA-rich RNA element called CaRRE1 to regulate the alternative splicing of a group of genes.
Extending this finding, we searched the human genome for purine-rich elements between the -3 and -10 positions of the 3' splice sites of annotated introns. The search identified several thousand such 3'SS, more than a thousand of which contain at least one copy of a G tract. These sites deviate significantly from the consensus of constitutive splice sites and are highly associated with alternative splicing events, particularly alternative 3' splice site usage and intron retention. We show by mutagenesis analysis and RNA interference that the G tracts are splicing silencers and that a group of the associated exons is controlled by the G tract-binding proteins hnRNP H/F. Species comparison of a group of the 3'SS among vertebrates suggests that most (~87%) of the G tracts emerged in ancestors of mammals during evolution. Moreover, the host genes are most significantly associated with cancer.
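The window search described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the treatment of a G tract as a run of three or more guanines, and the use of a simple purine fraction are all assumptions made for illustration. The sketch takes an intron sequence ending in the 3' AG (positions -2 and -1) and inspects the eight nucleotides at positions -10 to -3.

```python
import re

# Assumption: a G tract is a run of at least three consecutive guanines.
G_TRACT = re.compile(r"G{3,}")


def upstream_window(intron: str) -> str:
    """Return positions -10..-3 of an intron that ends in the 3' AG.

    The terminal AG occupies positions -2 and -1, so the window is the
    eight nucleotides immediately upstream of it.
    """
    seq = intron.upper().replace("U", "T")
    if len(seq) < 10 or not seq.endswith("AG"):
        raise ValueError("expected an intron of >= 10 nt ending in AG")
    return seq[-10:-2]


def purine_fraction(window: str) -> float:
    """Fraction of A/G bases in the window (hypothetical scoring choice)."""
    return sum(base in "AG" for base in window) / len(window)


def has_g_tract(window: str) -> bool:
    """True if the window contains at least one assumed G tract."""
    return G_TRACT.search(window) is not None


if __name__ == "__main__":
    # Two made-up intron ends: one purine-rich with a G tract, one Py-rich.
    for intron in ("GTAAGTCCTTTTCTGGGGAACAG", "TTTTTTTTTTCTTTTCAG"):
        w = upstream_window(intron)
        print(w, purine_fraction(w), has_g_tract(w))
```

In a genome-wide setting, the same window would be extracted for every annotated intron and sites would be retained above some purine-content threshold; that threshold is not stated here and would need to be taken from the full paper.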
We call these elements, together with CaRRE1, regulatory RNA elements between the polypyrimidine tract (Py) and the 3' AG (REPA). The emergence of REPA in this highly constrained region indicates that the location has been remarkably permissive for de novo regulatory RNA elements, even purine-rich motifs, in a large group of mammalian genes during evolution. This evolutionary change controls alternative splicing, likely diversifying proteomes for particular cellular functions.