Down Thomas, Leong Bernard, Hubbard Tim J P
Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.
BMC Bioinformatics. 2006 Sep 26;7:419. doi: 10.1186/1471-2105-7-419.
The splicing of RNA transcripts is thought to be partly promoted and regulated by sequences embedded within exons. Known sequences include binding sites for SR proteins, which are thought to mediate interactions between splicing factors bound to the 5' and 3' splice sites. It would be useful to identify further candidate sequences; however, identifying them computationally is hard because exon sequences are also constrained by their functional role in coding for proteins.
This strategy identified a collection of motifs including several previously reported splice enhancer elements. Although only trained on coding exons, the model discriminates both coding and non-coding exons from intragenic sequence.
We have trained a computational model able to detect signals in coding exons which seem to be orthogonal to the sequences' primary function of coding for proteins. We believe that many of the motifs detected here represent binding sites for previously unrecognized proteins that influence RNA splicing, as well as other regulatory elements.