A machine learning strategy to identify candidate binding sites in human protein-coding sequence.

Author information

Down Thomas, Leong Bernard, Hubbard Tim J P

Affiliation

Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK.

Publication information

BMC Bioinformatics. 2006 Sep 26;7:419. doi: 10.1186/1471-2105-7-419.

Abstract

BACKGROUND

The splicing of RNA transcripts is thought to be partly promoted and regulated by sequences embedded within exons. Known sequences include binding sites for SR proteins, which are thought to mediate interactions between splicing factors bound to the 5' and 3' splice sites. It would be useful to identify further candidate sequences; however, identifying them computationally is hard because exon sequences are also constrained by their functional role in coding for proteins.

RESULTS

This strategy identified a collection of motifs including several previously reported splice enhancer elements. Although only trained on coding exons, the model discriminates both coding and non-coding exons from intragenic sequence.

CONCLUSION

We have trained a computational model able to detect signals in coding exons that seem to be orthogonal to the sequences' primary function of coding for proteins. We believe that many of the motifs detected here represent binding sites for previously unrecognized proteins that influence RNA splicing, as well as other regulatory elements.

