Analysis and recognition of 5' UTR intron splice sites in human pre-mRNA.

Author information

Eden E, Brunak S

Affiliation

Center for Biological Sequence Analysis, Biocentrum-DTU Building 208, Technical University of Denmark, DK-2800 Lyngby, Denmark.

Publication information

Nucleic Acids Res. 2004 Feb 11;32(3):1131-42. doi: 10.1093/nar/gkh273. Print 2004.

Abstract

Prediction of splice sites in non-coding regions of genes is one of the most challenging aspects of gene structure recognition. We perform a rigorous analysis of such splice sites embedded in human 5' untranslated regions (UTRs), and investigate correlations between this class of splice sites and other features found in the adjacent exons and introns. By restricting the training of neural network algorithms to 'pure' UTRs (not extending partially into protein coding regions), we for the first time investigate the predictive power of the splicing signal proper, in contrast to conventional splice site prediction, which typically relies on the change in sequence at the transition from protein coding to non-coding. By doing so, the algorithms were able to pick up subtler splicing signals that were otherwise masked by 'coding' noise, thus enhancing significantly the prediction of 5' UTR splice sites. For example, the non-coding splice site predicting networks pick up compositional and positional bias in the 3' ends of non-coding exons and 5' non-coding intron ends, where cytosine and guanine are over-represented. This compositional bias at the true UTR donor sites is also visible in the synaptic weights of the neural networks trained to identify UTR donor sites. Conventional splice site prediction methods perform poorly in UTRs because the reading frame pattern is absent. The NetUTR method presented here performs 2-3-fold better compared with NetGene2 and GenScan in 5' UTRs. We also tested the 5' UTR trained method on protein coding regions, and discovered, surprisingly, that it works quite well (although it cannot compete with NetGene2). This indicates that the local splicing pattern in UTRs and coding regions is largely the same. The NetUTR method is made publicly available at www.cbs.dtu.dk/services/NetUTR.
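The compositional bias described in the abstract (cytosine and guanine over-represented around true UTR donor sites) can be illustrated with a minimal sketch. It is not the NetUTR implementation: the function names, the window layout (3 exon bases followed by 5 intron bases) and the toy sequences are all assumptions made for illustration, and the sparse one-hot encoding is shown only as a common way to feed sequence windows to a neural network, which the abstract does not confirm for NetUTR.

```python
from collections import Counter

def position_composition(windows):
    """For each column of equal-length aligned windows, return base fractions."""
    if not windows:
        return []
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    profile = []
    for i in range(length):
        counts = Counter(w[i].upper() for w in windows)
        profile.append({b: counts.get(b, 0) / len(windows) for b in "ACGT"})
    return profile

def gc_fraction_per_position(windows):
    """Fraction of G or C at each aligned position."""
    return [col["G"] + col["C"] for col in position_composition(windows)]

def one_hot(window):
    """Sparse encoding: 4 binary inputs per nucleotide (unknown bases map to zeros)."""
    table = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0),
             "G": (0, 0, 1, 0), "T": (0, 0, 0, 1)}
    return [bit for base in window.upper() for bit in table.get(base, (0, 0, 0, 0))]

# Hypothetical toy windows (made up, not real data):
# 3 exon bases followed by 5 intron bases beginning with the GT dinucleotide.
toy_windows = ["CAGGTAAG", "CGGGTGAG", "GCGGTCCG", "AAGGTAAG"]
gc = gc_fraction_per_position(toy_windows)
```

On real data, such a per-position G+C profile computed over aligned true donor sites would be compared against the same profile computed over non-site GT dinucleotides to see where the bias lies.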
