Analysis and recognition of 5' UTR intron splice sites in human pre-mRNA.

Authors

Eden E, Brunak S

Affiliation

Center for Biological Sequence Analysis, Biocentrum-DTU Building 208, Technical University of Denmark, DK-2800 Lyngby, Denmark.

Publication details

Nucleic Acids Res. 2004 Feb 11;32(3):1131-42. doi: 10.1093/nar/gkh273. Print 2004.

DOI:10.1093/nar/gkh273

PMID:14960723

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC373407/

Abstract

Prediction of splice sites in non-coding regions of genes is one of the most challenging aspects of gene structure recognition. We perform a rigorous analysis of such splice sites embedded in human 5' untranslated regions (UTRs), and investigate correlations between this class of splice sites and other features found in the adjacent exons and introns. By restricting the training of neural network algorithms to 'pure' UTRs (not extending partially into protein coding regions), we for the first time investigate the predictive power of the splicing signal proper, in contrast to conventional splice site prediction, which typically relies on the change in sequence at the transition from protein coding to non-coding. By doing so, the algorithms were able to pick up subtler splicing signals that were otherwise masked by 'coding' noise, thus enhancing significantly the prediction of 5' UTR splice sites. For example, the non-coding splice site predicting networks pick up compositional and positional bias in the 3' ends of non-coding exons and 5' non-coding intron ends, where cytosine and guanine are over-represented. This compositional bias at the true UTR donor sites is also visible in the synaptic weights of the neural networks trained to identify UTR donor sites. Conventional splice site prediction methods perform poorly in UTRs because the reading frame pattern is absent. The NetUTR method presented here performs 2-3-fold better compared with NetGene2 and GenScan in 5' UTRs. We also tested the 5' UTR trained method on protein coding regions, and discovered, surprisingly, that it works quite well (although it cannot compete with NetGene2). This indicates that the local splicing pattern in UTRs and coding regions is largely the same. The NetUTR method is made publicly available at www.cbs.dtu.dk/services/NetUTR.
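The compositional and positional bias mentioned in the abstract (C and G over-represented near UTR donor sites) can be made concrete with a per-position base-frequency profile over aligned sequence windows. The following is a minimal sketch only: it is not the NetUTR neural network, the function names `position_frequencies` and `gc_profile` are hypothetical, and the four windows are made-up toy data rather than sequences from the paper.

```python
from collections import Counter

BASES = "ACGT"


def position_frequencies(windows):
    """Return one {base: fraction} dict per aligned position."""
    if not windows:
        raise ValueError("need at least one window")
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    freqs = []
    for i in range(length):
        counts = Counter(w[i].upper() for w in windows)
        # Only A/C/G/T count towards the total; ambiguous bases are ignored.
        total = sum(counts[b] for b in BASES)
        freqs.append({b: (counts[b] / total if total else 0.0) for b in BASES})
    return freqs


def gc_profile(windows):
    """Return the combined C+G fraction at each aligned position."""
    return [f["C"] + f["G"] for f in position_frequencies(windows)]


# Toy, made-up windows (assumption, not data from the paper):
# 3 exon bases, the GT donor dinucleotide, then 3 more intron bases.
toy_windows = ["CAGGTAAG", "CGGGTGAG", "AAGGTACG", "CCGGTGCG"]

if __name__ == "__main__":
    for pos, gc in enumerate(gc_profile(toy_windows)):
        print(pos, gc)
```

Run on real aligned donor-site windows and on a background set of non-site windows, the difference between the two profiles would give a rough view of where the C/G enrichment sits relative to the splice junction.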
