CART classification of human 5' UTR sequences.

Authors

Davuluri R V, Suzuki Y, Sugano S, Zhang M Q

Affiliation

Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA.

Publication details

Genome Res. 2000 Nov;10(11):1807-16. doi: 10.1101/gr.gr-1460r.

DOI:10.1101/gr.gr-1460r

PMID:11076865

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC310970/

Abstract

A nonredundant database of 2312 full-length human 5'-untranslated regions (UTRs) was carefully prepared using state-of-the-art experimental and computational technologies. A comprehensive computational analysis of this data was conducted for characterizing the 5' UTR features. Classification and regression tree (CART) analysis was used to classify the data into three distinct classes. Class I consists of mRNAs that are believed to be poorly translated with long 5' UTRs filled with potential inhibitory features. Class II consists of terminal oligopyrimidine tract (TOP) mRNAs that are regulated in a growth-dependent manner, and class III consists of mRNAs with favorable 5' UTR features that may help efficient translation. The most accurate tree we found has 92.5% classification accuracy as estimated by cross validation. The classification model included the presence of TOP, a secondary structure, 5' UTR length, and the presence of upstream AUGs (uAUGs) as the most relevant variables. The present classification and characterization of the 5' UTRs provide precious information for better understanding the translational regulation of human mRNAs. Furthermore, this database and classification can help people build better computational models for predicting the 5'-terminal exon and separating the 5' UTR from the coding region.
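
As an illustration of the kind of per-sequence variables the abstract names (5' UTR length, upstream AUGs, and the presence of a TOP), the following is a minimal Python sketch of feature extraction for a single 5' UTR. It is not the authors' pipeline: the function name `utr_features`, the `min_top` threshold of 5, and the simplified TOP rule (a C at the cap site followed by an uninterrupted run of pyrimidines) are assumptions made for this example, and the secondary-structure variable is omitted because it would require an RNA folding program.

```python
import re


def utr_features(utr: str, min_top: int = 5) -> dict:
    """Compute simple features for one 5' UTR sequence.

    `utr` is assumed to run from the cap site to the base just before
    the main start codon. `min_top` is a hypothetical threshold chosen
    for illustration; it is not taken from the paper.
    """
    # Normalise to upper-case RNA so cDNA input (with T) is also accepted.
    seq = utr.upper().replace("T", "U")

    # Simplified TOP rule (assumption): C at position 1, then pyrimidines only.
    match = re.match(r"C[CU]*", seq)
    top_run = len(match.group(0)) if match else 0

    return {
        "length": len(seq),               # 5' UTR length in nucleotides
        "uaug_count": seq.count("AUG"),   # upstream AUG triplets, any frame
        "top_run": top_run,               # length of the leading pyrimidine run
        "has_top": top_run >= min_top,    # True if the run meets the threshold
    }


if __name__ == "__main__":
    # Made-up sequence for demonstration only.
    print(utr_features("CTTTCCTGGATGCC"))
```

A table of such features, one row per mRNA, is the sort of input a classification tree would be trained on; the split thresholds of the published tree are not given in the abstract and are not reproduced here.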
