A 5' UTR Language Model for Decoding Untranslated Regions of mRNA and Function Predictions.

Author information

Chu Yanyi, Yu Dan, Li Yupeng, Huang Kaixuan, Shen Yue, Cong Le, Zhang Jason, Wang Mengdi

Affiliations

Center for Statistics and Machine Learning and Department of Electrical and Computer Engineering, Princeton University, Princeton, NJ 08544, USA.

Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305, USA.

Publication information

Nat Mach Intell. 2024 Apr;6(4):449-460. doi: 10.1038/s42256-024-00823-9. Epub 2024 Apr 5.

DOI: 10.1038/s42256-024-00823-9
PMID: 38855263
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11155392/

Abstract

The 5' UTR, a regulatory region at the beginning of an mRNA molecule, plays a crucial role in regulating the translation process and impacts the protein expression level. Language models have showcased their effectiveness in decoding the functions of protein and genome sequences. Here, we introduced a language model for 5' UTR, which we refer to as the UTR-LM. The UTR-LM is pre-trained on endogenous 5' UTRs from multiple species and is further augmented with supervised information including secondary structure and minimum free energy. We fine-tuned the UTR-LM in a variety of downstream tasks. The model outperformed the best known benchmark by up to 5% for predicting the Mean Ribosome Loading, and by up to 8% for predicting the Translation Efficiency and the mRNA Expression Level. The model also applies to identifying unannotated Internal Ribosome Entry Sites within the untranslated region and improves the AUPR from 0.37 to 0.52 compared to the best baseline. Further, we designed a library of 211 novel 5' UTRs with high predicted values of translation efficiency and evaluated them via a wet-lab assay. Experiment results confirmed that our top designs achieved a 32.5% increase in protein production level relative to well-established 5' UTR optimized for therapeutics.
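
The abstract reports the Internal Ribosome Entry Site result as an improvement in AUPR (area under the precision-recall curve) from 0.37 to 0.52. As a reading aid, the sketch below shows one common way to compute that metric, the step-wise "average precision" estimate. It is not the authors' evaluation code: the function name `average_precision` and the toy labels and scores are assumptions made for illustration, and the paper may use a different estimator (for example, trapezoidal interpolation).

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    labels: 1 for a positive example (e.g. a sequence containing an IRES), 0 otherwise.
    scores: model scores, where a higher score means "more likely positive".
    Tied scores are ranked in input order here, which is a simplification.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")

    # Rank examples from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    true_positives = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            true_positives += 1
            # Precision at the rank where recall increases.
            precision_sum += true_positives / rank
    return precision_sum / n_pos


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the paper.
    toy_labels = [1, 0, 1, 0]
    toy_scores = [0.9, 0.8, 0.7, 0.1]
    print(round(average_precision(toy_labels, toy_scores), 4))
```

Unlike accuracy, this metric depends only on how the positives are ranked relative to the negatives, which makes it a common choice when positives are rare.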

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/29c4/11155392/86f93411690d/nihms-1998067-f0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/29c4/11155392/1cbd1a2e3da3/nihms-1998067-f0007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/29c4/11155392/ccecd0128866/nihms-1998067-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/29c4/11155392/a38d48395eae/nihms-1998067-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/29c4/11155392/723a54a53af5/nihms-1998067-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/29c4/11155392/b4f391c95134/nihms-1998067-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/29c4/11155392/d396fc11c786/nihms-1998067-f0005.jpg
