
mRNABench: A curated benchmark for mature mRNA property and function prediction.

Authors

Ian Shi Ruian, Dalal Taykhoom, Fradkin Philip, Koyyalagunta Divya, Chhabria Simran, Jung Andrew, Tam Cyrus, Ceyhan Defne, Lin Jessica, Laverty Kaitlin U, Baali Ilyes, Wang Bo, Morris Quaid

Affiliations

Department of Computer Science, University of Toronto.

Vector Institute.

Publication information

bioRxiv. 2025 Jul 8:2025.07.05.662870. doi: 10.1101/2025.07.05.662870.

DOI:10.1101/2025.07.05.662870
PMID:40672173
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12265608/
Abstract

Messenger RNA (mRNA) is central in gene expression, and its half-life, localization, and translation efficiency drive phenotypic diversity in eukaryotic cells. While supervised learning has widely been used to study the mRNA regulatory code, self-supervised foundation models support a wider range of transfer learning tasks. However, the dearth and homogeneity of standardized benchmarks limit efforts to pinpoint the strengths of various models. Here, we present mRNABench, a comprehensive benchmarking suite for mature mRNA biology that evaluates the representational quality of mature mRNA embeddings from self-supervised nucleotide foundation models. We curate ten datasets and 59 prediction tasks that broadly capture salient properties of mature mRNA, and assess the performance of 18 families of nucleotide foundation models for a total of 135K experiments. Using these experiments, we study parameter scaling, compositional generalization from learned biological features, and correlations between sequence compressibility and performance. We identify synergies between two self-supervised learning objectives, and pre-train a new Mamba-based model that achieves state-of-the-art performance using 700x fewer parameters. mRNABench can be found at: https://github.com/morrislab/mRNABench.
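
The abstract mentions correlating sequence compressibility with model performance but does not say how compressibility is measured. As a rough illustration only, the sketch below assumes a general-purpose compressor (Python's `zlib`) as a proxy and reports the compressed-to-raw size ratio of a nucleotide string. The function name `compression_ratio` and both toy sequences are hypothetical and are not taken from mRNABench.

```python
import random
import zlib


def compression_ratio(seq: str) -> float:
    """Compressed size divided by raw size; lower means more compressible.

    Assumption: zlib (DEFLATE) stands in for whatever compressibility
    measure the authors actually use, which the abstract does not specify.
    """
    raw = seq.upper().encode("ascii")
    if not raw:
        raise ValueError("sequence must be non-empty")
    return len(zlib.compress(raw, 9)) / len(raw)


# Hypothetical toy sequences, not drawn from any mRNABench dataset.
rng = random.Random(0)
repetitive = "AUGGCC" * 200                                    # highly repetitive
random_like = "".join(rng.choice("ACGU") for _ in range(1200))  # near-random

ratio_repetitive = compression_ratio(repetitive)
ratio_random_like = compression_ratio(random_like)

print(f"repetitive:  {ratio_repetitive:.3f}")
print(f"random-like: {ratio_random_like:.3f}")
```

A ratio computed this way per sequence (or averaged per dataset) could then be compared against a model's task scores with any standard correlation measure; whether that matches the paper's analysis is an open assumption here.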



