Ian Shi Ruian, Dalal Taykhoom, Fradkin Philip, Koyyalagunta Divya, Chhabria Simran, Jung Andrew, Tam Cyrus, Ceyhan Defne, Lin Jessica, Laverty Kaitlin U, Baali Ilyes, Wang Bo, Morris Quaid
Department of Computer Science, University of Toronto.
Vector Institute.
bioRxiv. 2025 Jul 8:2025.07.05.662870. doi: 10.1101/2025.07.05.662870.
Messenger RNA (mRNA) is central to gene expression, and its half-life, localization, and translation efficiency drive phenotypic diversity in eukaryotic cells. While supervised learning has been widely used to study the mRNA regulatory code, self-supervised foundation models support a wider range of transfer learning tasks. However, the dearth and homogeneity of standardized benchmarks limit efforts to pinpoint the strengths of various models. Here, we present mRNABench, a comprehensive benchmarking suite for mature mRNA biology that evaluates the representational quality of mature mRNA embeddings from self-supervised nucleotide foundation models. We curate ten datasets and 59 prediction tasks that broadly capture salient properties of mature mRNA, and assess the performance of 18 families of nucleotide foundation models for a total of 135K experiments. Using these experiments, we study parameter scaling, compositional generalization from learned biological features, and correlations between sequence compressibility and performance. We identify synergies between two self-supervised learning objectives, and pre-train a new Mamba-based model that achieves state-of-the-art performance using 700x fewer parameters. mRNABench can be found at: https://github.com/morrislab/mRNABench.
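The abstract mentions correlating sequence compressibility with model performance but does not say how compressibility is measured. As a minimal sketch, assuming a general-purpose compressor is an acceptable proxy, the ratio of compressed to raw size can be computed with Python's standard `zlib` module. The function name `compression_ratio` and the toy sequences are hypothetical illustrations, not part of mRNABench, and the metric used in the paper may differ.

```python
import random
import zlib


def compression_ratio(seq: str, level: int = 9) -> float:
    """Compressed size divided by raw size in bytes; lower means more compressible.

    Assumption: DEFLATE (zlib) stands in for whatever compressor the paper uses.
    """
    raw = seq.upper().encode("ascii")
    if not raw:
        raise ValueError("sequence must be non-empty")
    return len(zlib.compress(raw, level)) / len(raw)


rng = random.Random(0)
# Toy sequences, not real transcripts: one repeat-rich, one uniformly random.
repetitive = "AUGGCC" * 200
shuffled = "".join(rng.choice("ACGU") for _ in range(len(repetitive)))

print(f"repetitive: {compression_ratio(repetitive):.3f}")
print(f"random:     {compression_ratio(shuffled):.3f}")
```

The repeat-rich sequence yields a much lower ratio than the random one of the same length. Relating such a per-sequence score to per-task performance would then require the benchmark's own results, which are not reproduced here.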