Current limitations in predicting mRNA translation with deep learning models.

Affiliations

Biozentrum, University of Basel, Spitalstrasse 41, 4056, Basel, Switzerland.

Departament de Bioquímica i Biologia Molecular and Institut de Biotecnologia i Biomedicina, Universitat Autònoma de Barcelona, 08193, Cerdanyola del Vallès, Spain.

Publication information

Genome Biol. 2024 Aug 20;25(1):227. doi: 10.1186/s13059-024-03369-6.

Abstract

BACKGROUND

The design of nucleotide sequences with defined properties is a long-standing problem in bioengineering. An important application is protein expression, be it in the context of research or the production of mRNA vaccines. The rate of protein synthesis depends on the 5' untranslated region (5'UTR) of the mRNAs, and recently, deep learning models were proposed to predict the translation output of mRNAs from the 5'UTR sequence. At the same time, large data sets of endogenous and reporter mRNA translation have become available.

RESULTS

In this study, we use complementary data obtained in two different cell types to assess the accuracy and generality of currently available models for predicting translational output. We find that while performing well on the data sets on which they were trained, deep learning models do not generalize well to other data sets, in particular of endogenous mRNAs, which differ in many properties from reporter constructs.

CONCLUSIONS

These differences limit the ability of deep learning models to uncover mechanisms of translation control and to predict the impact of genetic variation. We suggest directions that combine high-throughput measurements and machine learning to unravel mechanisms of translation control and improve construct design.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e16/11337900/e47f2820a00a/13059_2024_3369_Fig1_HTML.jpg