Interpreting deep neural networks for the prediction of translation rates.

Author information

Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Hannoversche Straße 28, Berlin, 10115, Germany.

Department of Biology, Humboldt-Universität zu Berlin, Unter den Linden 6, Berlin, 10099, Germany.

Publication information

BMC Genomics. 2024 Nov 9;25(1):1061. doi: 10.1186/s12864-024-10925-8.

Abstract

BACKGROUND

The 5' untranslated region (5'UTR) of an mRNA strongly influences the rate of translation initiation. A recent convolutional neural network (CNN) model accurately predicts translation levels from a massively parallel library of synthetic 5'UTRs. However, the biological features that drive the model's predictions remain elusive. Uncovering the sequence determinants that predict translation output may allow a more detailed understanding of translation regulation at the 5'UTR.
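As an illustration of how a 5'UTR sequence is typically presented to a sequence-based CNN, the following is a minimal sketch of one-hot encoding. The function name `one_hot`, the channel order `ACGT`, the default input length of 50, and the choice of left-padding are all assumptions made for this example; the abstract does not specify the encoding used by the model.

```python
from typing import List

ALPHABET = "ACGT"  # assumed channel order, not taken from the source


def one_hot(seq: str, length: int = 50) -> List[List[int]]:
    """Encode a 5'UTR as a `length` x 4 matrix of 0/1 values.

    Assumptions for this sketch: U is treated as T, sequences longer than
    `length` keep the bases closest to the start codon (the 3' end of the
    UTR), and shorter sequences are left-padded with all-zero rows.
    """
    seq = seq.upper().replace("U", "T")
    seq = seq[max(0, len(seq) - length):]  # keep the 3'-most `length` bases
    rows = [[0, 0, 0, 0] for _ in range(length - len(seq))]  # left padding
    for base in seq:
        row = [0, 0, 0, 0]
        idx = ALPHABET.find(base)
        if idx >= 0:  # unknown symbols such as N stay all-zero
            row[idx] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in one_hot("AUG", length=5):
        print(row)
```

Left-padding keeps the bases nearest the start codon at fixed positions in the input, which is one common design choice when the region around the initiation site is expected to matter most.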

RESULTS

Applying model interpretation, we extract representations of regulatory logic from CNNs trained on synthetic and human 5'UTR reporter data. We find that a complex interplay of regulatory sequence elements, such as initiation context and upstream open reading frames (uORFs), shapes model predictions. We also show that models trained on synthetic data alone do not sufficiently explain translation regulation via the 5'UTR, because the frequency of regulatory motifs differs between synthetic and natural 5'UTRs.
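To make the regulatory elements named above concrete, here is a hedged sketch that scans a 5'UTR for upstream start codons, checks whether each one is closed by an in-frame stop codon inside the UTR (a uORF), and scores its initiation context. It is not the authors' interpretation method. The names `UpstreamStart` and `scan_utr` are hypothetical, the input is assumed to end immediately before the main start codon, and the "strong context" rule (purine at -3 and G at +4) is a simplified Kozak-style heuristic.

```python
from dataclasses import dataclass
from typing import List, Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass
class UpstreamStart:
    position: int                  # 0-based index of the A in the upstream ATG
    in_frame: bool                 # same frame as the main start codon
    stop_position: Optional[int]   # index of an in-frame stop inside the UTR
    strong_context: bool           # purine at -3 and G at +4 (simplified rule)


def scan_utr(utr: str) -> List[UpstreamStart]:
    """List upstream ATGs in a 5'UTR that ends right before the main ATG."""
    seq = utr.upper().replace("U", "T")
    hits = []
    for i in range(len(seq) - 2):
        if seq[i:i + 3] != "ATG":
            continue
        # Walk codon by codon from the upstream ATG to find a stop in the UTR.
        stop = None
        for j in range(i + 3, len(seq) - 2, 3):
            if seq[j:j + 3] in STOP_CODONS:
                stop = j
                break
        minus3 = seq[i - 3] if i >= 3 else ""
        plus4 = seq[i + 3] if i + 3 < len(seq) else ""
        hits.append(UpstreamStart(
            position=i,
            in_frame=(len(seq) - i) % 3 == 0,
            stop_position=stop,
            strong_context=minus3 in ("A", "G") and plus4 == "G",
        ))
    return hits


if __name__ == "__main__":
    for hit in scan_utr("GCCACCAUGGCUUAAGG"):
        print(hit)
```

An upstream ATG with a `stop_position` is a uORF fully contained in the UTR; one without it runs past the end of the UTR, either overlapping the main ORF out of frame or extending it in frame.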

CONCLUSIONS

Our study demonstrates the significance of model interpretation in understanding model behavior, properties of experimental data and ultimately mRNA translation. By combining synthetic and human 5'UTR reporter data, we develop a model (OptMRL) which better captures the characteristics of human translation regulation. This approach provides a general strategy for building more successful sequence-based models of gene regulation, as it combines global sampling of random sequences with the subspace of naturally occurring sequences. Ultimately, this will enhance our understanding of 5'UTR sequences in disease and our ability to engineer translation output.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b4ac/11549864/7a5492cc6604/12864_2024_10925_Fig1_HTML.jpg
