Riley Aidan T, Vlasity McKayla, Huang Joey Zhuoying, Becicka Wyatt M, Wong Wilson W, Grinstaff Mark W, Green Alexander A
Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA.
Biological Design Center, Boston University, Boston, MA 02215, USA.
bioRxiv. 2025 Jun 17:2025.06.17.659751. doi: 10.1101/2025.06.17.659751.
End-to-end, machine-learning-based design of mRNA molecules offers a powerful means to tailor their properties for specific tasks. mRNA expression level, immunogenicity, tissue specificity, stability, and localization, among other characteristics, depend strongly on sequence, providing a rich set of properties amenable to optimization. Despite this potential, the various components of mRNA are governed by distinct grammatical and functional rules that hinder the development of a unified algorithmic approach for complete mRNA design. While machine learning and generative AI techniques demonstrate substantial benefits for individual sequence design tasks, adapting these tools to new domains or out-of-distribution tasks remains challenging. In this work, we describe a simple and powerful alteration to integrated gradients (Design by Integrated Gradients) that serves as the foundation for several mRNA design tasks. Using this technique, we demonstrate complete model-informed mRNA design and reveal the underexplored rules governing the assembly of mRNA components into complete, high-performance transcripts. This framework closes the loop on mRNA design by unifying the strengths of predictive neural networks with the vast quantities of diverse RNA data that have accumulated over decades across transcriptomic profiling, CLIP-seq, and individual experimental findings. By linking the benefits of neural network-based design with diverse alternative datasets, we demonstrate the design of complete mRNA sequences in out-of-distribution settings that exhibit dramatically improved translational capacity, enhanced cell-type-specific expression, and improved stability.
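For readers unfamiliar with the attribution method the abstract builds on, the following is a minimal sketch of standard integrated gradients applied to a one-hot RNA sequence. It is not the authors' alteration, which the abstract does not specify. The toy sigmoid scorer, its random weights, the uniform 0.25 baseline, and every function name here are assumptions made purely for illustration.

```python
import numpy as np

def one_hot(seq, alphabet="ACGU"):
    """Encode an RNA string as a (length, 4) one-hot matrix."""
    idx = [alphabet.index(c) for c in seq]
    out = np.zeros((len(seq), len(alphabet)))
    out[np.arange(len(seq)), idx] = 1.0
    return out

def model(x, w):
    """Hypothetical scorer: sigmoid of a weighted sum over the encoded sequence."""
    return 1.0 / (1.0 + np.exp(-np.sum(x * w)))

def model_grad(x, w):
    """Analytic gradient of the toy scorer with respect to its input x."""
    p = model(x, w)
    return p * (1.0 - p) * w

def integrated_gradients(x, baseline, w, steps=200):
    """Standard integrated gradients, approximated with a midpoint Riemann sum.

    Gradients are averaged along the straight path from baseline to x,
    then scaled elementwise by (x - baseline).
    """
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x, dtype=float)
    for a in alphas:
        total += model_grad(baseline + a * (x - baseline), w)
    return (x - baseline) * total / steps

rng = np.random.default_rng(0)
x = one_hot("AUGGCC")
w = rng.normal(size=x.shape)        # stand-in for trained model parameters
baseline = np.full_like(x, 0.25)    # assumed baseline: uniform over nucleotides

attr = integrated_gradients(x, baseline, w)

# Completeness property: attributions sum to f(x) - f(baseline),
# up to the numerical error of the Riemann approximation.
gap = abs(attr.sum() - (model(x, w) - model(baseline, w)))
```

In this sketch each entry of `attr` scores how much one nucleotide choice at one position moves the toy prediction away from the baseline. A design procedure could in principle use such scores to propose sequence edits, though how the paper does so is not described in the abstract.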