ARCADE: Controllable Codon Design from Foundation Models via Activation Engineering.

Authors

Li Jiayi, Liang Litian, Du Shiyi, Tang Shijie, Lai Hong-Sheng, Kingsford Carl

Affiliation

Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 15217, US.

Publication information

bioRxiv. 2025 Aug 23:2025.08.19.668819. doi: 10.1101/2025.08.19.668819.

DOI: 10.1101/2025.08.19.668819

PMID: 40894773

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12393422/

Abstract

Codon sequence design is crucial for generating mRNA sequences with desired functional properties for tasks such as creating novel mRNA vaccines or gene editing therapies. Yet existing methods lack flexibility and controllability to adapt to various design objectives. We propose a novel framework, ARCADE, that enables flexible control over generated codon sequences. ARCADE is based on activation engineering and leverages inherent knowledge from pretrained genomic foundation models. Our approach extends activation engineering techniques beyond discrete feature manipulation to continuous biological metrics. Specifically, we define biologically meaningful semantic steering vectors in the model's activation space, which directly modulate continuous-valued properties such as the codon adaptation index, minimum free energy, and GC content without retraining. Experimental results demonstrate the superior performance and far greater flexibility of ARCADE compared to existing codon optimization approaches, underscoring its potential for advancing programmable biological sequence design.
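
To make the idea of a steering vector concrete, the sketch below shows one common activation-engineering recipe: take the difference between the mean activations of sequences with a high value of a property and those with a low value, then add a scaled copy of that direction to a hidden state. The abstract does not state how ARCADE builds or applies its vectors, so the difference-of-means construction, the function names (`steering_vector`, `steer`), the strength parameter `alpha`, and the toy activation values are all assumptions for illustration. GC content stands in as the target property because its definition is unambiguous.

```python
from statistics import fmean


def gc_content(seq: str) -> float:
    """Fraction of G or C nucleotides in a DNA/RNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def mean_vector(vectors):
    """Element-wise mean of equal-length activation vectors."""
    return [fmean(column) for column in zip(*vectors)]


def steering_vector(high_acts, low_acts):
    """Difference of means: high-property minus low-property activations.

    Assumption: this is a generic activation-engineering recipe,
    not necessarily the construction used by ARCADE.
    """
    high_mean = mean_vector(high_acts)
    low_mean = mean_vector(low_acts)
    return [h - l for h, l in zip(high_mean, low_mean)]


def steer(hidden, direction, alpha):
    """Shift one hidden state along the direction by strength alpha."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Hypothetical toy activations (3-dimensional, not from any real model).
# In practice these would be hidden states collected from a pretrained
# model for sequences ranked by a metric such as gc_content.
high_gc_acts = [[1.0, 2.0, 0.0], [3.0, 2.0, 2.0]]
low_gc_acts = [[0.0, 1.0, 1.0], [0.0, 1.0, 3.0]]

v = steering_vector(high_gc_acts, low_gc_acts)
steered = steer([0.5, 0.5, 0.5], v, alpha=0.5)
```

Under these assumptions, a positive `alpha` would push generation toward the high-property side and a negative `alpha` toward the low-property side, while the model weights stay unchanged, which matches the abstract's "without retraining" claim in spirit only.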
