Li Jiayi, Liang Litian, Du Shiyi, Tang Shijie, Lai Hong-Sheng, Kingsford Carl
Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 15217, US.
bioRxiv. 2025 Aug 23:2025.08.19.668819. doi: 10.1101/2025.08.19.668819.
Codon sequence design is crucial for generating mRNA sequences with desired functional properties, for tasks such as creating novel mRNA vaccines or gene-editing therapies. Yet existing methods lack the flexibility and controllability needed to adapt to varied design objectives. We propose a novel framework, ARCADE, that enables flexible control over generated codon sequences. ARCADE is based on activation engineering and leverages the knowledge already encoded in pretrained genomic foundation models. Our approach extends activation engineering techniques beyond discrete feature manipulation to continuous biological metrics. Specifically, we define biologically meaningful semantic steering vectors in the model's activation space, which directly modulate continuous-valued properties such as the codon adaptation index, minimum free energy, and GC content without retraining. Experimental results demonstrate the superior performance and far greater flexibility of ARCADE compared to existing codon optimization approaches, underscoring its potential for advancing programmable biological sequence design.
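As a rough illustration of the steering idea described in the abstract, the sketch below builds a steering vector as a difference of mean activations and adds it to hidden states, alongside a helper for GC content, one of the properties the abstract names. This is a generic activation-addition sketch, not ARCADE's actual procedure: the function names, the difference-of-means construction, and the scalar strength `alpha` are all assumptions made for illustration, and the arrays stand in for activations that would come from a pretrained genomic model.

```python
import numpy as np

def gc_content(seq: str) -> float:
    """Fraction of G or C nucleotides in a DNA/RNA sequence."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)

def steering_vector(acts_high: np.ndarray, acts_low: np.ndarray) -> np.ndarray:
    """Hypothetical steering direction: mean activation of sequences with a
    high property value minus mean activation of sequences with a low value.
    Both inputs have shape (num_sequences, hidden_dim)."""
    return acts_high.mean(axis=0) - acts_low.mean(axis=0)

def steer(hidden: np.ndarray, vector: np.ndarray, alpha: float) -> np.ndarray:
    """Shift every hidden state along the steering direction.
    alpha > 0 pushes toward the 'high' group, alpha < 0 toward the 'low' one."""
    return hidden + alpha * vector
```

In this sketch, varying `alpha` continuously is what would give graded control over a property, with no change to model weights.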