Li Jiayi, Liang Litian, Du Shiyi, Tang Shijie, Lai Hong-Sheng, Kingsford Carl
Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 15217, US.
bioRxiv. 2025 Aug 23:2025.08.19.668819. doi: 10.1101/2025.08.19.668819.
Codon sequence design is crucial for generating mRNA sequences with desired functional properties for tasks such as creating novel mRNA vaccines or gene editing therapies. Yet existing methods lack flexibility and controllability to adapt to various design objectives. We propose a novel framework, ARCADE, that enables flexible control over generated codon sequences. ARCADE is based on activation engineering and leverages inherent knowledge from pretrained genomic foundation models. Our approach extends activation engineering techniques beyond discrete feature manipulation to continuous biological metrics. Specifically, we define biologically meaningful semantic steering vectors in the model's activation space, which directly modulate continuous-valued properties such as the codon adaptation index, minimum free energy, and GC content without retraining. Experimental results demonstrate the superior performance and far greater flexibility of ARCADE compared to existing codon optimization approaches, underscoring its potential for advancing programmable biological sequence design.
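As a rough illustration of the ideas named in the abstract, the sketch below computes one of the continuous properties (GC content) and shows a generic activation-steering step. It assumes a difference-of-means steering vector, a common activation engineering construction that the abstract does not confirm as ARCADE's method. The function names (`gc_content`, `steering_vector`, `steer`), the scaling parameter `alpha`, and the toy activation values are all hypothetical and are not taken from the paper or from any genomic foundation model.

```python
from statistics import fmean


def gc_content(seq: str) -> float:
    """Return the fraction of G or C nucleotides in a DNA/RNA sequence."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def steering_vector(high_acts, low_acts):
    """Difference of mean activations between two groups of sequences.

    Assumption: high_acts and low_acts are lists of equal-length activation
    vectors collected from sequences with high and low values of the target
    property. This is a generic construction, not the paper's definition.
    """
    dim = len(high_acts[0])
    return [
        fmean(a[i] for a in high_acts) - fmean(a[i] for a in low_acts)
        for i in range(dim)
    ]


def steer(hidden, vector, alpha):
    """Shift a hidden state along the steering vector, scaled by alpha."""
    return [h + alpha * v for h, v in zip(hidden, vector)]


# Toy placeholder activations (hypothetical values, not model outputs).
high = [[1.0, 2.0], [3.0, 4.0]]
low = [[0.0, 0.0], [1.0, 1.0]]
vec = steering_vector(high, low)
steered = steer([0.0, 1.0], vec, alpha=2.0)
```

In this toy setup, a positive `alpha` would push a hidden state toward the high-property group and a negative `alpha` away from it; how such a shift translates into changes in the codon adaptation index, minimum free energy, or GC content depends on the actual model and is not shown here.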