Dai Ning, Zhou Tianshuo, Tang Wei Yu, Mathews David H, Huang Liang
School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97330, United States.
Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, NY 14642, United States.
Bioinformatics. 2025 Jul 1;41(Supplement_1):i391-i400. doi: 10.1093/bioinformatics/btaf245.
The task of designing optimized messenger RNA (mRNA) sequences has received much attention in recent years, thanks to breakthroughs in mRNA vaccines during the COVID-19 pandemic. Most previous work aimed to minimize the minimum free energy (MFE) of the mRNA in order to improve stability and protein expression. That objective considers only one particular structure per mRNA sequence and neglects the millions of alternative conformations in equilibrium. More importantly, we prefer an mRNA to populate multiple stable structures and to be flexible among them during translation, when the ribosome unwinds it.
Therefore, we consider a new objective: minimizing the ensemble free energy of an mRNA, which accounts for all possible structures in its Boltzmann ensemble. However, this new problem is much harder to solve than the original MFE optimization. To address the increased complexity, we introduce EnsembleDesign, a novel algorithm that employs continuous relaxation to optimize the expected ensemble free energy over a distribution of candidate sequences. EnsembleDesign extends both the lattice representation of the design space and the dynamic programming algorithm from LinearDesign to their probabilistic counterparts. Our algorithm consistently outperforms LinearDesign in terms of ensemble free energy, especially on long sequences. Interestingly, as byproducts, our designs also have lower average unpaired probabilities (which correlate with degradation) and flatter Boltzmann ensembles (more flexibility between conformations).
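To make the difference between the two objectives concrete, here is a minimal sketch of the standard thermodynamic definition of ensemble free energy, G = -RT ln Σ exp(-E_i / RT), applied to a hand-written list of structure energies. It is not the EnsembleDesign algorithm: the function names, the toy energies, and the choice of 37 °C are assumptions made for illustration, and a real mRNA has far too many structures to enumerate this way.

```python
import math

# Gas constant (kcal/(mol*K)) times temperature (37 degrees C = 310.15 K).
# The temperature is an assumption for this illustration.
RT = 0.0019872 * 310.15


def ensemble_free_energy(energies, rt=RT):
    """Return -RT * ln(sum_i exp(-E_i / RT)) using a stable log-sum-exp."""
    scaled = [-e / rt for e in energies]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return -rt * log_z


def boltzmann_probabilities(energies, rt=RT):
    """Return the equilibrium probability of each structure."""
    g = ensemble_free_energy(energies, rt)
    return [math.exp(-(e - g) / rt) for e in energies]


# Hypothetical free energies (kcal/mol) of four structures of one sequence.
toy_energies = [-10.0, -9.5, -9.0, 0.0]

mfe = min(toy_energies)                       # looks at one structure only
efe = ensemble_free_energy(toy_energies)      # accounts for every structure
probs = boltzmann_probabilities(toy_energies)

print(f"MFE = {mfe:.2f} kcal/mol")
print(f"ensemble free energy = {efe:.2f} kcal/mol")
print("Boltzmann probabilities:", [round(p, 3) for p in probs])
```

The ensemble free energy is never higher than the MFE, and the gap widens as more low-energy alternatives share the probability mass, which is the sense in which a flatter ensemble is rewarded by this objective.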
Our code is available at https://github.com/LinearFold/EnsembleDesign.