基于采样的RNA设计中耦合变量的连续优化

Sampling-based Continuous Optimization with Coupled Variables for RNA Design.

作者信息

Tang Wei Yu, Dai Ning, Zhou Tianshuo, Mathews David H, Huang Liang

机构信息

School of EECS.

Dept. of Quantitative and Computational Biology, University of Southern California.

出版信息

ArXiv. 2024 Dec 11:arXiv:2412.08751v1.

PMID:39713799

原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC11661286/

Abstract

The task of RNA design given a target structure aims to find a sequence that can fold into that structure. It is a computationally hard problem where some version(s) have been proven to be NP-hard. As a result, heuristic methods such as local search have been popular for this task, but by only exploring a fixed number of candidates. They can not keep up with the exponential growth of the design space, and often perform poorly on longer and harder-to-design structures. We instead formulate these discrete problems as continuous optimization, which starts with a distribution over all possible candidate sequences, and uses gradient descent to improve the expectation of an objective function. We define novel distributions based on coupled variables to rule out invalid sequences given the target structure and to model the correlation between nucleotides. To make it universally applicable to any objective function, we use sampling to approximate the expected objective function, to estimate the gradient, and to select the final candidate. Compared to state-of-the-art methods, our work consistently outperforms them in key metrics such as Boltzmann probability, ensemble defect, and energy gap, especially on long and hard-to-design puzzles in the Eterna100 benchmark. Our code is available at: http://github.com/weiyutang1010/ncrna_design.

摘要

给定目标结构的RNA设计任务旨在找到一个能够折叠成该结构的序列。这是一个计算难题，其中某些版本已被证明是NP难问题。因此，诸如局部搜索之类的启发式方法在该任务中很受欢迎，但它们只探索固定数量的候选序列。它们无法跟上设计空间的指数增长，并且在处理更长且更难设计的结构时通常表现不佳。相反，我们将这些离散问题表述为连续优化问题，该问题从所有可能候选序列的分布开始，并使用梯度下降来改进目标函数的期望值。我们基于耦合变量定义新颖的分布，以排除给定目标结构下的无效序列，并对核苷酸之间的相关性进行建模。为了使其普遍适用于任何目标函数，我们使用采样来近似期望的目标函数、估计梯度并选择最终候选序列。与现有方法相比，我们的工作在诸如玻尔兹曼概率、系综缺陷和能隙等关键指标上始终优于它们，特别是在Eterna100基准测试中的长且难设计的谜题上。我们的代码可在以下网址获取：http://github.com/weiyutang1010/ncrna_design 。

文献检索

告别复杂PubMed语法，用中文像聊天一样搜索，搜遍4000万医学文献。AI智能推荐，让科研检索更轻松。

立即免费搜索

文件翻译

保留排版，准确专业，支持PDF/Word/PPT等文件格式，支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述，25分钟生成高质量综述，智能提取关键信息，辅助科研写作。

立即免费体验