Burghardt Bernd, Hartmann Alexander K
Institut für Theoretische Physik, Universität Göttingen, Friedrich-Hund-Platz 1, D-37077 Göttingen, Germany.
Phys Rev E Stat Nonlin Soft Matter Phys. 2007 Feb;75(2 Pt 1):021920. doi: 10.1103/PhysRevE.75.021920. Epub 2007 Feb 28.
We consider the inverse-folding problem for RNA secondary structures: for a given (pseudo-knot-free) secondary structure we want to find a sequence that has a certain structure as its ground state. If such a sequence exists, the structure is called designable. We have implemented a branch-and-bound algorithm that is able to do an exhaustive search within the sequence space, i.e., gives an exact answer as to whether such a sequence exists. The bounds required by the branch-and-bound algorithm are calculated by a dynamic programming algorithm. We consider different alphabet sizes and an ensemble of random structures, which we want to design. We find that for two letters almost none of these structures are designable. The designability improves for the three-letter case, but still a significant fraction of structures is undesignable. This changes when we look at the natural four-letter case with two pairs of complementary bases: undesignable structures are the exception, although they still exist. Finally, we also study the relation between designability and the algorithmic complexity of the branch-and-bound algorithm. Within the ensemble of structures, a high average degree of undesignability is correlated with a long time to prove that a given structure is (un-)designable. In the four-letter case, where the designability is high everywhere, the algorithmic complexity is highest in the region of naturally occurring RNA.
我们考虑RNA二级结构的反向折叠问题:对于给定的(无假结的)二级结构,我们想要找到一个以该特定结构作为其基态的序列。如果这样的序列存在,那么该结构就被称为可设计的。我们实现了一种分支定界算法,该算法能够在序列空间内进行穷举搜索,即能确切回答这样的序列是否存在。分支定界算法所需的边界由动态规划算法计算得出。我们考虑不同的字母表大小以及一组我们想要设计的随机结构。我们发现对于两个字母的情况,这些结构几乎都不可设计。在三个字母的情况下,可设计性有所提高,但仍有相当一部分结构不可设计。当我们考虑具有两对互补碱基的天然四字母情况时,情况发生了变化:不可设计的结构只是少数,尽管它们仍然存在。最后,我们还研究了可设计性与分支定界算法的算法复杂度之间的关系。在结构集合中,平均不可设计程度较高与证明给定结构是否可设计所需的时间较长相关。在四字母情况下,可设计性在各处都很高,算法复杂度在天然RNA区域最高。