设计RNA二级结构很难。

Designing RNA Secondary Structures Is Hard.

作者信息

Bonnet Édouard, Rzążewski Paweł, Sikora Florian

机构信息

Univ Lyon, CNRS, ENS de Lyon, Université Claude Bernard Lyon 1, LIP UMR5668, Lyon, France.

Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland.

出版信息

J Comput Biol. 2020 Mar;27(3):302-316. doi: 10.1089/cmb.2019.0420. Epub 2020 Feb 5.

DOI:10.1089/cmb.2019.0420

PMID:32160034

Abstract

A ribonucleic acid (RNA) sequence is a word over an alphabet on four elements [Formula: see text] called bases. RNA sequences fold into secondary structures where some bases pair with one another, while others remain unpaired. The two fundamental problems in RNA algorithmic are to predict how sequences fold within some models of energy and to design sequences of bases that will fold into targeted secondary structures. Predicting how a given RNA sequence folds into a pseudoknot-free secondary structure is known to be solvable in cubic time since the eighties and in truly subcubic time by a recent result of Bringmann et al. (FOCS, 2016), whereas Lyngsø has shown it is computationally hard if pseudoknots are allowed (ICALP, 2004). As a stark contrast, it is unknown whether or not designing a given RNA secondary structure is a tractable task; this has been raised as a challenging open question by Condon (ICALP, 2003). Because of its crucial importance in a number of fields such as pharmaceutical research and biochemistry, there are dozens of heuristics and software libraries dedicated to the RNA secondary structure design. It is therefore rather surprising that the computational complexity of this central problem in bioinformatics has been unsettled for decades. In this article, we show that in the simplest model of energy, which is the Watson-Crick model, the design of secondary structures is computationally hard if one adds natural constraints of the form: indexiof the sequence has to be labeled by baseb. This negative result suggests that the same lower bound holds for more realistic models of energy. It is noteworthy that the additional constraints are by no means artificial: they are provided by all the RNA design pieces of software and they do correspond to the actual practice (e.g., the instances of the EteRNA project).

摘要

核糖核酸（RNA）序列是由四个元素[公式：见文本]（称为碱基）组成的字母表上的一个词。RNA序列折叠成二级结构，其中一些碱基相互配对，而其他碱基保持未配对状态。RNA算法中的两个基本问题是预测序列在某些能量模型中如何折叠，以及设计能够折叠成目标二级结构的碱基序列。自八十年代以来，已知预测给定的RNA序列如何折叠成无假结的二级结构在立方时间内可解，并且根据Bringmann等人最近的结果（FOCS，2016）在真正的亚立方时间内可解，而Lyngsø表明如果允许假结则计算上是困难的（ICALP，2004）。形成鲜明对比的是，设计给定的RNA二级结构是否是一个可处理的任务尚不清楚；这已被Condon提出作为一个具有挑战性的开放问题（ICALP，2003）。由于其在药物研究和生物化学等许多领域的至关重要性，有数十种启发式方法和软件库致力于RNA二级结构设计。因此，相当令人惊讶的是，这个生物信息学核心问题的计算复杂性几十年来一直未得到解决。在本文中，我们表明，在最简单的能量模型即沃森 - 克里克模型中，如果添加形式为：序列的索引i必须由碱基b标记的自然约束，则二级结构的设计在计算上是困难的。这个负面结果表明，对于更现实的能量模型也有相同的下限。值得注意的是，这些额外的约束绝不是人为的：它们由所有RNA设计软件提供，并且它们确实与实际实践相对应（例如，EteRNA项目的实例）。