Aguirre-Hernández Rosalía, Hoos Holger H, Condon Anne
Institute of Applied Mathematics, University of British Columbia, Vancouver, BC, Canada.
BMC Bioinformatics. 2007 Jan 31;8:34. doi: 10.1186/1471-2105-8-34.
We investigate the empirical complexity of the RNA secondary structure design problem, that is, the scaling of the typical difficulty of the design task for various classes of RNA structures as the size of the target structure is increased. The purpose of this work is to understand better the factors that make RNA structures hard to design for existing, high-performance algorithms. Such understanding provides the basis for improving the performance of one of the best algorithms for this problem, RNA-SSD, and for characterising its limitations.
To gain insights into the practical complexity of the problem, we present a scaling analysis on random and biologically motivated structures using an improved version of the RNA-SSD algorithm, and also the RNAinverse algorithm from the Vienna package. Since primary structure constraints are relevant for designing RNA structures, we also investigate the correlation between the number and the location of the primary structure constraints when designing structures and the performance of the RNA-SSD algorithm. The scaling analysis on random and biologically motivated structures supports the hypothesis that the running time of both algorithms scales polynomially with the size of the structure. We also found that the algorithms are in general faster when constraints are placed only on paired bases in the structure. Furthermore, we prove that, according to the standard thermodynamic model, for some structures that the RNA-SSD algorithm was unable to design, there exists no sequence whose minimum free energy structure is the target structure.
Our analysis helps to better understand the strengths and limitations of both the RNA-SSD and RNAinverse algorithms, and suggests ways in which the performance of these algorithms can be further improved.
我们研究RNA二级结构设计问题的实际复杂度,即随着目标结构大小的增加,各类RNA结构设计任务的典型难度的缩放情况。这项工作的目的是更好地理解导致现有高性能算法难以设计RNA结构的因素。这种理解为改进解决该问题的最佳算法之一RNA - SSD的性能及其局限性的刻画提供了基础。
为深入了解该问题的实际复杂度,我们使用RNA - SSD算法的改进版本以及维也纳软件包中的RNAinverse算法,对随机结构和具有生物学动机的结构进行了缩放分析。由于一级结构约束与RNA结构设计相关,我们还研究了设计结构时一级结构约束的数量和位置与RNA - SSD算法性能之间的相关性。对随机结构和具有生物学动机的结构的缩放分析支持了这一假设,即两种算法的运行时间均与结构大小呈多项式缩放关系。我们还发现,通常在仅对结构中的配对碱基施加约束时,算法速度更快。此外,我们证明,根据标准热力学模型,对于RNA - SSD算法无法设计的某些结构,不存在其最小自由能结构为目标结构的序列。
我们的分析有助于更好地理解RNA - SSD和RNAinverse算法的优势与局限性,并提出了进一步提高这些算法性能的方法。