Do Chuong B, Foo Chuan-Sheng, Batzoglou Serafim
Computer Science Department, Stanford University, Stanford, CA 94305, USA.
Bioinformatics. 2008 Jul 1;24(13):i68-76. doi: 10.1093/bioinformatics/btn177.
The need for accurate and efficient tools for computational RNA structure analysis has become increasingly apparent over the last several years: RNA folding algorithms underlie numerous applications in bioinformatics, ranging from microarray probe selection to de novo non-coding RNA gene prediction. In this work, we present RAF (RNA Alignment and Folding), an efficient algorithm for simultaneous alignment and consensus folding of unaligned RNA sequences. Algorithmically, RAF exploits sparsity in the set of likely pairing and alignment candidates for each nucleotide (as identified by the CONTRAfold or CONTRAlign programs) to achieve an effectively quadratic running time for simultaneous pairwise alignment and folding. RAF's fast sparse dynamic programming, in turn, serves as the inference engine within a discriminative machine learning algorithm for parameter estimation.
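The sparsity idea described above can be illustrated with a minimal sketch. It is not RAF's actual implementation: it assumes a posterior probability matrix (of the kind CONTRAfold or CONTRAlign might supply) is already available as a nested list, and the function names, the toy matrix, and the threshold value of 0.1 are all hypothetical choices made for illustration.

```python
from typing import List


def sparse_candidates(prob: List[List[float]], threshold: float) -> List[List[int]]:
    """For each row i, keep only the column indices j with prob[i][j] >= threshold."""
    return [[j for j, p in enumerate(row) if p >= threshold] for row in prob]


def count_candidates(cands: List[List[int]]) -> int:
    """Total number of (i, j) candidates that survive the filter."""
    return sum(len(row) for row in cands)


# Toy base-pairing posterior matrix for a 4-nucleotide sequence (made-up values).
pair_prob = [
    [0.00, 0.00, 0.05, 0.90],
    [0.00, 0.00, 0.80, 0.02],
    [0.05, 0.80, 0.00, 0.00],
    [0.90, 0.02, 0.00, 0.00],
]

# Hypothetical cutoff; a real tool would choose this value differently.
candidates = sparse_candidates(pair_prob, threshold=0.1)
kept = count_candidates(candidates)
total = len(pair_prob) * len(pair_prob[0])
print(f"kept {kept} of {total} candidate cells")
```

A dynamic program that visits only the surviving candidates for each nucleotide does work proportional to the number of kept cells rather than to the full matrix, which is the intuition behind the reduced running time claimed in the abstract.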
In cross-validated benchmark tests, RAF achieves accuracies equaling or surpassing the current best approaches for RNA multiple sequence secondary structure prediction. At the same time, RAF requires nearly an order of magnitude less time than other simultaneous folding and alignment methods, making it especially appropriate for high-throughput studies.
Source code for RAF is available at: http://contra.stanford.edu/contrafold/.