一项关于可及基序与RNA折叠复杂性的研究。

A study of accessible motifs and RNA folding complexity.

作者信息

Wexler Ydo, Zilberstein Chaya, Ziv-Ukelson Michal

机构信息

Department of Computer Science, Technion-Israel Institute of Technology, Haifa, Israel.

出版信息

J Comput Biol. 2007 Jul-Aug;14(6):856-72. doi: 10.1089/cmb.2007.R020.

DOI:10.1089/cmb.2007.R020

PMID:17691898

Abstract

mRNA molecules are folded in the cells and therefore many of their substrings may actually be inaccessible to protein and microRNA binding. The need to apply an accessibility criterion to the task of genome-wide mRNA motif discovery raises the challenge of overcoming the core O(n(3)) factor imposed by the time complexity of the currently best known algorithms for RNA secondary structure prediction. We speed up the dynamic programming algorithms that are standard for RNA folding prediction. Our new approach significantly reduces the computations without sacrificing the optimality of the results, yielding an expected time complexity of O(n(2) psi(n)), where psi(n) is shown to be constant on average under standard polymer folding models. A benchmark analysis confirms that in practice the runtime ratio between the previous approach and the new algorithm indeed grows linearly with increasing sequence size. The fast new RNA folding algorithm is utilized for genome-wide discovery of accessible cis-regulatory motifs in data sets of ribosomal densities and decay rates of S. cerevisiae genes and to the mining of exposed binding sites of tissue-specific microRNAs in A. thaliana.

摘要

信使核糖核酸（mRNA）分子在细胞中会折叠，因此它们的许多子串实际上可能无法与蛋白质和微小核糖核酸（microRNA）结合。在全基因组mRNA基序发现任务中应用可及性标准，这就带来了挑战，即要克服目前最知名的RNA二级结构预测算法的时间复杂度所带来的核心O(n³) 因素。我们加速了RNA折叠预测标准的动态规划算法。我们的新方法在不牺牲结果最优性的情况下显著减少了计算量，产生了O(n²ψ(n)) 的预期时间复杂度，其中ψ(n) 在标准聚合物折叠模型下平均显示为常数。基准分析证实，在实际中，先前方法与新算法之间的运行时比率确实随着序列大小的增加而线性增长。这种快速的新RNA折叠算法被用于在酿酒酵母基因的核糖体密度和衰减率数据集中全基因组发现可及的顺式调控基序，以及挖掘拟南芥中组织特异性微小核糖核酸的暴露结合位点。