School of EECS, Oregon State University, Corvallis, OR 97330, USA.
Dept. of Biochemistry & Biophysics, University of Rochester Medical Center, Rochester, NY 14642, USA; Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642, USA; Dept. of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, NY 14642, USA.
J Mol Biol. 2024 Sep 1;436(17):168694. doi: 10.1016/j.jmb.2024.168694. Epub 2024 Jul 4.
Predicting the consensus structure of a set of aligned RNA homologs is a convenient way to find conserved structures in an RNA genome, with applications including viral diagnostics and therapeutics. However, the most commonly used tool for this task, RNAalifold, is prohibitively slow for long sequences because its runtime scales cubically with sequence length; it takes over a day on 400 SARS-CoV-2 and SARS-related genomes (∼30,000 nt). We present LinearAlifold, a much faster alternative that scales linearly with both the sequence length and the number of sequences, building on our earlier work LinearFold, which folds a single RNA in linear time. LinearAlifold is orders of magnitude faster than RNAalifold (0.7 h on the above 400 genomes, a ∼36× speedup) and achieves higher accuracy when evaluated against a database of known structures. More interestingly, LinearAlifold's prediction on SARS-CoV-2 correlates well with experimentally determined structures, substantially outperforming RNAalifold. Finally, LinearAlifold supports two energy models (Vienna and BL*) and four modes: minimum free energy (MFE), maximum expected accuracy (MEA), ThreshKnot, and stochastic sampling, each of which takes under an hour for hundreds of SARS-CoV variants. Our resource is available at https://github.com/LinearFold/LinearAlifold (code) and http://linearfold.org/linear-alifold (server).
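As a rough check on the runtime figures quoted in the abstract, the arithmetic can be sketched as follows. This is a back-of-the-envelope illustration, not the authors' benchmark: the variable and function names are hypothetical, and the scaling helper assumes an idealized runtime proportional to a power of the sequence length, ignoring constant factors.

```python
# Figures quoted in the abstract.
linearalifold_hours = 0.7   # LinearAlifold runtime on the 400 genomes
reported_speedup = 36       # approximate speedup over RNAalifold

# RNAalifold runtime implied by those two numbers (assumption: the
# speedup is a simple ratio of wall-clock times on the same input).
implied_rnaalifold_hours = linearalifold_hours * reported_speedup


def relative_cost(length_ratio: float, exponent: int) -> float:
    """Factor by which runtime grows when the sequence length grows by
    `length_ratio`, under the idealized model runtime ~ length**exponent."""
    return length_ratio ** exponent


# Effect of doubling the sequence length under each scaling regime.
cubic_growth = relative_cost(2, 3)    # cubic scaling (RNAalifold-like)
linear_growth = relative_cost(2, 1)   # linear scaling (LinearAlifold-like)

print(f"Implied RNAalifold runtime: about {implied_rnaalifold_hours:.1f} h")
print(f"Doubling length: cubic x{cubic_growth}, linear x{linear_growth}")
```

The implied runtime of roughly 25 h is consistent with the abstract's statement that RNAalifold takes "over a day" on this dataset, and the second line shows why the gap widens as genomes get longer.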