Fukunaga Tsukasa, Hamada Michiaki
Waseda Institute for Advanced Study, Waseda University, Tokyo 1690051, Japan.
Department of Electrical Engineering and Bioscience, Graduate School of Advanced Science and Engineering, Waseda University, Tokyo 1698555, Japan.
Bioinform Adv. 2022 Oct 22;2(1):vbac078. doi: 10.1093/bioadv/vbac078. eCollection 2022.
RNA consensus secondary structure prediction from aligned sequences is a powerful approach for improving secondary structure prediction accuracy. However, because the computational complexity of conventional prediction tools scales with the cube of the alignment length, applying them to long RNA sequences, such as viral RNAs or long non-coding RNAs, requires substantial computational time.
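To make the cubic cost concrete, here is a minimal sketch of a Nussinov-style base-pair maximization over alignment columns. It is a toy under stated assumptions, not the energy model used by LinAliFold or by conventional consensus tools: a column pair is scored by the fraction of sequences that can form a Watson-Crick or G-U pair, and `PAIRS`, `MIN_LOOP`, `column_pair_score`, and `nussinov_consensus` are hypothetical names. The three nested loops over span, start column `i`, and pairing partner `k` give O(L^3) time for an alignment of length L (multiplied by the number of sequences, because each column pair is rescored).

```python
# Hypothetical pairing rule: Watson-Crick and G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired columns inside a hairpin


def column_pair_score(alignment, i, j):
    """Fraction of sequences whose residues in columns i and j can pair."""
    return sum((seq[i], seq[j]) in PAIRS for seq in alignment) / len(alignment)


def nussinov_consensus(alignment, threshold=0.5):
    """Best total pair score over nested structures; O(L^3) in alignment length L."""
    n = len(alignment[0])
    # dp[i][j]: best score for the sub-alignment spanning columns i..j.
    dp = [[0.0] * n for _ in range(n)]
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # column j left unpaired
            # Third nested loop: try pairing column j with each admissible column k.
            for k in range(i, j - MIN_LOOP):
                s = column_pair_score(alignment, k, j)
                if s >= threshold:
                    left = dp[i][k - 1] if k > i else 0.0
                    best = max(best, left + dp[k + 1][j - 1] + s)
            dp[i][j] = best
    return dp[0][n - 1] if n else 0.0
```

For a 30 000-column alignment, L^3 is about 2.7 × 10^13 inner-loop iterations, which shows why cubic-time tools become impractical at coronavirus genome scale.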
In this study, we developed LinAliFold and CentroidLinAliFold, fast RNA consensus secondary structure prediction tools based on the minimum free energy and maximum expected accuracy principles, respectively. We accelerated the software using beam search methods that have been used successfully for fast secondary structure prediction from a single RNA sequence. Benchmark analyses showed that LinAliFold and CentroidLinAliFold were much faster than existing methods while preserving prediction accuracy. As an empirical application, we predicted the consensus secondary structure of coronavirus genomes of approximately 30 000 nt in 5 min with LinAliFold and in 79 min with CentroidLinAliFold. We confirmed that the predicted consensus secondary structure of coronaviruses was consistent with experimental results.
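The beam search idea can be illustrated with a toy left-to-right parser over alignment columns. This is a hedged sketch, not the actual LinAliFold or LinearFold algorithm (which perform dynamic programming over energy-model states with beam pruning): it assumes each column pair is scored by the fraction of sequences able to form a Watson-Crick or G-U pair, and `beam_consensus_fold` with its `beam_size` and `threshold` parameters is hypothetical. At each column, a hypothesis may leave the column unpaired, open a pair, or close its most recently opened column; hypotheses with the same stack of open columns are merged, and only the `beam_size` best survive.

```python
# Hypothetical pairing rule: Watson-Crick and G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired columns inside a hairpin


def column_pair_score(alignment, i, j):
    """Fraction of sequences whose residues in columns i and j can pair."""
    return sum((seq[i], seq[j]) in PAIRS for seq in alignment) / len(alignment)


def beam_consensus_fold(alignment, beam_size=100, threshold=0.5):
    """Toy left-to-right beam search; returns (score, dot-bracket string)."""
    n = len(alignment[0])
    # Hypotheses are keyed by their stack of still-open columns and store
    # (score, partial dot-bracket string); one hypothesis is kept per stack.
    beam = {(): (0.0, "")}
    for j in range(n):
        remaining = n - j - 1  # columns left after column j
        candidates = {}

        def push(stack, score, struct):
            # Merge hypotheses with the same stack, keeping the higher score.
            if stack not in candidates or candidates[stack][0] < score:
                candidates[stack] = (score, struct)

        for stack, (score, struct) in beam.items():
            # Leave column j unpaired if the open columns can still be closed.
            if len(stack) <= remaining:
                push(stack, score, struct + ".")
            # Open a pair at column j.
            if len(stack) + 1 <= remaining:
                push(stack + (j,), score, struct + "(")
            # Close the most recently opened column i with column j.
            if stack and j - stack[-1] > MIN_LOOP:
                s = column_pair_score(alignment, stack[-1], j)
                if s >= threshold:
                    push(stack[:-1], score + s, struct + ")")
        # Beam pruning: keep only the beam_size highest-scoring hypotheses.
        ranked = sorted(candidates.items(), key=lambda kv: kv[1][0], reverse=True)
        beam = dict(ranked[:beam_size])
    # After the last column only balanced (empty-stack) hypotheses remain;
    # fall back to the all-unpaired structure if pruning removed them all.
    return beam.get((), (0.0, "." * n))
```

Because the number of hypotheses per column is capped by `beam_size`, the number examined grows linearly with the alignment length rather than cubically (ignoring the cost of copying stacks). With a very large `beam_size` the search becomes exhaustive and returns the optimum under this toy score; smaller beams trade that guarantee for speed.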
The source codes of LinAliFold and CentroidLinAliFold are freely available at https://github.com/fukunagatsu/LinAliFold-CentroidLinAliFold.
Supplementary data are available at Bioinformatics Advances online.