Systems Engineering Institute, School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China.
Department of Computer Science, University of California, Irvine, CA 92697, USA.
Nucleic Acids Res. 2022 Feb 22;50(3):e14. doi: 10.1093/nar/gkab1074.
For many RNA molecules, the secondary structure is essential for the correct function of the RNA. Predicting RNA secondary structure from nucleotide sequences is a long-standing problem in genomics, but prediction performance has reached a plateau over time. Traditional RNA secondary structure prediction algorithms are primarily based on free energy minimization under thermodynamic models, an approach that imposes strong prior assumptions and is slow to run. Here, we propose a deep learning-based method, called UFold, for RNA secondary structure prediction, trained directly on annotated data and base-pairing rules. UFold uses a novel image-like representation of RNA sequences, which can be efficiently processed by Fully Convolutional Networks (FCNs). We benchmark the performance of UFold on both within- and cross-family RNA datasets. It significantly outperforms previous methods on within-family datasets, while achieving performance similar to that of traditional methods when trained and tested on distinct RNA families. UFold is also able to predict pseudoknots accurately. Prediction is fast, with an inference time of about 160 ms per sequence for sequences up to 1500 bp in length. An online web server running UFold is available at https://ufold.ics.uci.edu. Code is available at https://github.com/uci-cbcl/UFold.
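To make the idea of an image-like representation concrete, the following is a minimal sketch of one way a sequence of length L could be turned into a multi-channel L x L array suitable for a convolutional network. It assumes a one-hot encoding over A, U, C, G and an outer product between all pairs of positions, which yields 16 channels, one per ordered base pair. The abstract does not specify the encoding, so the channel layout and the function names `one_hot` and `sequence_to_image` are illustrative assumptions, not the code from the UFold repository.

```python
import numpy as np

BASES = "AUCG"  # assumed channel order; chosen for illustration only


def one_hot(seq):
    """Return an (L, 4) one-hot matrix; unknown symbols become all-zero rows."""
    index = {base: i for i, base in enumerate(BASES)}
    mat = np.zeros((len(seq), 4), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        if base in index:
            mat[pos, index[base]] = 1.0
    return mat


def sequence_to_image(seq):
    """Return a (16, L, L) array.

    Channel 4*a + b is 1 at (i, j) when position i holds base a and
    position j holds base b, and 0 otherwise (hypothetical layout).
    """
    x = one_hot(seq)
    # Outer product over positions and over base identities: (4, 4, L, L).
    pairs = np.einsum("ia,jb->abij", x, x)
    return pairs.reshape(16, len(seq), len(seq))


if __name__ == "__main__":
    image = sequence_to_image("GAUC")
    print(image.shape)
```

Each cell (i, j) of the resulting array describes a candidate pairing between positions i and j, so a network that maps this input to an L x L output can score all possible base pairs in a single pass.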