School of Information Science & Engineering, Lanzhou University, South Tianshui Road, Lanzhou, 730000, Gansu, China.
Comput Biol Med. 2024 Nov;182:109207. doi: 10.1016/j.compbiomed.2024.109207. Epub 2024 Sep 27.
Accurate prediction of RNA secondary structure can help reveal the diverse roles that non-coding RNAs play in regulating cellular activity. However, traditional RNA secondary structure prediction methods rely mainly on thermodynamic models solved by free energy minimization, a laborious process that requires substantial prior knowledge. Here we propose Wfold, an end-to-end deep learning approach to RNA secondary structure prediction. Wfold is trained directly on annotated data and base-pairing criteria. It uses an image-like representation of RNA sequences, which is processed by an enhanced U-net that incorporates a transformer encoder. By combining the self-attention mechanism's ability to capture long-range information with the U-net's ability to capture local information, Wfold improves the accuracy of RNA secondary structure prediction. We evaluate Wfold on both within-family and cross-family RNA datasets. When trained and evaluated on different RNA families, it performs comparably to traditional methods, and on within-family datasets it dramatically outperforms state-of-the-art methods. Moreover, Wfold can reliably predict pseudoknots. These findings suggest that Wfold may be useful for improving sequence alignment, functional annotation, and RNA structure modeling.
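The abstract does not spell out how the image-like representation or the base-pairing criteria are defined, so the following is only a minimal sketch of one common way such an input could be built, not the authors' implementation. It assumes that canonical pairs (A-U, G-C, and the G-U wobble) are the allowed pairs and that a hairpin loop needs at least three unpaired bases. The names `CANONICAL_PAIRS`, `MIN_LOOP`, `pairing_map`, and `is_valid_structure` are hypothetical.

```python
# Hypothetical sketch: these rules are assumptions, not details from the Wfold paper.
CANONICAL_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),  # wobble pair
}
MIN_LOOP = 3  # assumed minimum number of unpaired bases enclosed by a pair


def pairing_map(seq):
    """Return a symmetric L x L 0/1 grid marking positions that may pair.

    The grid can be viewed as a one-channel "image" of the sequence.
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    grid = [[0] * n for _ in range(n)]
    for i in range(n):
        # j starts far enough from i to leave MIN_LOOP bases between them
        for j in range(i + MIN_LOOP + 1, n):
            if (seq[i], seq[j]) in CANONICAL_PAIRS:
                grid[i][j] = grid[j][i] = 1
    return grid


def is_valid_structure(seq, pairs):
    """Check a list of (i, j) pairs against the assumed pairing criteria.

    Each base may pair at most once and every pair must be allowed by
    pairing_map. Nesting is deliberately not enforced, so crossing pairs
    (pseudoknots) are accepted.
    """
    grid = pairing_map(seq)
    used = set()
    for i, j in pairs:
        if i in used or j in used or not grid[i][j]:
            return False
        used.update((i, j))
    return True
```

For a sequence such as `GGGAAAUCCC`, `pairing_map` marks position pairs like (0, 9) as allowed, and `is_valid_structure` accepts the stem `[(0, 9), (1, 8), (2, 7)]` while rejecting any structure in which one base pairs twice. In a learned model, a grid of this kind would typically serve as an input channel or as a mask on the predicted contact map; whether Wfold uses it in either way is not stated in the abstract.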