Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina.
Instituto de Agrobiotecnología del Litoral, CONICET-UNL, CCT-Santa Fe, Ruta Nacional N° 168 Km 0, s/n, Paraje el Pozo, 3000, Santa Fe, Argentina.
Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae271.
Coding and noncoding RNA molecules participate in many important biological processes. Noncoding RNAs fold into well-defined secondary structures to exert their functions. However, the computational prediction of secondary structure from a raw RNA sequence is a long-standing unsolved problem that, after decades of almost unchanged performance, has now re-emerged thanks to deep learning. Traditional RNA secondary structure prediction algorithms have mostly been based on thermodynamic models and dynamic programming for free energy minimization. More recently, deep learning methods have shown performance competitive with the classical ones, but there is still a wide margin for improvement.
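To illustrate the dynamic programming idea behind the classical approach, the following minimal sketch implements the Nussinov base-pair maximization recursion. It is a simplified stand-in: it maximizes the number of pairs rather than minimizing a thermodynamic free energy, and the function name, the set of allowed pairs and the `min_loop` parameter are assumptions made for this example, not part of any method evaluated in the paper.

```python
# Nussinov-style dynamic programming (illustrative only).
# It maximizes the number of base pairs; real thermodynamic folders
# minimize free energy with far more detailed energy models.

# Assumption: canonical Watson-Crick pairs plus the G-U wobble pair.
ALLOWED_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def nussinov_max_pairs(seq, min_loop=3):
    """Return the maximum number of nested base pairs in seq.

    min_loop is the minimum number of unpaired bases enclosed by a pair
    (hypothetical default of 3 for hairpin loops).
    """
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] holds the best pair count for the subsequence seq[i..j].
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j stays unpaired
            # case 2: position j pairs with some k, splitting the interval
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in ALLOWED_PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]
```

The table is filled in order of increasing subsequence length, so every smaller interval is already solved when it is needed; this gives cubic time in the sequence length.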
In this work we present sincFold, an end-to-end deep learning approach that predicts the nucleotide contact matrix using only the RNA sequence as input. The model is based on 1D and 2D residual neural networks that can learn short- and long-range interaction patterns. We show that structures can be accurately predicted with minimal physical assumptions. Extensive experiments were conducted on several benchmark datasets, considering sequence homology and cross-family validation. sincFold was compared with classical methods and recent deep learning models, showing that it can outperform state-of-the-art methods.
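The input and output representations mentioned above can be pictured with a small sketch. It assumes a one-hot encoding over the alphabet A, C, G, U for the sequence and a dot-bracket string as the source of the reference structure; both choices, and the helper names `one_hot` and `contact_matrix`, are illustrative assumptions rather than the actual sincFold code.

```python
# Illustrative data representation for sequence-to-contact-matrix models.
# Assumptions: one-hot input over "ACGU" and dot-bracket reference
# structures without pseudoknots; neither is taken from sincFold itself.

NUCLEOTIDES = "ACGU"


def one_hot(seq):
    """Encode an RNA sequence as an L x 4 list of 0/1 rows."""
    return [[1 if nt == base else 0 for base in NUCLEOTIDES] for nt in seq]


def contact_matrix(dot_bracket):
    """Convert a dot-bracket string into a symmetric L x L 0/1 matrix.

    Entry [i][j] is 1 when positions i and j form a base pair.
    """
    n = len(dot_bracket)
    matrix = [[0] * n for _ in range(n)]
    stack = []  # positions of currently unmatched opening brackets
    for j, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure: extra ')'")
            i = stack.pop()
            matrix[i][j] = 1
            matrix[j][i] = 1
    if stack:
        raise ValueError("unbalanced structure: extra '('")
    return matrix
```

For example, `contact_matrix("((...))")` marks the pairs (0, 6) and (1, 5). A model of the kind described would take the output of `one_hot` as input and be trained to reproduce such a matrix.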