RNADiffFold: generative RNA secondary structure prediction using discrete diffusion models.

Affiliations

Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou 310018, Zhejiang, China.

College of Information Engineering, Zhejiang University of Technology, Hangzhou 310014, Zhejiang, China.

Publication information

Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae618.

Abstract

Ribonucleic acid (RNA) molecules are essential macromolecules that perform diverse biological functions in living organisms. Precise prediction of RNA secondary structures is instrumental in deciphering their complex three-dimensional architecture and functionality. Traditional methodologies for RNA structure prediction, including energy-based and learning-based approaches, often depict RNA secondary structures from a static perspective and rely on stringent a priori constraints. Inspired by the success of diffusion models, in this work we introduce RNADiffFold, a generative approach to RNA secondary structure prediction based on multinomial diffusion. We recast contact map prediction as a task akin to pixel-wise segmentation and accordingly train a denoising model to progressively refine contact maps, starting from a noisy state. We also devise a conditioning mechanism that uses features extracted from RNA sequences to steer the model toward generating an accurate secondary structure. These features comprise one-hot encoded sequences, probabilistic maps generated by a pre-trained scoring network, and embeddings and attention maps derived from an RNA foundation model. Experimental results on both within-family and cross-family datasets demonstrate RNADiffFold's competitive performance compared with current state-of-the-art methods. Additionally, RNADiffFold shows notable proficiency in capturing the dynamic aspects of RNA structures, as corroborated by its performance on datasets comprising multiple conformations.
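To make the idea of diffusing a contact map concrete, here is a minimal sketch of the forward (noising) step of multinomial diffusion in its commonly used form, where each entry of the map is drawn from a categorical distribution that mixes the clean label with a uniform distribution over the classes. The abstract gives no formulas, so everything specific here is an assumption for illustration: the two-class (unpaired/paired) encoding, the cosine-style schedule, the mirroring of the upper triangle to keep the map symmetric, and all function names. It is not the authors' implementation, and it omits the denoising network and the conditioning features.

```python
import numpy as np

def dot_bracket_to_contact_map(structure: str) -> np.ndarray:
    """Convert a dot-bracket string into a symmetric 0/1 contact map."""
    n = len(structure)
    contact = np.zeros((n, n), dtype=np.int64)
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            contact[i, j] = contact[j, i] = 1
    return contact

def cumulative_alpha(t: int, num_steps: int) -> float:
    """Hypothetical cosine-style schedule: 1 at t=0, about 0 at t=num_steps."""
    return float(np.cos(0.5 * np.pi * t / num_steps) ** 2)

def q_sample(x0, t, num_steps, rng, num_classes=2):
    """Sample x_t ~ Cat(alpha_bar * onehot(x0) + (1 - alpha_bar) / K)."""
    alpha_bar = cumulative_alpha(t, num_steps)
    one_hot = np.eye(num_classes)[x0]                      # shape (n, n, K)
    probs = alpha_bar * one_hot + (1.0 - alpha_bar) / num_classes
    # Inverse-CDF sampling: count how many CDF entries are <= u per cell.
    u = rng.random(x0.shape)
    cdf = np.cumsum(probs, axis=-1)
    xt = (u[..., None] >= cdf).sum(axis=-1)
    xt = np.minimum(xt, num_classes - 1)                   # guard against rounding
    # Assumption: mirror the upper triangle so the noisy map stays symmetric.
    xt = np.triu(xt) + np.triu(xt, 1).T
    return xt, probs

rng = np.random.default_rng(0)
x0 = dot_bracket_to_contact_map("((..))")
x_mid, _ = q_sample(x0, t=50, num_steps=100, rng=rng)
print(x_mid)
```

At `t=0` the sample equals the clean map, and at `t=num_steps` every entry is drawn uniformly from the classes. A denoising model would be trained to reverse this corruption step by step, conditioned on sequence-derived features.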

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/136e/11586127/037ab94ff0ee/bbae618f1.jpg
