Systematic benchmarking of deep-learning methods for tertiary RNA structure prediction.

Authors

Bahai Akash, Kwoh Chee Keong, Mu Yuguang, Li Yinghui

Affiliations

School of Biological Sciences (SBS), Nanyang Technological University, Singapore, Singapore.

School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore.

Publication

PLoS Comput Biol. 2024 Dec 30;20(12):e1012715. doi: 10.1371/journal.pcbi.1012715. eCollection 2024 Dec.

Abstract

The 3D structure of RNA critically influences its function, and understanding this structure is vital for deciphering RNA biology. Experimental methods for determining RNA structures are labour-intensive, expensive, and time-consuming. Computational approaches have emerged as valuable tools, leveraging physics-based principles and machine learning to predict RNA structures rapidly. Despite these advances, the accuracy of computational methods remains modest, especially compared to protein structure prediction. Deep learning methods, which have been successful in protein structure prediction, have shown some promise for RNA structure prediction as well but face unique challenges. This study systematically benchmarks state-of-the-art deep learning methods for RNA structure prediction across diverse datasets. Our aim is to identify factors influencing performance variation, such as RNA family diversity, sequence length, RNA type, multiple sequence alignment (MSA) quality, and deep learning model architecture. We show that ML-based methods generally perform much better than non-ML methods on most RNA targets, although the performance difference is not substantial for unseen novel or synthetic RNAs. The quality of the MSA and of the secondary structure prediction both play an important role, and most methods are unable to predict non-Watson-Crick pairs in the RNAs. Overall, among the automated 3D RNA structure prediction methods, DeepFoldRNA gives the best predictions, followed by DRFold. Finally, we suggest possible mitigations to improve prediction quality in future method development.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/56ea/11723642/2fcbe0fb7c2e/pcbi.1012715.g001.jpg
