Shen Tao, Hu Zhihang, Sun Siqi, Liu Di, Wong Felix, Wang Jiuming, Chen Jiayang, Wang Yixuan, Hong Liang, Xiao Jin, Zheng Liangzhen, Krishnamoorthi Tejas, King Irwin, Wang Sheng, Yin Peng, Collins James J, Li Yu
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China.
Shanghai Zelixir Biotech Company Ltd, Shanghai, China.
Nat Methods. 2024 Dec;21(12):2287-2298. doi: 10.1038/s41592-024-02487-0. Epub 2024 Nov 21.
Accurate prediction of RNA three-dimensional (3D) structures remains an unsolved challenge. Determining RNA 3D structures is crucial for understanding their functions and informing RNA-targeting drug development and synthetic biology design. The structural flexibility of RNA, which leads to the scarcity of experimentally determined data, complicates computational prediction efforts. Here we present RhoFold+, an RNA language model-based deep learning method that accurately predicts 3D structures of single-chain RNAs from sequences. By integrating an RNA language model pretrained on ~23.7 million RNA sequences and leveraging techniques to address data scarcity, RhoFold+ offers a fully automated end-to-end pipeline for RNA 3D structure prediction. Retrospective evaluations on RNA-Puzzles and CASP15 natural RNA targets demonstrate the superiority of RhoFold+ over existing methods, including human expert groups. Its efficacy and generalizability are further validated through cross-family and cross-type assessments, as well as time-censored benchmarks. Additionally, RhoFold+ predicts RNA secondary structures and interhelical angles, providing empirically verifiable features that broaden its applicability to RNA structure and function studies.