Bernard Clément, Postic Guillaume, Ghannay Sahar, Tahi Fariza
Université Paris Saclay, Univ Evry, IBISC, Evry-Courcouronnes 91020, France.
LISN-CNRS/Université Paris-Saclay, Orsay 91400, France.
Bioinformatics. 2024 Dec 26;41(1). doi: 10.1093/bioinformatics/btaf004.
Predicting the 3D structure of RNA is an ongoing challenge that has yet to be fully solved despite continuous advances. RNA 3D structures depend not only on inter-residue distances and base interactions but also on backbone torsion angles. Knowing the torsion angles of each residue could help reconstruct the global fold, and predicting these angles is the problem we tackle in this work. This paper presents a novel approach for predicting RNA torsion angles directly from raw sequence data. Our method draws inspiration from the successful application of language models in various domains and adapts them to RNA.
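Each backbone torsion angle, and each pseudo-torsion angle, is a dihedral angle defined by four atoms. For example, the pseudo-torsion η uses C4′(i−1), P(i), C4′(i), P(i+1). The sketch below is a generic dihedral computation for reference, not code from RNA-TorsionBERT; the function name `dihedral` and the use of plain coordinate tuples are illustrative assumptions.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _scale(a: Vec, s: float) -> Vec:
    return (a[0] * s, a[1] * s, a[2] * s)


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle p0-p1-p2-p3 in degrees, in [-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond p1 -> p2.
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    # atan2 of the two in-plane components gives a signed angle.
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

A cis arrangement, such as `dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0))`, gives 0°. A trans arrangement gives ±180°. Applying the same function to the four atoms that define an angle yields that angle for one residue.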
We have developed a language-based model, RNA-TorsionBERT, that better captures sequential interactions to predict RNA torsion and pseudo-torsion angles from the sequence alone. Through extensive benchmarking, we demonstrate that our method improves torsion angle prediction compared with state-of-the-art methods. In addition, using our predictive model, we derived a torsion-angle-dependent scoring function, called TB-MCQ, that replaces the true reference angles with our model's predictions. We show that it accurately evaluates the quality of near-native predicted structures in terms of their RNA backbone torsion angle values. Our work demonstrates promising results, suggesting the potential utility of language models in advancing RNA 3D structure prediction.
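TB-MCQ builds on MCQ (Mean of Circular Quantities), which compares two sets of torsion angles through a circular mean of their per-angle differences. The following is a minimal sketch that assumes the common form MCQ = atan2(mean sin Δ, mean cos Δ), where Δ is the shortest angular distance between two angles. Following the description above, TB-MCQ would then score a candidate structure by using the model's predicted angles as the reference. The names `angular_diff`, `mcq`, and `tb_mcq` and the per-residue dict layout are hypothetical. This sketch simply skips any angle that is undefined on either side, which may differ from how the published MCQ and TB-MCQ handle missing angles.

```python
import math
from typing import Optional

# One dict per residue: angle name (e.g. "alpha", "eta") -> degrees, or None if undefined.
Angles = list[dict[str, Optional[float]]]


def angular_diff(a: float, b: float) -> float:
    """Shortest absolute difference between two angles in degrees, in [0, 180]."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def mcq(reference: Angles, model: Angles) -> float:
    """Circular mean of per-angle differences, in degrees (0 = identical)."""
    sin_sum = cos_sum = 0.0
    n = 0
    for ref_res, mod_res in zip(reference, model):
        for name, ref_angle in ref_res.items():
            mod_angle = mod_res.get(name)
            # Assumption: angles missing on either side are ignored.
            if ref_angle is None or mod_angle is None:
                continue
            delta = math.radians(angular_diff(ref_angle, mod_angle))
            sin_sum += math.sin(delta)
            cos_sum += math.cos(delta)
            n += 1
    if n == 0:
        raise ValueError("no comparable angles")
    return math.degrees(math.atan2(sin_sum / n, cos_sum / n))


def tb_mcq(predicted: Angles, structure: Angles) -> float:
    """Hypothetical TB-MCQ: MCQ with predicted angles standing in for the true reference."""
    return mcq(predicted, structure)
```

In this sketch, `structure` would hold the angles measured on a candidate 3D model, and `predicted` the sequence-based predictions. Lower scores mean the candidate's backbone agrees more closely with the predicted torsions, and no experimentally solved reference structure is needed.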
Source code is freely available on the EvryRNA platform: https://evryrna.ibisc.univ-evry.fr/evryrna/RNA-TorsionBERT.