Single-sequence protein-RNA complex structure prediction by geometric attention-enabled pairing of biological language models.

Author information

Roche Rahmatullah, Tarafder Sumit, Bhattacharya Debswapna

Affiliation

Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States of America.

Publication information

bioRxiv. 2024 Jul 28:2024.07.27.605468. doi: 10.1101/2024.07.27.605468.

DOI:10.1101/2024.07.27.605468

PMID:39091736

Original link: https://pmc.ncbi.nlm.nih.gov/articles/PMC11291176/

Abstract

Ground-breaking progress has been made in structure prediction of biomolecular assemblies, including the recent breakthrough of AlphaFold 3. However, it remains challenging for AlphaFold 3 and other state-of-the-art deep learning-based methods to accurately predict protein-RNA complex structures, in part due to the limited availability of evolutionary and structural information related to protein-RNA interactions that are used as inputs to the existing approaches. Here, we introduce ProRNA3D-single, a new deep-learning framework for protein-RNA complex structure prediction with only single-sequence input. Using a novel geometric attention-enabled pairing of biological language models of protein and RNA, a previously unexplored avenue, ProRNA3D-single enables the prediction of interatomic protein-RNA interaction maps, which are then transformed into multi-scale geometric restraints for modeling 3D structures of protein-RNA complexes via geometry optimization. Benchmark tests show that ProRNA3D-single convincingly outperforms current state-of-the-art methods including AlphaFold 3, particularly when evolutionary information is limited; and exhibits remarkable robustness and performance resilience by attaining better accuracy with only single-sequence input than what most methods can achieve even with explicit evolutionary information. Freely available at https://github.com/Bhattacharya-Lab/ProRNA3D-single, ProRNA3D-single should be broadly useful for modeling 3D structures of protein-RNA complexes at scale, regardless of the availability of evolutionary information.
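
The abstract describes a pipeline: pair per-residue protein embeddings with per-nucleotide RNA embeddings, predict an interaction map, then convert that map into geometric restraints. The sketch below illustrates only this general idea and is not the ProRNA3D-single implementation. Plain scaled dot-product pairing stands in for the paper's geometric attention, random arrays stand in for language-model embeddings, and the function names, the 10 Å cutoff, and the 0.5 probability threshold are all hypothetical choices made for illustration.

```python
import numpy as np

def pair_embeddings(protein_emb, rna_emb):
    """Pair two embedding sets into a (L_protein, L_rna) map of values in (0, 1).

    Assumption: scaled dot products passed through a sigmoid; this is a
    stand-in for the geometric attention described in the abstract.
    """
    d = protein_emb.shape[1]
    logits = protein_emb @ rna_emb.T / np.sqrt(d)
    return 1.0 / (1.0 + np.exp(-logits))

def map_to_restraints(prob_map, cutoff_angstrom, min_prob=0.5):
    """Turn confident pairs into (protein_idx, rna_idx, upper_bound) tuples.

    Assumption: every pair with probability >= min_prob receives the same
    upper distance bound; both values are hypothetical.
    """
    rows, cols = np.nonzero(prob_map >= min_prob)
    return [(int(i), int(j), float(cutoff_angstrom)) for i, j in zip(rows, cols)]

# Toy inputs: 5 residues and 3 nucleotides with 8-dimensional embeddings.
rng = np.random.default_rng(0)
protein_emb = rng.normal(size=(5, 8))
rna_emb = rng.normal(size=(3, 8))

prob_map = pair_embeddings(protein_emb, rna_emb)
restraints = map_to_restraints(prob_map, cutoff_angstrom=10.0)
```

In a multi-scale setting, one could call `map_to_restraints` once per distance cutoff and pass the combined list to a geometry optimizer; how the paper builds and weights its restraints is not specified in the abstract.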
