Sun Saisai, Yang Jianyi, Gao Lin, Li Pengyong, Liu Yumeng
School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi 710126, China.
MOE Frontiers Science Center for Nonlinear Expectations, Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, Shandong 266237, China.
Bioinformatics. 2025 Sep 1;41(9). doi: 10.1093/bioinformatics/btaf447.
RNA's structural complexity enables it to serve as a versatile molecular scaffold capable of binding small molecules with high specificity. Understanding these interactions is essential for elucidating RNA's role in disease mechanisms and for developing RNA-targeted therapeutics. However, predicting RNA-small molecule binding sites remains a significant challenge because of RNA's conformational flexibility and structural diversity and the limited availability of high-resolution structural data.
In this study, we propose RLsite, a novel computational framework that integrates pre-trained RNA language models with graph attention networks (GAT) to predict small-molecule binding sites on RNA. The method captures both sequential and structural features of RNA by learning intrinsic patterns from large-scale RNA sequence data and by processing graph-based RNA structures to highlight key topological and spatial features. Compared to existing methods, RLsite demonstrates superior accuracy, generalizability, and biological relevance: on the public test set it achieves a Precision of 0.749, a Recall of 0.654, an MCC of 0.474, and an AUC of 0.828, significantly outperforming previous models such as CapBind (AUC of 0.770), MultiModRLBP (AUC of 0.780), and RNABind (AUC of 0.471). Notably, in a case study of the PreQ1 riboswitch, RLsite achieved strong predictive performance (AUC = 0.97, Recall = 0.9), and its predicted binding sites have been confirmed experimentally. These results underscore the method's potential as a tool for RNA-targeted drug discovery and for advancing our understanding of RNA-ligand interactions.
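For readers less familiar with the metrics reported above, the following is a minimal sketch of how Precision, Recall, MCC, and AUC can be computed for per-nucleotide binding-site predictions. It is not taken from the RLsite code base: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made purely for illustration.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = binding nucleotide)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def precision(tp, fp):
    """Fraction of predicted binding sites that are true binding sites."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of true binding sites that were predicted."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, scores):
    """ROC AUC as the probability that a random positive outscores a
    random negative, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data (hypothetical): one label and one predicted score per nucleotide.
    labels = [1, 1, 0, 0, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.3, 0.4, 0.2, 0.1, 0.6]
    preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

    tp, fp, tn, fn = confusion_counts(labels, preds)
    print("Precision:", precision(tp, fp))
    print("Recall:", recall(tp, fn))
    print("MCC:", mcc(tp, fp, tn, fn))
    print("AUC:", auc(labels, scores))
```

Precision, Recall, and MCC depend on the chosen threshold, whereas AUC is computed from the raw scores, which is one reason the abstract's AUC comparisons across methods do not require a shared cutoff.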
The source code and data are available at https://github.com/SaisaiSun/RLsite.