Wei Sin-Siang, Jhang Wei-En, Liu Yu-Chen, Chuang Cheng-Che, Ou Yu-Yen
Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan.
Graduate Program in Biomedical Informatics, Yuan Ze University, Chung-Li 32003, Taiwan.
J Chem Inf Model. 2025 Jul 14;65(13):7277-7284. doi: 10.1021/acs.jcim.5c00766. Epub 2025 Jun 18.
Receptor tyrosine kinases (RTKs) are key regulators of cellular signaling and are frequently involved in cancer development. Because their activation depends on ATP binding to the kinase domain, precisely identifying ATP binding sites is critical for mechanistic studies and targeted therapy development. However, general ATP binding site prediction methods often fall short for RTKs because their structural features vary across protein families. To address this challenge, we introduce RTK_RAG, a framework that integrates retrieval-augmented generation (RAG) and combines protein language models (PLMs) with a multiwindow convolutional neural network (MCNN) architecture to improve ATP binding site prediction for RTKs. When tested on an independent RTK data set, RTK_RAG outperforms general ATP binding site predictors on multiple evaluation metrics. By accounting for RTK-specific structural differences, our study provides a reliable tool for researching RTK function and facilitating the development of novel kinase inhibitors. Moreover, this approach demonstrates the potential of RAG-based frameworks for enhancing functional predictions in specialized protein families and offers a generalizable strategy for improving binding site identification within them.
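The abstract names the building blocks (PLM embeddings, a retrieval step, and a multiwindow CNN) without giving implementation details. The following is only a minimal NumPy sketch of how such pieces could fit together, not the authors' RTK_RAG implementation. Everything in it is an assumption made for illustration: the random arrays standing in for PLM embeddings and for a reference database, the cosine-similarity retrieval, the concatenation of a retrieved context vector onto each residue, the window sizes (3, 5, 7), and all function names.

```python
import numpy as np


def cosine_top_k(query, database, k):
    """Return indices of the k database rows most cosine-similar to query."""
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    return np.argsort(-(db @ q))[:k]


def conv1d_same(x, kernel):
    """Zero-padded 1D convolution along the sequence axis.

    x: (L, D_in); kernel: (w, D_in, D_out) with odd w; returns (L, D_out).
    """
    w = kernel.shape[0]
    pad = w // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    # windows[l, i] holds the residue at offset i - pad from position l.
    windows = np.stack([xp[i:i + x.shape[0]] for i in range(w)], axis=1)
    return np.einsum("lwd,wdo->lo", windows, kernel)


def multiwindow_scores(x, kernels, w_out, b_out):
    """Per-residue probabilities from parallel convolutions of several widths."""
    feats = [np.maximum(conv1d_same(x, k), 0.0) for k in kernels]  # ReLU
    h = np.concatenate(feats, axis=1)          # (L, n_windows * D_out)
    logits = h @ w_out + b_out
    return 1.0 / (1.0 + np.exp(-logits))       # sigmoid


rng = np.random.default_rng(0)
L, D, H = 40, 16, 8                            # hypothetical sizes

# Stand-ins: real inputs would come from a protein language model.
residue_emb = rng.normal(size=(L, D))          # per-residue embeddings of the query
reference_db = rng.normal(size=(50, D))        # pooled embeddings of reference proteins

# Retrieval step (assumed design): fetch the most similar references and
# append their mean embedding to every residue as extra context.
top = cosine_top_k(residue_emb.mean(axis=0), reference_db, k=3)
context = reference_db[top].mean(axis=0)
augmented = np.concatenate([residue_emb, np.tile(context, (L, 1))], axis=1)

# Untrained random weights; a real model would learn these.
kernels = [rng.normal(scale=0.1, size=(w, 2 * D, H)) for w in (3, 5, 7)]
w_out = rng.normal(scale=0.1, size=(3 * H,))
probs = multiwindow_scores(augmented, kernels, w_out, 0.0)
```

Here `probs` holds one score per residue. Using several window widths in parallel lets the classifier see both short and longer sequence neighborhoods around each position, which is the general motivation for a multiwindow design.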