Liu Kaifeng, Yu Xiangyu, Cui Huizi, Li Wannan, Han Weiwei
Key Laboratory for Molecular Enzymology and Engineering of Ministry of Education, Edmond H. Fischer Signal Transduction Laboratory, School of Life Sciences, Jilin University, Qianjin road 2699, Changchun 130012, China.
Int J Biol Macromol. 2024 Dec;282(Pt 5):137069. doi: 10.1016/j.ijbiomac.2024.137069. Epub 2024 Oct 31.
Accurate prediction of inhibitor-kinase binding affinity is crucial in biological research and medical applications. Kinases in particular play a pivotal role in numerous cellular processes and are essential enzymes in the Mitogen-Activated Protein Kinase (MAPK) signaling pathway. This study harnesses Large Language Models (LLMs), specifically GPT-4, to predict the binding affinity between inhibitors and kinases of the MAPK pathway, including Raf protein kinase (RAF), Mitogen-activated protein kinase kinase (MEK) and Extracellular signal-regulated kinase (ERK). GPT-4 achieved 87.31 % accuracy in predicting RAF binding affinity and 77.00 % accuracy on the comprehensive prediction task. These results substantially outperform existing mainstream methods such as AutoDock Vina (21.21 %), BatchDTA (52.00 %) and KIPP (59.60 %). GPT-4 was further used to delineate the features of high-affinity and low-affinity molecules and to identify the functional groups contributing to affinity. These contributing groups were subsequently validated through molecular docking. To assess the generalizability of the method, we applied it to six other kinases, achieving a maximum accuracy of 83.78 %. On a dataset covering more than 200 kinases, it achieved an accuracy of 66.20 %. The study showcases the transformative impact of LLMs on molecular binding affinity prediction, with major implications for biological sciences and therapeutic development.
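The abstract reports performance as accuracy percentages but does not define the labeling scheme behind them. Assuming accuracy is the usual fraction of inhibitors whose affinity class is predicted correctly, the minimal Python sketch below shows how such a percentage is computed. It uses a binary high/low split at a hypothetical IC50 cutoff of 1000 nM. The cutoff, the function names `affinity_label` and `accuracy`, and the toy data are illustrative assumptions, not the study's protocol or results.

```python
from typing import Sequence

# HYPOTHETICAL cutoff: the abstract does not state how "high" vs "low"
# affinity was defined; 1000 nM (1 uM) is used here only for illustration.
HIGH_AFFINITY_CUTOFF_NM = 1000.0


def affinity_label(ic50_nm: float, cutoff_nm: float = HIGH_AFFINITY_CUTOFF_NM) -> str:
    """Map a measured IC50 (nM) to a binary label; a lower IC50 means tighter binding."""
    return "high" if ic50_nm <= cutoff_nm else "low"


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Percentage of predicted labels that match the reference labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


if __name__ == "__main__":
    # Toy values for illustration only; not data from the study.
    measured_ic50_nm = [12.0, 450.0, 3200.0, 15000.0]
    model_labels = ["high", "high", "high", "low"]  # e.g. labels parsed from an LLM reply
    true_labels = [affinity_label(x) for x in measured_ic50_nm]
    print(f"{accuracy(model_labels, true_labels):.2f} %")  # prints "75.00 %"
```

On this toy data three of the four predicted labels match, giving 75.00 %. A multi-class binning of affinity values would use the same `accuracy` function with more label values.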