Sun Liping, Hu Jili, Yang Yinfeng, Wang Yongkang, Wang Zijian, Gao Yong, Nie Yiqi, Liu Can, Kan Hongxing
School of Medical Informatics Engineering, Anhui University of Chinese Medicine, Hefei, Anhui 230012, China.
College of Informatics, Huazhong Agricultural University, Wuhan 430070, China.
J Chem Inf Model. 2024 Sep 9;64(17):6736-6744. doi: 10.1021/acs.jcim.4c00600. Epub 2024 Jun 3.
The design of nanozymes with superior catalytic activities is a prerequisite for broadening their biomedical applications. Previous studies have devoted significant effort to theoretical calculations and experimental trials aimed at enhancing the catalytic activity of nanozymes. Machine learning (ML) offers a forward-looking aid for predicting nanozyme catalytic activity. However, it requires a significant amount of human effort for data collection, and its prediction accuracy urgently needs to be improved. Herein, we demonstrate that ChatGPT can collaborate with humans to collect data efficiently. We establish four qualitative models (random forest (RF), decision tree (DT), AdaBoost random forest (AdaBoost-RF), and AdaBoost decision tree (AdaBoost-DT)) for predicting nanozyme catalytic types, such as peroxidase, oxidase, catalase, superoxide dismutase, and glutathione peroxidase. Furthermore, we use five quantitative models (random forest (RF), decision tree (DT), support vector regression (SVR), gradient boosting regression (GBR), and fully connected deep neural network (DNN)) to predict nanozyme catalytic activities. We find that the GBR model demonstrates superior prediction performance for nanozyme catalytic activities (R² = 0.6476 for Km and R² = 0.95 for Kcat). Moreover, an open-access web resource, AI-ZYMES, with a ChatGPT-based nanozyme copilot, is developed for predicting nanozyme catalytic types and activities and for guiding the synthesis of nanozymes. The accuracy of the nanozyme copilot's responses exceeds 90% through retrieval-augmented generation. This study provides a new potential application for ChatGPT in the field of nanozymes.
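Performance figures for regression models such as those quoted for Km and Kcat are commonly reported as a coefficient of determination (R²). The following is a minimal sketch of how such a score is computed, assuming that metric is the one intended here. The function name `r_squared` and the sample values are hypothetical and chosen only for illustration; they are not data or code from the study.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the measured values around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration only; not measurements from the study.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
score = r_squared(measured, predicted)
print(f"R^2 = {score:.2f}")
```

A score of 1 means the predictions match the measurements exactly, while a score of 0 means the model does no better than always predicting the mean of the measured values.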