Prediction of the Aqueous Solubility of Compounds Based on Light Gradient Boosting Machines with Molecular Fingerprints and the Cuckoo Search Algorithm.

Authors

Li Mengshan, Chen Huijie, Zhang Hang, Zeng Ming, Chen Bingsheng, Guan Lixin

Affiliation

College of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000, Jiangxi, China.

Publication information

ACS Omega. 2022 Nov 8;7(46):42027-42035. doi: 10.1021/acsomega.2c03885. eCollection 2022 Nov 22.

Abstract

Aqueous solubility is one of the most important physicochemical properties in drug discovery, yet predicting it for compounds remains a challenging problem. Machine learning has shown great potential in solubility prediction, but the performance of most machine learning models depends heavily on their hyperparameter settings and can be improved by tuning them. In this paper, we used MACCS fingerprints to represent structural features and optimized the hyperparameters of the light gradient boosting machine (LightGBM) with the cuckoo search (CS) algorithm. The resulting CS-LightGBM model was used to predict the aqueous solubility of 2446 organic compounds, and its predictions were compared with those of six other machine learning models (RF, GBDT, XGBoost, LightGBM, SVR, and BO-LightGBM). The CS-LightGBM model outperformed all six, with an RMSE of 0.7785, an MAE of 0.5117, and an R² of 0.8575. In addition, the model has good scalability and can be applied to solubility prediction problems in other fields, such as solvent selection and drug screening.
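The abstract names the cuckoo search algorithm but does not describe its mechanics, so the following is a minimal sketch of a simplified variant, not the authors' implementation. It assumes a box-constrained search space, Lévy steps drawn with Mantegna's algorithm, and a hypothetical `toy_objective` that stands in for the cross-validated RMSE a real LightGBM run would return. The bounds, the parameter names `learning_rate` and `num_leaves` as search dimensions, and the location of the toy minimum are all illustrative assumptions.

```python
import math
import random


def levy_step(rng, beta=1.5):
    """Draw one heavy-tailed step length with Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma_u = (num / den) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    return u / (abs(v) ** (1 / beta) + 1e-12)  # small constant avoids division by zero


def cuckoo_search(objective, bounds, n_nests=15, n_iter=100, pa=0.25, alpha=0.05, seed=0):
    """Minimize `objective` over the box `bounds`.

    Returns (best_point, best_score, history), where history holds the best
    score found so far after initialization and after each iteration.
    """
    rng = random.Random(seed)

    def random_nest():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    nests = [random_nest() for _ in range(n_nests)]
    scores = [objective(n) for n in nests]
    history = [min(scores)]

    for _ in range(n_iter):
        # 1) Each cuckoo proposes a new point by a Levy flight from its nest.
        for i in range(n_nests):
            candidate = []
            for x, (lo, hi) in zip(nests[i], bounds):
                x_new = x + alpha * (hi - lo) * levy_step(rng)
                candidate.append(min(max(x_new, lo), hi))  # clip to the box
            cand_score = objective(candidate)
            j = rng.randrange(n_nests)  # compare against a randomly chosen nest
            if cand_score < scores[j]:
                nests[j], scores[j] = candidate, cand_score

        # 2) Abandon the worst fraction `pa` of nests and rebuild them at random.
        n_abandon = int(pa * n_nests)
        worst = sorted(range(n_nests), key=lambda k: scores[k], reverse=True)[:n_abandon]
        for k in worst:
            nests[k] = random_nest()
            scores[k] = objective(nests[k])

        history.append(min(scores))

    best = scores.index(min(scores))
    return nests[best], scores[best], history


# Hypothetical search box: (learning_rate, num_leaves). Values are illustrative only.
BOUNDS = [(0.01, 0.3), (8.0, 128.0)]


def toy_objective(params):
    """Hypothetical stand-in for a cross-validated RMSE; not a real model score."""
    learning_rate, num_leaves = params
    return ((learning_rate - 0.05) / 0.29) ** 2 + ((num_leaves - 31.0) / 120.0) ** 2


if __name__ == "__main__":
    point, score, _ = cuckoo_search(toy_objective, BOUNDS, seed=1)
    print("best point:", point, "score:", score)
```

In a real setting, `toy_objective` would be replaced by a function that trains a LightGBM regressor on the MACCS fingerprint matrix with the candidate hyperparameters and returns a validation error. Integer-valued hyperparameters such as a leaf count would need rounding inside that function. Because the best nest is never abandoned and a nest is only overwritten by a better candidate, the best score in this sketch never gets worse from one iteration to the next.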

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dec5/9685740/1763ff672578/ao2c03885_0002.jpg
