Machine learning-based analysis on pharmaceutical compounds interaction with polymer to estimate drug solubility in formulations.

Author information

Obaidullah Ahmad J, Mahdi Wael A

Affiliations

Department of Pharmaceutical Chemistry, College of Pharmacy, King Saud University, P.O. Box 2457, Riyadh, 11451, Saudi Arabia.

Department of Pharmaceutics, College of Pharmacy, King Saud University, P.O. Box 2457, Riyadh, 11451, Saudi Arabia.

Publication information

Sci Rep. 2025 Jul 2;15(1):23683. doi: 10.1038/s41598-025-05535-7.

Abstract

This study introduces a machine learning framework for predicting drug solubility and activity (gamma) values in formulations. The framework uses a dataset of more than 12,000 rows and 24 input features covering a wide range of parameters. The primary goal is to improve prediction accuracy through ensemble learning. Three base models were evaluated: Decision Tree (DT), K-Nearest Neighbors (KNN), and Multilayer Perceptron (MLP), each of which was then boosted with the AdaBoost ensemble method. To further optimize performance, Recursive Feature Elimination (RFE) was used for feature selection, with the number of selected features treated as a hyperparameter. Hyperparameters were tuned with the Harmony Search (HS) algorithm. For drug solubility prediction, the ADA-DT model performed best, achieving an R² of 0.9738 on the test set, with a Mean Squared Error (MSE) of 5.4270E-04 and a Mean Absolute Error (MAE) of 2.10921E-02. For gamma prediction, the ADA-KNN model outperformed the other models, with a test-set R² of 0.9545, an MSE of 4.5908E-03, and an MAE of 1.42730E-02. The results show that ensemble learning combined with feature selection and hyperparameter optimization can accurately predict these complex biochemical properties.
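
The abstract names Harmony Search as the hyperparameter tuner but gives no detail on its configuration. As an illustration only, the following is a minimal sketch of a basic Harmony Search loop in plain Python. It is not the authors' implementation: the function names, the parameter defaults (`memory_size`, `hmcr`, `par`, `bandwidth`), and the toy objective standing in for a model's validation error are all assumptions made for this example.

```python
import random


def harmony_search(objective, bounds, memory_size=10, hmcr=0.9, par=0.3,
                   bandwidth=0.05, iterations=2000, seed=0):
    """Minimize `objective` over box `bounds` with a basic Harmony Search.

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    # Harmony memory: a small pool of random candidate solutions and their scores.
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds]
              for _ in range(memory_size)]
    scores = [objective(h) for h in memory]

    for _ in range(iterations):
        new = []
        for d, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                # Memory consideration: reuse this dimension from a stored harmony.
                value = memory[rng.randrange(memory_size)][d]
                if rng.random() < par:
                    # Pitch adjustment: nudge the reused value slightly.
                    value += rng.uniform(-1.0, 1.0) * bandwidth * (hi - lo)
            else:
                # Random consideration: draw a fresh value from the full range.
                value = rng.uniform(lo, hi)
            new.append(min(max(value, lo), hi))  # clip back into the bounds

        new_score = objective(new)
        worst = max(range(memory_size), key=scores.__getitem__)
        if new_score < scores[worst]:
            # Replace the worst stored harmony when the new one is better.
            memory[worst], scores[worst] = new, new_score

    best = min(range(memory_size), key=scores.__getitem__)
    return memory[best], scores[best]


def toy_validation_error(params):
    """Hypothetical stand-in for a validation error surface.

    Pretends the best learning rate is 0.3 and the best tree depth is 20.
    """
    learning_rate, depth = params
    return (learning_rate - 0.3) ** 2 + ((depth - 20.0) / 50.0) ** 2


# Search over (learning_rate, depth); both ranges are assumed for illustration.
best, score = harmony_search(toy_validation_error, [(0.01, 1.0), (1.0, 50.0)])
print(best, score)
```

In a real tuning run, the toy objective would be replaced by a function that trains a model with the candidate hyperparameters (including the number of features kept by RFE, which the abstract says is treated as a hyperparameter) and returns a validation error. Integer-valued parameters would need rounding before use.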
