Hassan Raouf, Kazemi Mohammad Reza
Civil Engineering Department, College of Engineering, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh, 13318, Saudi Arabia.
Process Engineering Department, Bandar Imam Petrochemical Company (BIPC), Mahshahr, Iran.
Sci Rep. 2025 Aug 25;15(1):31157. doi: 10.1038/s41598-025-12758-1.
This study aims to predict the solubility parameter of diverse polymers using machine learning models that capture the complex relationships between the target variable, the polymer solubility parameter, and a set of input properties: molecular weight, melting point, boiling point, liquid molar volume, radius of gyration, dielectric constant, dipole moment, refractive index, van der Waals area and reduced volume, and parachor. A range of algorithms was applied, from basic methods such as Linear Regression, Ridge Regression, Lasso Regression, and Elastic Net to more advanced techniques such as Decision Trees, K-Nearest Neighbors (KNN), Support Vector Machines (SVMs), Random Forests (RFs), Gradient Boosting Machines (GBM), Light Gradient Boosting Machine (LightGBM), Categorical Boosting (CatBoost), Extreme Gradient Boosting (XGBoost), Artificial Neural Networks (ANNs), and Convolutional Neural Networks (CNNs). The models were evaluated with the statistical metrics R², RMSE, and MRD%, together with visual tools including cross-plots, deviation plots, and SHAP analysis to support interpretability and predictive reliability. The dataset comprises 1,799 data points on the solubility parameter of polymers; a Monte Carlo outlier detection algorithm was applied to it to confirm its reliability and its suitability for model training and evaluation.
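As an illustration of the three evaluation metrics named above, the following is a minimal sketch in plain Python. The abstract does not define the metrics, so the formulas are assumptions: R² is taken as the usual coefficient of determination, RMSE as the root mean squared error, and MRD% as the mean absolute relative deviation expressed in percent. The function names and the toy values are hypothetical and are not data from the study.

```python
import math

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mrd_percent(y_true, y_pred):
    """Mean relative deviation in percent (assumed definition:
    mean of |true - predicted| / |true|, times 100)."""
    n = len(y_true)
    return 100.0 / n * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred))

# Hypothetical toy values, not taken from the paper's dataset.
y_true = [10.0, 20.0, 30.0]
y_pred = [11.0, 19.0, 30.0]

r2_value = r_squared(y_true, y_pred)
rmse_value = rmse(y_true, y_pred)
mrd_value = mrd_percent(y_true, y_pred)
```

A model with a higher R² and lower RMSE and MRD% on held-out data would be ranked as more accurate under these definitions.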
Results indicated that CatBoost, ANN, and CNN outperformed the other techniques, achieving the highest R² values and the lowest error rates. Sensitivity analysis showed that every input feature affected the target variable, and SHAP analysis identified the dielectric constant as the most influential factor for the solubility parameter of polymers. These results demonstrate the effectiveness of the machine learning methods used and underscore the importance of the input parameters in determining the solubility parameter of polymers. Beyond confirming that the models predict accurately, the approach offers insight into how the input features affect the solubility parameter, improving model interpretability and scientific understanding.
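The feature-ranking idea can be illustrated with a simpler stand-in technique. The sketch below uses permutation importance, not SHAP and not the sensitivity analysis used in the study: it shuffles one input column at a time and records how much the mean squared error of a fitted model increases. The model, data, and function names are hypothetical toy constructions chosen so that one feature matters and the other does not.

```python
import random

def mse(model, X, y):
    """Mean squared error of a callable model over rows X and targets y."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Average increase in MSE when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving the other columns untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(model, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Hypothetical toy data: the target depends only on feature 0.
X = [[float(i), float((i * 7) % 5)] for i in range(10)]
y = [3.0 * row[0] for row in X]

def toy_model(row):
    # Stand-in for a trained regressor; it ignores feature 1 entirely.
    return 3.0 * row[0]

importances = permutation_importance(toy_model, X, y)
```

Here the importance of feature 1 is exactly zero because the toy model never reads it, while feature 0 receives a positive score. SHAP attributes contributions per prediction and can rank features differently, so this is only a rough analogue of the analysis reported in the abstract.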