José P. S. Aniceto, Bruno Zêzere, Carlos M. Silva
CICECO-Aveiro Institute of Materials, Department of Chemistry, University of Aveiro, 3810-193 Aveiro, Portugal.
Materials (Basel). 2021 Jan 23;14(3):542. doi: 10.3390/ma14030542.
Experimental diffusivities are scarce, even though they are essential for modeling rate-controlled processes. In this work, several machine learning models were developed to estimate diffusivities in polar and nonpolar solvents (excluding water and supercritical CO2). The models were trained on a database of 90 polar systems (1431 points) and 154 nonpolar systems (1129 points) with data on 20 properties. Five machine learning algorithms were evaluated: multilinear regression, k-nearest neighbors, decision tree, and two ensemble methods (random forest and gradient boosting). For both the polar and the nonpolar data, the best results were obtained with the gradient boosting algorithm. The model for polar systems contains six variables/parameters (temperature, solvent viscosity, solute molar mass, solute critical pressure, solvent molar mass, and solvent Lennard-Jones energy constant) and showed an average absolute relative deviation (AARD) of 5.07%. The nonpolar model requires five variables/parameters (the same as for polar systems, except the Lennard-Jones constant) and gives AARD = 5.86%. These results were compared with four classic models, including the two-parameter correlation of Magalhães et al. (AARD = 5.19/6.19% for polar/nonpolar) and the predictive Wilke-Chang equation (AARD = 40.92/29.19%). However, the Magalhães et al. correlation requires two parameters per system that must first be fitted to experimental data. The developed models are coded and provided as a command-line program.
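The two yardsticks used in the comparison above, the AARD metric and the Wilke-Chang equation, can be sketched as follows. This is a minimal illustration, not the authors' program: it assumes the common definition AARD = (100/N) Σ |D_calc − D_exp| / D_exp and the usual textbook form of Wilke-Chang, D12 = 7.4×10⁻⁸ (φ M2)^0.5 T / (η2 V1^0.6), with D12 in cm²/s, T in K, η2 in cP, M2 in g/mol and V1 (solute molar volume at its normal boiling point) in cm³/mol. The function names, the default association factor φ = 1.0 and the numerical inputs in the usage line are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def aard(d_calc: Sequence[float], d_exp: Sequence[float]) -> float:
    """Average absolute relative deviation between predictions and data, in percent."""
    if len(d_calc) != len(d_exp) or not d_exp:
        raise ValueError("need two non-empty sequences of equal length")
    return 100.0 / len(d_exp) * sum(abs(c - e) / e for c, e in zip(d_calc, d_exp))


def wilke_chang(T: float, eta_cP: float, M_solvent: float,
                V_solute: float, phi: float = 1.0) -> float:
    """Wilke-Chang estimate of the infinite-dilution diffusivity D12 in cm^2/s.

    T          temperature (K)
    eta_cP     solvent viscosity (cP, i.e. mPa s)
    M_solvent  solvent molar mass (g/mol)
    V_solute   solute molar volume at its normal boiling point (cm^3/mol)
    phi        solvent association factor (assumed 1.0, i.e. unassociated solvent)
    """
    return 7.4e-8 * math.sqrt(phi * M_solvent) * T / (eta_cP * V_solute ** 0.6)


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the call and the units.
    d = wilke_chang(T=298.15, eta_cP=1.0, M_solvent=100.0, V_solute=100.0)
    print(f"{d:.3e} cm^2/s")  # prints 1.392e-05 cm^2/s
```

Applying `aard` to the predictions of any model against a set of measured diffusivities yields a percentage of the same form as those quoted above; no data from the study are reproduced here.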