Department of Mechanical Engineering, University of California Merced, 5200 North Lake Road, Merced, California 95343, United States.
J Chem Inf Model. 2024 Apr 8;64(7):2760-2774. doi: 10.1021/acs.jcim.3c00231. Epub 2023 Aug 15.
Machine learning-based predictive models allow rapid and reliable prediction of material properties and facilitate innovative materials design. Base oils used in the formulation of lubricant products are complex hydrocarbons of varying size and structure. This study developed Gaussian process regression-based models to accurately predict the temperature-dependent density and dynamic viscosity of 305 complex hydrocarbons. In our approach, strongly correlated/collinear predictors were trimmed, important predictors were selected by least absolute shrinkage and selection operator (LASSO) regularization and prior domain knowledge, hyperparameters were systematically optimized by Bayesian optimization, and the models were interpreted. The approach provided versatile quantitative structure-property relationship (QSPR) models with relatively simple predictors for determining the dynamic viscosity and density of complex hydrocarbons at any temperature. In addition, we developed molecular dynamics simulation-based descriptors and evaluated the feasibility and versatility of these dynamic descriptors for predicting material properties. Models developed using a comparatively small pool of dynamic descriptors predicted density and viscosity about as well as models based on many more static descriptors. The best models predicted density and dynamic viscosity with coefficient of determination (R²) values of 99.6% and 97.7%, respectively, for all data sets, including a test data set of 45 molecules. Finally, partial dependence plots (PDPs), individual conditional expectation (ICE) plots, local interpretable model-agnostic explanation (LIME) values, and trimmed model R² values were used to identify the most important static and dynamic predictors of density and viscosity.
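The workflow described in the abstract (trim collinear predictors, select predictors with LASSO, then fit a Gaussian process regression model) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes scikit-learn and uses synthetic stand-in data rather than the 305 hydrocarbons; the helper `drop_collinear` and the 0.95 correlation threshold are hypothetical choices made for illustration. Note also that scikit-learn's `GaussianProcessRegressor` tunes kernel hyperparameters by maximizing the log marginal likelihood, which is swapped in here for the Bayesian optimization named in the abstract.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def drop_collinear(X, threshold=0.95):
    """Hypothetical helper: greedily keep a column only if its absolute
    Pearson correlation with every already kept column is <= threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return kept


# Synthetic stand-in data (an assumption; NOT the paper's descriptors).
rng = np.random.default_rng(0)
n_samples = 200
X = rng.normal(size=(n_samples, 8))
X[:, 7] = X[:, 0] + 0.01 * rng.normal(size=n_samples)  # near-copy of column 0
y = (2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * np.sin(X[:, 2])
     + 0.05 * rng.normal(size=n_samples))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Step 1: trim strongly correlated predictors, then standardize.
kept = drop_collinear(X_train)
scaler = StandardScaler().fit(X_train[:, kept])
Z_train = scaler.transform(X_train[:, kept])
Z_test = scaler.transform(X_test[:, kept])

# Step 2: LASSO with a cross-validated penalty; keep nonzero coefficients.
lasso = LassoCV(cv=5, random_state=0).fit(Z_train, y_train)
selected = np.flatnonzero(np.abs(lasso.coef_) > 1e-6)

# Step 3: Gaussian process regression with one length scale per predictor.
# Kernel hyperparameters are fitted by marginal-likelihood maximization.
kernel = RBF(length_scale=np.ones(len(selected))) + WhiteKernel(noise_level=0.1)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
gpr.fit(Z_train[:, selected], y_train)

r2_test = gpr.score(Z_test[:, selected], y_test)  # coefficient of determination
print(f"kept={kept}, selected={selected.tolist()}, test R^2={r2_test:.3f}")
```

In this sketch the selected indices refer to positions within the trimmed column set, not the original descriptor matrix. For temperature-dependent targets such as density or viscosity, temperature could be appended as one more predictor column, although how the authors encoded temperature is not stated in the abstract.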