Machine learning hybrid approach for the prediction of surface tension profiles of hydrocarbon surfactants in aqueous solution.

Affiliation

Department of Chemical Engineering, Imperial College London, London SW7 2AZ, United Kingdom.

Publication information

J Colloid Interface Sci. 2022 Nov;625:328-339. doi: 10.1016/j.jcis.2022.06.034. Epub 2022 Jun 9.

DOI:10.1016/j.jcis.2022.06.034

PMID:35717847

Abstract

HYPOTHESIS

Predicting the surface tension (SFT)-log(c) profiles of hydrocarbon surfactants in aqueous solution is computationally non-trivial, and empirically challenging due to the diverse and complex architecture and interactions of surfactant molecules. Machine learning (ML), combining a data-based and knowledge-based approach, can provide a powerful means to relate molecular descriptors to SFT profiles.

EXPERIMENTS

A dataset of SFT for 154 model hydrocarbon surfactants at 20-30 °C was fitted to the Szyszkowski equation to extract three characteristic parameters (Γ, K and the critical micelle concentration (CMC)), which were then correlated to a series of 2D and 3D molecular descriptors. Key (∼10) descriptors were selected by removing co-correlation and by employing a gradient-boosted regressor model to rank feature importance and carry out recursive feature elimination (RFE). The hyperparameters of each target-variable model were fine-tuned using a randomised cross-validated grid search to improve predictive ability and reduce overfitting.
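
As an illustration of the knowledge-based step, the sketch below evaluates an SFT-log(c) profile from the three fitted parameters. It assumes the common form of the Szyszkowski equation, γ = γ0 − n·R·T·Γ·ln(1 + K·c), with the surface tension held constant above the CMC. The abstract does not state the exact form, units, or prefactor used, so the function name `szyszkowski_sft`, the prefactor `n`, the water surface tension of about 0.072 N/m, and all example parameter values are illustrative assumptions rather than values from the paper.

```python
import math

R_GAS = 8.314462618  # molar gas constant, J mol^-1 K^-1


def szyszkowski_sft(c, gamma_max, k_ads, cmc, temp_k=298.15,
                    sft_water=0.072, n=1):
    """Surface tension in N/m at bulk concentration c (mol/m^3).

    gamma_max : limiting surface excess, mol/m^2 (assumed units)
    k_ads     : adsorption constant, m^3/mol (assumed units)
    cmc       : critical micelle concentration, mol/m^3
    n         : assumed prefactor (1 here; 2 is often used for 1:1 ionics)
    """
    # Above the CMC the profile is treated as a plateau (assumption).
    c_eff = min(c, cmc)
    return sft_water - n * R_GAS * temp_k * gamma_max * math.log1p(k_ads * c_eff)


# Hypothetical parameter set, chosen only to produce a plausible-looking curve.
params = dict(gamma_max=3e-6, k_ads=1.0, cmc=8.0)

# Sample the profile on a logarithmic concentration grid (0.01 to 100 mol/m^3).
profile = [(10 ** e, szyszkowski_sft(10 ** e, **params)) for e in range(-2, 3)]
for conc, sft in profile:
    print(f"log10(c) = {math.log10(conc):+.0f}   SFT = {1000 * sft:.1f} mN/m")
```

Under this reading, each surfactant's measured curve is reduced to (Γ, K, CMC), and those three numbers, not the raw data points, become the regression targets for the descriptor-based models.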

FINDINGS

The ML models correlate favourably with test experimental data, with R = 0.69-0.87, and the merits and limitations of the approach are discussed based on 'unseen' hydrocarbon surfactants. The incorporation of a knowledge-based framework provides an appropriate smoothing of the experimental data which simplifies the data-driven approach and enhances its generality. Open-source codes and a brief tutorial are provided.
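
To make the held-out comparison concrete, the following sketch runs a descriptor-selection, tuning, and test-scoring workflow of the kind described, using scikit-learn on synthetic data. It is not the authors' released code: the data are random numbers, and the 0.9 correlation cutoff, the 80/20 split, the hyperparameter ranges, and the choice of 10 retained features are assumptions made for illustration. Both the Pearson correlation and the coefficient of determination are computed, since the abstract's notation does not specify which one is reported.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_selection import RFE
from sklearn.metrics import r2_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in: 154 "surfactants" x 30 "descriptors" (NOT the paper's data).
X = rng.normal(size=(154, 30))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=154)  # deliberately co-correlated pair
# Synthetic target standing in for one fitted parameter (e.g. a CMC-like value).
y = 2.0 * X[:, 0] - X[:, 5] + 0.5 * X[:, 7] ** 2 + 0.3 * rng.normal(size=154)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Step 1: remove co-correlated descriptors (0.9 cutoff is an assumption).
corr = np.abs(np.corrcoef(X_train, rowvar=False))
keep = []
for j in range(corr.shape[0]):
    if all(corr[j, k] < 0.9 for k in keep):
        keep.append(j)
X_train, X_test = X_train[:, keep], X_test[:, keep]

# Step 2: recursive feature elimination driven by a gradient-boosted regressor.
selector = RFE(
    GradientBoostingRegressor(n_estimators=50, random_state=0),
    n_features_to_select=10,
)
selector.fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)

# Step 3: randomised, cross-validated hyperparameter search (ranges are illustrative).
search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_distributions={
        "n_estimators": [50, 100, 200],
        "max_depth": [2, 3, 4],
        "learning_rate": [0.01, 0.05, 0.1],
        "subsample": [0.6, 0.8, 1.0],
    },
    n_iter=5,
    cv=3,
    scoring="r2",
    random_state=0,
)
search.fit(X_train_sel, y_train)

# Step 4: score the tuned model on data it has not seen.
y_pred = search.predict(X_test_sel)
r2_test = r2_score(y_test, y_pred)
pearson_r = float(np.corrcoef(y_test, y_pred)[0, 1])
print(f"kept {len(keep)} descriptors after the correlation filter")
print(f"test R^2 = {r2_test:.2f}, Pearson R = {pearson_r:.2f}")
```

In a real workflow one such model would be trained per target variable (Γ, K and CMC), and the predicted parameters would be fed back into the Szyszkowski equation to reconstruct the full profile.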