Universal machine-learning algorithm for predicting adsorption performance of organic molecules based on limited data set: Importance of feature description.

Authors

Huang Chaoyi, Gao Wenyang, Zheng Yingdie, Wang Wei, Zhang Yue, Liu Kai

Affiliations

Division of Environment and Resources, College of Engineering, Westlake University, Hangzhou, Zhejiang 310024, China.

Division of Artificial Intelligence and Data Science, College of Engineering, Westlake University, Hangzhou, Zhejiang 310024, China.

Publication information

Sci Total Environ. 2023 Feb 10;859(Pt 1):160228. doi: 10.1016/j.scitotenv.2022.160228. Epub 2022 Nov 17.

Abstract

Adsorption from aqueous solution offers a simple and effective way to remove organic molecules. Several recent studies have applied machine learning (ML) to this problem. These studies employed polyparameter linear free energy relationships (pp-LFERs) and observed poor predictions outside the applicability domain of the pp-LFER models. In this study, we improved the applicability of ML methods by adopting a chemical-structure (CS) based approach, using the prediction of adsorption of organic molecules on carbon-based adsorbents as an example. Our results show that this approach can fully differentiate the structural differences between any organic molecules while providing significant information relevant to their interaction with the adsorbents. We compared two CS feature descriptors: 3D-coordination and the simplified molecular-input line-entry system (SMILES). We then built CS-ML models based on neural networks (NN) and extreme gradient boosting (XGB). All of them outperformed pp-LFER-based models and were capable of accurately predicting the adsorption isotherms of isomers with similar physicochemical properties, such as chiral molecules, even though they were trained on achiral molecules and racemates. We found that, for predicting adsorption isotherms, XGB performs better than NN, and that 3D-coordination descriptors allow effective differentiation between organic molecules.
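
To illustrate why a structure-based descriptor can separate molecules that bulk physicochemical parameters cannot, here is a minimal sketch in Python. It is not the authors' featurization, which the abstract does not detail. It assumes a simple bag-of-tokens encoding of SMILES strings, and the helper names (`tokenize`, `build_vocab`, `featurize`) and the alanine examples are hypothetical choices made for illustration. The point is only that the stereo markers `@` and `@@` give two enantiomers different feature vectors, whereas the stereo-free SMILES of the racemate maps to a third.

```python
import re
from collections import Counter

# Assumed, simplified SMILES tokenizer: bracket atoms are kept whole so that
# stereo markers such as [C@H] and [C@@H] become distinct tokens.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[A-Za-z]|\d|[=#()/\\+\-.%]")


def tokenize(smiles):
    """Split a SMILES string into atom, bond, branch, and ring tokens."""
    return SMILES_TOKEN.findall(smiles)


def build_vocab(smiles_list):
    """Map every token seen in the data set to a fixed column index."""
    tokens = sorted({tok for s in smiles_list for tok in tokenize(s)})
    return {tok: i for i, tok in enumerate(tokens)}


def featurize(smiles, vocab):
    """Return a token-count vector (order of atoms is deliberately ignored)."""
    counts = Counter(tokenize(smiles))
    return [counts.get(tok, 0) for tok in vocab]


# Hypothetical example molecules: the two enantiomers of alanine and the
# stereo-free SMILES that would describe the racemate.
enantiomer_a = "C[C@H](N)C(=O)O"
enantiomer_b = "C[C@@H](N)C(=O)O"
racemate = "CC(N)C(=O)O"

vocab = build_vocab([enantiomer_a, enantiomer_b, racemate])
features = {
    name: featurize(smi, vocab)
    for name, smi in [("a", enantiomer_a), ("b", enantiomer_b), ("rac", racemate)]
}

# The three vectors are pairwise different even though the molecules share
# the same formula and near-identical bulk properties.
print(features["a"] != features["b"], features["a"] != features["rac"])
```

Such vectors, paired with measured adsorption data, could then be passed to a regressor such as a gradient-boosted tree ensemble or a neural network. A count-based encoding discards atom order, so it is far cruder than the descriptors compared in the study and serves only to show the principle.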
