Predicting Critical Properties and Acentric Factors of Fluids Using Multitask Machine Learning.

Author information

Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States.

Publication information

J Chem Inf Model. 2023 Aug 14;63(15):4574-4588. doi: 10.1021/acs.jcim.3c00546. Epub 2023 Jul 24.

Abstract

Knowledge of critical properties, such as critical temperature, pressure, and density, as well as the acentric factor, is essential for calculating the thermophysical properties of chemical compounds. Experiments to determine critical properties and acentric factors are expensive and time-intensive; therefore, we developed a machine learning (ML) model that predicts these molecular properties from the SMILES representation of a chemical species. We explored the directed message passing neural network (D-MPNN) and the graph attention network as ML architecture choices. Additionally, we investigated featurization with additional atomic and molecular features, multitask training, and pretraining on estimated data to optimize model performance. Our final model uses a D-MPNN layer to learn the molecular representation and is supplemented by Abraham parameters. A multitask training scheme was used to train a single model to predict all the critical properties and acentric factors along with boiling point, melting point, enthalpy of vaporization, and enthalpy of fusion. The model was evaluated on both random and scaffold splits, where it shows state-of-the-art accuracy. The extensive data set of critical properties and acentric factors contains 1144 chemical compounds and is made available in the public domain together with the source code that can be used for further exploration.
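
One practical point in multitask training of this kind is that not every compound has a measured value for every property. The abstract does not say how missing labels were handled, so the following is only a minimal sketch of one common approach, a masked mean squared error that skips unlabeled (molecule, task) pairs. The function name `masked_multitask_mse`, the use of `None` for a missing label, and the numbers in the usage example are all illustrative assumptions, not taken from the paper or its source code.

```python
from typing import Optional, Sequence


def masked_multitask_mse(
    predictions: Sequence[Sequence[float]],
    targets: Sequence[Sequence[Optional[float]]],
) -> float:
    """Mean squared error over every (molecule, task) pair that has a label.

    Hypothetical helper: `None` in `targets` marks a property that was not
    measured for that molecule, so it contributes nothing to the loss.
    """
    total = 0.0
    count = 0
    for pred_row, target_row in zip(predictions, targets):
        for pred, target in zip(pred_row, target_row):
            if target is None:
                continue  # unlabeled task for this molecule: skip it
            total += (pred - target) ** 2
            count += 1
    if count == 0:
        raise ValueError("batch contains no labeled targets")
    return total / count


if __name__ == "__main__":
    # Two molecules, two tasks; the values are made-up placeholders,
    # assumed to be already scaled per task.
    preds = [[0.10, -0.40], [0.75, 0.20]]
    labels = [[0.00, None], [0.50, 0.20]]
    print(masked_multitask_mse(preds, labels))
```

In a real training loop the same masking would typically be applied to tensors inside the deep learning framework, with each property standardized first so that tasks with large numeric ranges do not dominate the loss.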
