J Phys Chem A. 2019 Sep 26;123(38):8305-8313. doi: 10.1021/acs.jpca.9b04771. Epub 2019 Sep 16.
Thermodynamic properties of molecules are widely used in the study of reactive processes. Such properties are typically measured experimentally or calculated by a variety of computational chemistry methods. In this work, machine learning (ML) models for estimating the standard enthalpy of formation at 298.15 K are developed for three classes of acyclic, closed-shell hydrocarbons, viz. alkanes, alkenes, and alkynes. First, an extensive literature survey is performed to collect standard enthalpy data for training the ML models. The commercial software Dragon is used to compute a wide set of molecular descriptors from SMILES strings, and these descriptors serve as input features for the ML models. Support vector regression (SVR) and artificial neural networks are used within a two-level K-fold cross-validation (K-fold CV) workflow. The first level estimates the accuracy of both ML models, and the second level generates the final models. The SVR model is selected as the best model based on error estimates over 10-fold CV. The final SVR model is compared against conventional Benson's group additivity for a set of octene isomers from the database, illustrating the advantages of the proposed ML modeling approach.
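The two-level K-fold CV workflow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not spell out the details of either level, so the sketch assumes that level one holds out folds to estimate each model's error (tuning hyperparameters only on the remaining data) and that level two tunes and refits the selected model on the full dataset. To stay dependency-free, a one-descriptor ridge regression and a k-nearest-neighbour regressor stand in for SVR and the neural network. The data are synthetic, and all function names (`k_fold_indices`, `cv_mae`, `tune`, `level_one_error`, `level_two_final`) are hypothetical.

```python
import random
from statistics import mean

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_ridge(xs, ys, lam):
    """Stand-in for SVR: one-descriptor ridge regression with penalty lam."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / (sxx + lam)
    return lambda x: my + slope * (x - mx)

def fit_knn(xs, ys, k):
    """Stand-in for the neural network: k-nearest-neighbour regression."""
    pairs = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)
    return predict

def cv_mae(fit, param, xs, ys, k, seed=0):
    """Mean absolute error of one hyperparameter value over k held-out folds."""
    errors = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train], param)
        errors.extend(abs(model(xs[i]) - ys[i]) for i in held_out)
    return mean(errors)

def tune(fit, grid, xs, ys, k):
    """Pick the hyperparameter value with the lowest k-fold CV error."""
    return min(grid, key=lambda p: cv_mae(fit, p, xs, ys, k))

def level_one_error(fit, grid, xs, ys, k_outer, k_inner):
    """Level 1 (assumed): score on folds that were never used for tuning."""
    errors = []
    for held_out in k_fold_indices(len(xs), k_outer, seed=1):
        held = set(held_out)
        tx = [xs[i] for i in range(len(xs)) if i not in held]
        ty = [ys[i] for i in range(len(xs)) if i not in held]
        model = fit(tx, ty, tune(fit, grid, tx, ty, k_inner))
        errors.extend(abs(model(xs[i]) - ys[i]) for i in held_out)
    return mean(errors)

def level_two_final(fit, grid, xs, ys, k):
    """Level 2 (assumed): tune on all data, then refit on all data."""
    best = tune(fit, grid, xs, ys, k)
    return best, fit(xs, ys, best)

# Synthetic one-descriptor dataset (illustrative only, not enthalpy data).
rng = random.Random(42)
xs = [rng.uniform(1.0, 20.0) for _ in range(60)]
ys = [-20.0 * x - 40.0 + rng.gauss(0.0, 1.0) for x in xs]

candidates = {
    "ridge": (fit_ridge, [0.0, 1.0, 10.0]),
    "knn": (fit_knn, [1, 3, 5]),
}
scores = {name: level_one_error(fit, grid, xs, ys, 10, 5)
          for name, (fit, grid) in candidates.items()}
best_name = min(scores, key=scores.get)
best_param, final_model = level_two_final(*candidates[best_name], xs, ys, 10)
print(scores, best_name, best_param)
```

Keeping the level-one held-out folds out of hyperparameter tuning is what makes the reported error an estimate for unseen molecules. The final model can then use every available data point, because its accuracy has already been estimated separately.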