Hefei National Research Center for Physical Sciences at the Microscale, University of Science and Technology of China, Hefei, Anhui 230026, China.
J Chem Inf Model. 2023 Feb 13;63(3):806-814. doi: 10.1021/acs.jcim.2c01321. Epub 2023 Jan 22.
Ionization energy (IE) is an important property of molecules. It is highly desirable to predict IE efficiently, for example with machine learning (ML)-powered quantitative structure-property relationships (QSPR). In this study, we systematically compare the performance of different ML models in predicting the IE of molecules with distinct functional groups, using data obtained from the NIST Chemistry WebBook. Mordred and PaDEL are used to generate informative and computationally inexpensive descriptors for conventional ML models. Adding a descriptor that indicates whether the molecule is a radical significantly improves the performance of these models. Support vector regression (SVR) is the best conventional ML model for IE prediction. Among graph-based models, AttentiveFP performs even better than SVR. The difference between the two types of models comes mainly from their predictions for radical molecules, where graph-based models better describe the local environment around an unpaired electron. These results provide not only high-performance models for IE prediction but also useful guidance for choosing models to obtain reliable QSPR.
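The abstract does not say how the radical descriptor is computed, so the following is only a minimal sketch of one possible way to build such a binary flag. It assumes the molecular formula and net charge are known, and it marks a species as a radical when its total electron count is odd. The names `ATOMIC_NUMBER`, `electron_count`, and `radical_flag` are hypothetical and do not come from the paper, Mordred, or PaDEL. An odd electron count guarantees at least one unpaired electron, but even-electron open-shell species such as triplet O2 are missed by this parity rule.

```python
import re

# Atomic numbers for a few common elements (illustrative subset; extend as needed).
ATOMIC_NUMBER = {
    "H": 1, "B": 5, "C": 6, "N": 7, "O": 8, "F": 9,
    "Si": 14, "P": 15, "S": 16, "Cl": 17, "Br": 35, "I": 53,
}

_FORMULA = re.compile(r"(?:[A-Z][a-z]?\d*)+")
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def electron_count(formula: str, charge: int = 0) -> int:
    """Total number of electrons for a simple formula such as 'C2H5'."""
    if not _FORMULA.fullmatch(formula):
        raise ValueError(f"cannot parse formula: {formula!r}")
    total = 0
    for symbol, count in _TOKEN.findall(formula):
        if symbol not in ATOMIC_NUMBER:
            raise ValueError(f"unknown element: {symbol!r}")
        total += ATOMIC_NUMBER[symbol] * (int(count) if count else 1)
    # A positive charge removes electrons, a negative charge adds them.
    return total - charge


def radical_flag(formula: str, charge: int = 0) -> int:
    """Hypothetical binary descriptor: 1 if the electron count is odd, else 0."""
    return electron_count(formula, charge) % 2


# Append the flag to an existing descriptor vector (placeholder values, not real data).
descriptors = [0.0, 0.0]
features = descriptors + [radical_flag("CH3")]
```

In a workflow like the one described, a flag of this kind would be added as one extra column next to the Mordred or PaDEL descriptors before fitting a conventional model such as SVR. A structure-based toolkit could give a more reliable flag than formula parity.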