Marques Esteban, de Gendt Stefan, Pourtois Geoffrey, van Setten Michiel J
Department of Chemistry, KU Leuven (University of Leuven), Celestijnenlaan 200 F, Heverlee 3001, Belgium.
IMEC, Kapeldreef 75, Leuven 3001, Belgium.
J Chem Inf Model. 2023 Mar 13;63(5):1454-1461. doi: 10.1021/acs.jcim.2c01502. Epub 2023 Mar 2.
Predicting chemical activation energies is one of the longstanding and important challenges in computational chemistry. Recent advances have shown that machine learning can be used to create tools to predict them. Such tools can significantly decrease the computational cost of these predictions compared to traditional methods, which require an optimal path search along a high-dimensional potential energy surface. To enable this new route, we need both large and accurate datasets and a compact yet complete description of the reactions. Although data for chemical reactions is becoming increasingly available, the key step of encoding the reaction as an efficient descriptor remains a major challenge. In this paper, we demonstrate that including electronic energy levels in the description of the reaction significantly improves the prediction accuracy and transferability. Feature importance analysis further demonstrates that electronic energy levels have a higher importance than some structural information and typically require less space in the reaction encoding vector. In general, we observe that the results of the feature importance analysis relate well to the domain knowledge of fundamental chemical principles. This work can help to build better chemical reaction encodings for machine learning and thus improve the predictions of machine learning models for reaction activation energies. These models could ultimately be used to recognize rate-limiting reaction steps in large reaction systems, allowing bottlenecks to be accounted for at the design stage.
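The abstract does not state which feature-importance method was used. As a hedged illustration only, the sketch below applies permutation importance, one common model-agnostic technique, to a synthetic toy dataset. Each row is a hypothetical reaction encoding with two structural slots (`bond_change_count`, `ring_size_change`) and a single electronic-energy-level slot (`homo_lumo_gap`). The feature names, the fixed linear `toy_model`, its weights, and the data are all invented for illustration and are not taken from the paper. The example also shows why an electronic descriptor can be compact: it occupies one entry of the vector.

```python
import random
from statistics import mean

# Hypothetical reaction-encoding slots (illustrative only, not from the paper):
# two structural descriptors plus one electronic-energy-level descriptor.
FEATURES = ["bond_change_count", "ring_size_change", "homo_lumo_gap"]


def toy_model(x):
    """Hypothetical 'fitted' model: a fixed linear predictor of the barrier.
    The weights are invented for this sketch."""
    return 0.2 * x[0] + 0.1 * x[1] + 1.5 * x[2]


def mse(model, X, y):
    """Mean squared error of the model's predictions."""
    return mean((model(row) - target) ** 2 for row, target in zip(X, y))


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Average increase in MSE when one feature column is randomly shuffled.

    A larger increase means the model relies more on that feature."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(model, X_perm, y) - baseline)
        importances.append(mean(increases))
    return importances


# Synthetic toy data: targets follow the same linear rule plus small noise,
# so by construction homo_lumo_gap should come out as the most important slot.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in FEATURES] for _ in range(200)]
y = [toy_model(row) + data_rng.gauss(0, 0.05) for row in X]

scores = dict(zip(FEATURES, permutation_importance(toy_model, X, y)))
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:>18s}: {score:.3f}")
```

Permutation importance is used here only because it works with any predictor. With a real trained model and real reaction data, other approaches such as impurity-based importance or SHAP values may be preferable.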