Dhakal Pratik, Gassaway Wyatt, Shah Jindal K
School of Chemical Engineering, Oklahoma State University, Stillwater, Oklahoma 74078, USA.
J Chem Phys. 2023 Aug 14;159(6). doi: 10.1063/5.0155775.
Knowledge of the frontier orbital energies, i.e., those of the highest occupied molecular orbital (HOMO) and the lowest unoccupied molecular orbital (LUMO), is vital for studying the chemical and electrochemical stability of compounds, their corrosion inhibition potential, reactivity, etc. Density functional theory (DFT) calculations provide a direct route to estimating these energies in either the gas phase or the condensed phase. However, DFT methods become computationally intensive when hundreds of thousands of compounds are to be screened. Such is the case when all the isomers of the 1-alkyl-3-alkylimidazolium cation [CnCmim]+ (n = 1-10, m = 1-10) are considered: enumerating the isomer space of [CnCmim]+ yields close to 386 000 cation structures, and calculating frontier orbital energies for each of them with DFT would be computationally very expensive and time-consuming. In this article, we develop a machine learning model, based on the extreme gradient boosting method and built from a small subset of the isomer space, to predict the HOMO and LUMO energies. With this model, the HOMO energies are predicted with a mean absolute error (MAE) of 0.4 eV and the LUMO energies with an MAE of 0.2 eV. Inferences are also drawn about the types of descriptors deemed important for the HOMO and LUMO energy estimates. Applying the machine learning model drastically reduces the computational time required for such calculations.
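The workflow described above (fit a boosted-tree regressor on a small labeled subset, then score its predictions by MAE) can be illustrated with a minimal sketch. This is not the authors' model or the XGBoost library: it is plain gradient boosting with one-split regression trees (stumps) in pure Python, without the regularization and second-order terms that extreme gradient boosting adds. The three descriptors (the two alkyl chain lengths and a branch count) and the target values are hypothetical, synthetic stand-ins chosen only so the example runs; they are not data from the article.

```python
import random

def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted| over all samples."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def fit_stump(X, residuals):
    """Find the one-feature threshold split that minimises squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, left_mean, right_mean)
    if best is None:  # every feature is constant: fall back to the mean
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]

def predict(model, row):
    """Base value plus the shrunken contribution of every stump."""
    base, learning_rate, stumps = model
    total = base
    for j, threshold, left_mean, right_mean in stumps:
        total += learning_rate * (left_mean if row[j] <= threshold else right_mean)
    return total

def fit_boosted(X, y, n_rounds=100, learning_rate=0.3):
    """Gradient boosting for squared error: each stump fits the residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, preds)]
        j, threshold, left_mean, right_mean = fit_stump(X, residuals)
        stumps.append((j, threshold, left_mean, right_mean))
        preds = [p + learning_rate * (left_mean if row[j] <= threshold else right_mean)
                 for row, p in zip(X, preds)]
    return (base, learning_rate, stumps)

# Synthetic toy data (assumption): descriptors are (n, m, branch_count) and the
# target is an arbitrary function of them plus noise, NOT a real orbital energy.
random.seed(0)
X, y = [], []
for _ in range(300):
    n, m, branches = random.randint(1, 10), random.randint(1, 10), random.randint(0, 3)
    X.append((n, m, branches))
    y.append(-0.05 * (n + m) + 0.1 * branches + random.gauss(0, 0.02))

# Train on a subset, evaluate on the held-out remainder.
X_train, y_train, X_test, y_test = X[:200], y[:200], X[200:], y[200:]
model = fit_boosted(X_train, y_train)

train_mean = sum(y_train) / len(y_train)
mae_baseline = mean_absolute_error(y_test, [train_mean] * len(y_test))
mae_model = mean_absolute_error(y_test, [predict(model, row) for row in X_test])
print(f"baseline MAE: {mae_baseline:.3f}, boosted MAE: {mae_model:.3f}")
```

Comparing the model's MAE against a constant-mean baseline on held-out samples gives a quick check that the descriptors carry usable signal before the model is applied to the rest of the structure space.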