Nano-Bio Spectroscopy Group, Departamento de Polímeros y Materiales Avanzados: Física, Química y Tecnología, Universidad del País Vasco UPV/EHU, 20018 San Sebastián, Spain.
Laboratorio de Química Computacional y Teórica, Facultad de Química, Universidad de La Habana, 10400 La Habana, Cuba.
J Chem Theory Comput. 2023 Mar 28;19(6):1818-1826. doi: 10.1021/acs.jctc.2c01039. Epub 2023 Mar 6.
Spectroscopic properties of molecules are central to describing the molecular response to UV/vis electromagnetic radiation. The quantum chemistry community commonly computes these properties with computationally expensive methods (e.g., multiconfigurational SCF, coupled cluster) or with TDDFT. In this work, we propose a supervised machine learning (ML) approach to model the absorption spectra of organic molecules. Several supervised ML methods have been tested, such as kernel ridge regression (KRR), multilayer perceptron neural networks (MLP), and convolutional neural networks. [Ramakrishnan et al. 2015, 143, 084111. Ghosh et al. 2019, 6, 1801367.] Descriptors based only on geometry and atomic numbers (e.g., the Coulomb matrix) proved insufficient for accurate training. [Ramakrishnan et al. 2015, 143, 084111.] Inspired by TDDFT, we propose a set of electronic descriptors obtained from low-cost DFT calculations: orbital energy differences ($\Delta\epsilon_{ia} = \epsilon_a - \epsilon_i$), transition dipole moments between occupied and unoccupied Kohn-Sham orbitals ($\langle\phi_i|\hat{\mathbf{r}}|\phi_a\rangle$), and, when relevant, the charge-transfer character of the monoexcitations. We demonstrate that, with these electronic descriptors and neural networks, we can predict not only the density of excited states but also obtain a very good estimate of the absorption spectrum and of the charge-transfer character of the electronic excited states, with results close to chemical accuracy (∼2 kcal/mol or ∼0.1 eV).
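The first two electronic descriptors can be made concrete with a minimal sketch. It assumes the Kohn-Sham orbital energies (Hartree) and the orbital dipole matrix ⟨ϕ_p|r|ϕ_q⟩ (bohr) have already been computed by some DFT code; the function names, the toy numbers and the Gaussian width are hypothetical, and the charge-transfer descriptor is left out because it is not defined here. The sketch enumerates every occupied→unoccupied pair (i, a), stores Δϵ_ia and ⟨ϕ_i|r|ϕ_a⟩, and, purely as a zeroth-order (uncoupled Kohn-Sham) reference, turns them into oscillator strengths and a Gaussian-broadened spectrum. It is not the authors' trained model, which maps such descriptors to the spectrum with a neural network.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

HARTREE_TO_EV = 27.211386245988   # CODATA 2018 Hartree energy in eV
EV_TO_KCAL_MOL = 23.060548        # 1 eV expressed in kcal/mol


@dataclass
class Monoexcitation:
    i: int            # occupied KS orbital index
    a: int            # unoccupied (virtual) KS orbital index
    delta_eps: float  # Δϵ_ia = ϵ_a − ϵ_i, in Hartree
    dipole: Vec3      # ⟨ϕ_i|r|ϕ_a⟩, in bohr
    f: float          # oscillator strength of the bare (uncoupled) KS transition


def build_descriptors(eps: Sequence[float], n_occ: int,
                      dipole: Sequence[Sequence[Vec3]],
                      occupancy: float = 2.0) -> List[Monoexcitation]:
    """Collect Δϵ_ia and ⟨ϕ_i|r|ϕ_a⟩ for every occupied→virtual pair.

    Assumption: orbitals are sorted by energy and the first n_occ are occupied;
    occupancy=2.0 corresponds to a spin-restricted closed shell (1.0 per spin orbital).
    """
    out = []
    for i in range(n_occ):
        for a in range(n_occ, len(eps)):
            de = eps[a] - eps[i]
            mu = dipole[i][a]
            mu2 = sum(c * c for c in mu)
            # Uncoupled KS oscillator strength in atomic units:
            # f_ia = (2/3) * occupancy * Δϵ_ia * |⟨ϕ_i|r|ϕ_a⟩|²
            f = 2.0 / 3.0 * occupancy * de * mu2
            out.append(Monoexcitation(i, a, de, mu, f))
    return out


def broadened_spectrum(excitations: Sequence[Monoexcitation],
                       grid_ev: Sequence[float],
                       sigma_ev: float = 0.2) -> List[float]:
    """Sum of unit-area Gaussians centred at each Δϵ_ia (eV), weighted by f."""
    norm = 1.0 / (sigma_ev * math.sqrt(2.0 * math.pi))
    spectrum = []
    for e in grid_ev:
        total = 0.0
        for x in excitations:
            e0 = x.delta_eps * HARTREE_TO_EV
            total += x.f * norm * math.exp(-0.5 * ((e - e0) / sigma_ev) ** 2)
        spectrum.append(total)
    return spectrum


if __name__ == "__main__":
    # Hypothetical toy input: 2 occupied + 2 virtual orbitals (Hartree).
    eps = [-0.40, -0.30, 0.05, 0.10]
    zero: Vec3 = (0.0, 0.0, 0.0)
    dip = [[zero] * 4 for _ in range(4)]
    dip[1][2] = (1.0, 0.0, 0.0)   # bright HOMO→LUMO transition along x
    dip[0][3] = (0.0, 0.5, 0.0)   # weaker transition along y

    exc = build_descriptors(eps, n_occ=2, dipole=dip)
    for x in exc:
        print(f"{x.i}->{x.a}: dE = {x.delta_eps * HARTREE_TO_EV:6.3f} eV, f = {x.f:.3f}")

    grid = [0.01 * k for k in range(3001)]   # 0 to 30 eV in 0.01 eV steps
    spec = broadened_spectrum(exc, grid)
    peak = max(range(len(grid)), key=spec.__getitem__)
    print(f"strongest band near {grid[peak]:.2f} eV")
    print(f"2 kcal/mol = {2 / EV_TO_KCAL_MOL:.3f} eV")
```

Lists of (Δϵ_ia, ⟨ϕ_i|r|ϕ_a⟩) pairs built this way are the kind of per-molecule input a supervised model could be trained on. The broadened uncoupled spectrum is only a physically motivated baseline, since it ignores the coupling terms that full TDDFT adds. The last line checks the unit conversion quoted above: 2 kcal/mol is about 0.087 eV, i.e. roughly 0.1 eV.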