Vinod Vivin, Maity Sayan, Zaspel Peter, Kleinekathöfer Ulrich
School of Mathematics and Natural Science, University of Wuppertal, Wuppertal 42119, Germany.
School of Computer Science and Engineering, Constructor University, Campus Ring 1, Bremen 28759, Germany.
J Chem Theory Comput. 2023 Nov 14;19(21):7658-7670. doi: 10.1021/acs.jctc.3c00882. Epub 2023 Oct 20.
Calculating molecular excited states both accurately and quickly remains very challenging. For many applications, detailed knowledge of the energy funnel in larger molecular aggregates is of key importance, which requires highly accurate excitation energies. Machine learning techniques can be a very useful tool here, but the cost of generating highly accurate training data sets remains a severe challenge. To overcome this hurdle, this work proposes the use of multifidelity machine learning, in which a small amount of training data computed at a high level of accuracy is combined with cheaper, less accurate data to reach the accuracy of the costlier level. In the present study, the approach is used to predict vertical excitation energies to the first excited state for three molecules of increasing size: benzene, naphthalene, and anthracene. The models are trained and tested on energies for conformations taken from classical molecular dynamics and density-functional-based tight-binding simulations. The multifidelity machine learning model is shown to achieve the same accuracy as a machine learning model built only on high-cost training data while requiring much less computational effort to generate the data. The numerical gain observed in these benchmark calculations was more than a factor of 30, and it can certainly be much higher for high-accuracy data.
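The idea of combining a few expensive data points with many cheap ones can be illustrated with a minimal two-level sketch. This is not the scheme used in the paper: it is a simple difference-learning toy on a synthetic one-dimensional function, assuming kernel ridge regression with a Gaussian kernel. The functions `e_low` and `e_high`, the kernel width, the regularization strength, and the sample counts are all hypothetical choices made only for illustration.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=0.2):
    """Gaussian kernel matrix between two 1-D point sets (sigma is an assumed width)."""
    d = a[:, None] - b[None, :]
    return np.exp(-d**2 / (2.0 * sigma**2))

def krr_fit(x, y, lam=1e-6):
    """Kernel ridge regression coefficients for training data (x, y)."""
    K = gaussian_kernel(x, x)
    return np.linalg.solve(K + lam * np.eye(len(x)), y)

def krr_predict(x_train, alpha, x_new):
    """Evaluate a fitted kernel ridge regression model at new points."""
    return gaussian_kernel(x_new, x_train) @ alpha

# Hypothetical stand-ins for a cheap and a costly level of theory:
# the cheap level captures the rapid variation, the costly level adds
# a smooth correction on top of it.
def e_low(x):
    return np.sin(12.0 * x)

def e_high(x):
    return np.sin(12.0 * x) + 0.3 * x

x_low = np.linspace(0.0, 1.0, 50)   # many cheap samples
x_high = x_low[::12]                # 5 costly samples, nested in x_low
x_test = np.linspace(0.0, 1.0, 200)

# Reference: model trained on the few high-fidelity samples only.
alpha_hi = krr_fit(x_high, e_high(x_high))
pred_hi_only = krr_predict(x_high, alpha_hi, x_test)

# Two-level model: cheap baseline plus a learned high-minus-low correction.
alpha_low = krr_fit(x_low, e_low(x_low))
alpha_delta = krr_fit(x_high, e_high(x_high) - e_low(x_high))
pred_mf = (krr_predict(x_low, alpha_low, x_test)
           + krr_predict(x_high, alpha_delta, x_test))

mae_hi_only = np.mean(np.abs(pred_hi_only - e_high(x_test)))
mae_mf = np.mean(np.abs(pred_mf - e_high(x_test)))
print(f"MAE, high-fidelity data only: {mae_hi_only:.4f}")
print(f"MAE, two-level model:         {mae_mf:.4f}")
```

Both models use the same five costly samples. The two-level model can do better in this toy because the correction between the levels is smoother than the target itself, so it is easier to learn from few points. Whether that holds for real excitation energies depends on how well the cheap method tracks the costly one.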