Kotobi Amir, Singh Kanishka, Höche Daniel, Bari Sadia, Meißner Robert H, Bande Annika
Helmholtz-Zentrum Hereon, Institute of Surface Science, Geesthacht, DE 21502, Germany.
Helmholtz-Zentrum Berlin für Materialien und Energie GmbH, Berlin, DE 10409, Germany.
J Am Chem Soc. 2023 Oct 18;145(41):22584-22598. doi: 10.1021/jacs.3c07513. Epub 2023 Oct 9.
The use of sophisticated machine learning (ML) models, such as graph neural networks (GNNs), to predict complex molecular properties or various types of spectra has grown rapidly. However, ensuring the interpretability of these models' predictions remains a challenge. For example, a rigorous understanding of an X-ray absorption spectrum (XAS) predicted by such an ML model requires an in-depth investigation of the underlying black-box model. Here, this is done for different GNNs based on a comprehensive, custom-generated XAS data set for small organic molecules. We show that a thorough analysis of how each model accounts for local and global molecular environments is essential for selecting a model that yields robust XAS predictions. Moreover, we employ feature attribution to determine the contributions of individual atoms in the molecules to the peaks observed in the XAS spectrum. By comparing this peak assignment to the core and virtual orbitals from the quantum chemical calculations underlying our data set, we demonstrate that it is possible to relate the atomic contributions to the XAS spectrum via these orbitals.
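The abstract describes two technical steps: predicting a discretized XAS spectrum from a molecular graph with a GNN, and using feature attribution to assign spectral peaks to individual atoms. The sketch below illustrates this workflow under stated assumptions only; the class `SimpleGNN`, the function `atom_attribution`, the toy message-passing scheme, and the gradient-times-input attribution are illustrative placeholders and are not the architectures or attribution method used in the paper.

```python
# Minimal sketch, assuming pure PyTorch, a toy message-passing GNN, and
# gradient-x-input attribution (hypothetical; not the authors' exact method).
import torch
import torch.nn as nn


class SimpleGNN(nn.Module):
    """Toy message-passing network mapping a molecular graph
    (atom features + adjacency) to a discretized XAS spectrum."""

    def __init__(self, n_features: int = 8, hidden: int = 64, n_energy: int = 100):
        super().__init__()
        self.embed = nn.Linear(n_features, hidden)
        self.message = nn.Linear(hidden, hidden)
        self.readout = nn.Linear(hidden, n_energy)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (n_atoms, n_features) atom features; adj: (n_atoms, n_atoms) adjacency
        h = torch.relu(self.embed(x))
        for _ in range(3):                       # three message-passing rounds
            h = torch.relu(h + adj @ self.message(h))
        return self.readout(h).sum(dim=0)        # sum-pool over atoms -> spectrum


def atom_attribution(model: nn.Module, x: torch.Tensor, adj: torch.Tensor,
                     peak_idx: int) -> torch.Tensor:
    """Gradient-x-input attribution: score how much each atom contributes
    to the predicted intensity at one energy grid point (a 'peak')."""
    x = x.clone().requires_grad_(True)
    spectrum = model(x, adj)
    spectrum[peak_idx].backward()                # gradient of one peak intensity
    return (x.grad * x).sum(dim=1)               # one contribution score per atom


if __name__ == "__main__":
    torch.manual_seed(0)
    n_atoms = 5
    x = torch.rand(n_atoms, 8)                   # random toy atom features
    adj = (torch.rand(n_atoms, n_atoms) > 0.5).float()
    adj = ((adj + adj.T) > 0).float()            # symmetrize the adjacency matrix
    model = SimpleGNN()
    scores = atom_attribution(model, x, adj, peak_idx=10)
    print(scores)                                # per-atom contribution to peak 10
```

In this sketch the per-atom scores play the role of the "atomic contributions" mentioned in the abstract; in the paper these contributions are further compared against core and virtual orbitals from the underlying quantum chemical calculations, a step not reproduced here.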