Kjeldal Frederik Ø, Eriksen Janus J
DTU Chemistry, Technical University of Denmark Kemitorvet Building 206, 2800 Kongens Lyngby, Denmark.
J Chem Theory Comput. 2023 Apr 11;19(7):2029-2038. doi: 10.1021/acs.jctc.2c01290. Epub 2023 Mar 16.
We apply a number of atomic decomposition schemes across the standard QM7 data set (a small model set of organic molecules at equilibrium geometry) to inspect the possible emergence of trends among contributions to atomization energies from distinct elements embedded within molecules. Specifically, a recent decomposition scheme of ours based on spatially localized molecular orbitals is compared to alternatives that instead partition molecular energies according to the nuclei on which individual atomic orbitals are centered. We find these partitioning schemes to expose the composition of chemical compound space in very dissimilar ways in terms of the grouping, binning, and heterogeneity of discrete atomic contributions, e.g., those associated with hydrogens bonded to different heavy atoms. Furthermore, unphysical dependencies on the one-electron basis set are found for some, but not all, of these schemes. The relevance and importance of these compositional factors for training tailored neural network models based on atomic energies are then assessed. We identify both limitations and possible advantages with respect to contemporary machine learning models and discuss the design of potential counterparts based on atoms and the intrinsic energies of these as the principal decomposition units.
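To make the idea of partitioning "according to the nuclei on which atomic orbitals are centered" concrete, the sketch below shows a generic Mulliken-style split of a one-electron energy, E = Tr(D h), into atomic terms. It is an illustrative assumption, not the localized-orbital scheme of this work nor any specific alternative evaluated here. The function name `partition_one_electron_energy`, the `ao_to_atom` mapping, and the toy matrices are all hypothetical. Each term D[mu][nu] * h[nu][mu] is credited to the atom hosting orbital mu. Because the outcome hinges on where basis functions sit, such AO-based assignments are one plausible route to the basis-set sensitivity noted above.

```python
from collections import defaultdict


def partition_one_electron_energy(density, hcore, ao_to_atom):
    """Mulliken-style split of E = Tr(D h) into per-atom contributions.

    Hypothetical helper for illustration: each term D[mu][nu] * h[nu][mu]
    is assigned to the atom on which atomic orbital mu is centered.
    """
    n = len(ao_to_atom)
    contrib = defaultdict(float)
    for mu in range(n):
        atom = ao_to_atom[mu]  # nucleus hosting basis function mu
        for nu in range(n):
            contrib[atom] += density[mu][nu] * hcore[nu][mu]
    return dict(contrib)


# Toy (made-up) symmetric matrices: three AOs, two on atom 0, one on atom 1.
D = [[2.0, 0.5, 0.1],
     [0.5, 1.0, 0.2],
     [0.1, 0.2, 1.5]]
H = [[-1.0, -0.2, -0.1],
     [-0.2, -0.5, -0.3],
     [-0.1, -0.3, -0.8]]
atoms = [0, 0, 1]

parts = partition_one_electron_energy(D, H, atoms)
total = sum(D[i][j] * H[j][i] for i in range(3) for j in range(3))  # Tr(D h)
print(parts, sum(parts.values()), total)
```

By construction, the atomic terms sum back to the full trace. Moving a basis function to a different center would change the split without changing the total energy.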