Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.
J Chem Phys. 2022 Feb 21;156(7):074101. doi: 10.1063/5.0082964.
Strategies for machine-learning (ML)-accelerated discovery that are general across material composition spaces are essential, but demonstrations of ML have been primarily limited to narrow composition variations. By addressing the scarcity of data in promising regions of chemical space for challenging targets such as open-shell transition-metal complexes, general representations and transferable ML models that leverage known relationships in existing data will accelerate discovery. Over a large set (∼1000) of isovalent transition-metal complexes, we quantify evident relationships for different properties (i.e., spin-splitting and ligand dissociation) between rows of the Periodic Table (i.e., 3d/4d metals and 2p/3p ligands). We demonstrate an extension to the graph-based revised autocorrelation (RAC) representation (i.e., eRAC) that incorporates the group number alongside the nuclear charge heuristic that otherwise overestimates dissimilarity of isovalent complexes. To address the common challenge of discovery in a new space where data are limited, we introduce a transfer learning approach in which we seed models trained on a large amount of data from one row of the Periodic Table with a small number of data points from the additional row. We demonstrate the synergistic value of the eRACs alongside this transfer learning strategy to consistently improve model performance. Analysis of these models highlights how the approach succeeds by reordering the distances between complexes to be more consistent with the Periodic Table, a property we expect to be broadly useful for other material domains.
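The abstract says a nuclear-charge-only autocorrelation overestimates how different isovalent complexes are. The sketch below makes that concrete on a toy metal-aqua fragment. It computes a metal-centered product autocorrelation, P_d = Σ_j P_M P_j δ(d_Mj, d), where d_Mj is the bond-count distance from the metal M to atom j. It does this once with nuclear charge Z and once with periodic group number. The `ATOM_PROPS` table, the function names, the depth cutoff of 3, and the fragment are hypothetical choices made for illustration. This is not the paper's eRAC featurization. Full RAC-style featurizations also use other atomic properties (e.g., electronegativity, covalent radius) and additional scopes.

```python
import math
from collections import deque

# Hypothetical toy property table: nuclear charge Z and periodic group number.
ATOM_PROPS = {
    "Fe": {"Z": 26, "group": 8},
    "Ru": {"Z": 44, "group": 8},
    "O": {"Z": 8, "group": 16},
    "S": {"Z": 16, "group": 16},
    "H": {"Z": 1, "group": 1},
}


def bond_distances(adjacency, start):
    """Breadth-first search: bond-count distance from `start` to every reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in adjacency[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def metal_centered_ac(symbols, adjacency, metal_index, prop, max_depth=3):
    """Product autocorrelation P_d = sum_j P_M * P_j * delta(d(M, j), d) for d = 0..max_depth."""
    dist = bond_distances(adjacency, metal_index)
    p_metal = ATOM_PROPS[symbols[metal_index]][prop]
    return [
        sum(p_metal * ATOM_PROPS[symbols[j]][prop] for j, dj in dist.items() if dj == d)
        for d in range(max_depth + 1)
    ]


# Toy M-XH2 fragment: metal (index 0) bonded to X (1), which carries two H atoms (2, 3).
ADJ = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}

fe_z = metal_centered_ac(["Fe", "O", "H", "H"], ADJ, 0, "Z")
ru_z = metal_centered_ac(["Ru", "O", "H", "H"], ADJ, 0, "Z")
fe_g = metal_centered_ac(["Fe", "O", "H", "H"], ADJ, 0, "group")
ru_g = metal_centered_ac(["Ru", "O", "H", "H"], ADJ, 0, "group")
fe_s_g = metal_centered_ac(["Fe", "S", "H", "H"], ADJ, 0, "group")

print(fe_z, ru_z)  # [676, 208, 52, 0] [1936, 352, 88, 0]
print(fe_g, ru_g)  # [64, 128, 16, 0] [64, 128, 16, 0]
print(math.dist(fe_z, ru_z) > 0, math.dist(fe_g, ru_g))  # True 0.0
```

Fe and Ru (both group 8) give very different Z-based vectors but identical group-number vectors. Swapping the 2p donor O for the 3p donor S (both group 16) behaves the same way. Concatenating the two blocks, as the abstract describes for eRACs, keeps the row-specific Z information while adding features on which isovalent complexes coincide.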