Duan Chenru, Chu Daniel B K, Nandy Aditya, Kulik Heather J
Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
Chem Sci. 2022 Apr 5;13(17):4962-4971. doi: 10.1039/d2sc00393g. eCollection 2022 May 4.
Appropriately identifying and treating molecules and materials with significant multi-reference (MR) character is crucial for achieving high data fidelity in virtual high-throughput screening (VHTS). Despite the development of numerous MR diagnostics, the extent to which a single value of such a diagnostic indicates the MR effect on a chemical property prediction is not well established. We evaluate MR diagnostics for over 10 000 transition-metal complexes (TMCs) and compare them to those for organic molecules. We observe that only some MR diagnostics are transferable from one chemical space to another. By studying the influence of MR character on chemical properties (i.e., the MR effect) that involve multiple potential energy surfaces (e.g., adiabatic spin splitting, ΔE_H-L, and ionization potential, IP), we show that differences in MR character are more important than the cumulative degree of MR character in predicting the magnitude of an MR effect. Motivated by this observation, we build transfer learning models to predict CCSD(T)-level adiabatic ΔE_H-L and IP from lower levels of theory. By combining these models with uncertainty quantification and multi-level modeling, we introduce a multi-pronged strategy that accelerates data acquisition by at least a factor of three while achieving coupled cluster accuracy (i.e., to within 1 kcal mol⁻¹ MAE) for robust VHTS.
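The idea of combining a fast predictive model with uncertainty quantification and a fallback to a higher level of theory can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Prediction`, `screen`, `predict`, `run_reference`, and `max_uncertainty` are hypothetical, and the uncertainty cutoff is an arbitrary illustrative value rather than one taken from the paper. The sketch only shows the general pattern of accepting a cheap prediction when its estimated uncertainty is low and escalating to an expensive reference calculation otherwise, together with the mean absolute error (MAE) metric named in the abstract.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class Prediction:
    """Hypothetical container for a model output (units: kcal/mol)."""
    value: float        # predicted property, e.g. a spin splitting or IP
    uncertainty: float  # model's own uncertainty estimate for that value


def screen(
    candidates: Sequence[str],
    predict: Callable[[str], Prediction],
    run_reference: Callable[[str], float],
    max_uncertainty: float = 1.0,  # illustrative cutoff, an assumption
) -> Tuple[Dict[str, float], List[str]]:
    """Accept cheap predictions when confident, otherwise escalate.

    Returns the property value chosen for each candidate and the list of
    candidates that were sent to the expensive reference method.
    """
    results: Dict[str, float] = {}
    escalated: List[str] = []
    for name in candidates:
        pred = predict(name)
        if pred.uncertainty <= max_uncertainty:
            # Low uncertainty: keep the inexpensive model prediction.
            results[name] = pred.value
        else:
            # High uncertainty: fall back to the higher level of theory.
            results[name] = run_reference(name)
            escalated.append(name)
    return results, escalated


def mean_absolute_error(
    predicted: Sequence[float], reference: Sequence[float]
) -> float:
    """Mean absolute error between two equal-length, non-empty sequences."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - r) for p, r in zip(predicted, reference))
    return total / len(predicted)
```

In such a scheme the fraction of candidates that end up in `escalated` determines how much expensive computation is avoided, while the MAE of the accepted predictions against reference values indicates whether the accuracy target is met.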