Duan Chenru, Liu Fang, Nandy Aditya, Kulik Heather J
Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States.
Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States.
J Phys Chem Lett. 2021 May 20;12(19):4628-4637. doi: 10.1021/acs.jpclett.1c00631. Epub 2021 May 11.
Accelerated discovery with machine learning (ML) has begun to provide the advances in efficiency needed to overcome the combinatorial challenge of computational materials design. Nevertheless, ML-accelerated discovery both inherits the biases of training data derived from density functional theory (DFT) and leads to many attempted calculations that are doomed to fail. Many compelling functional materials and catalytic processes involve strained chemical bonds, open-shell radicals and diradicals, or metal-organic bonds to open-shell transition-metal centers. Although promising targets, these materials present unique challenges for electronic structure methods and combinatorial challenges for their discovery. In this Perspective, we describe the advances needed in accuracy, efficiency, and approach beyond what is typical in conventional DFT-based ML workflows. These challenges have begun to be addressed through ML models trained to predict the results of multiple methods or the differences between them, enabling quantitative sensitivity analysis. For DFT to be trusted for a given data point in a high-throughput screen, it must pass a series of tests. ML models that predict the likelihood of calculation success and detect the presence of strong correlation will enable rapid diagnoses and adaptation strategies. These "decision engines" represent the first steps toward autonomous workflows that avoid the need for expert determination of the robustness of DFT-based materials discoveries.