Janet Jon Paul, Ramesh Sahasrajit, Duan Chenru, Kulik Heather J
Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States.
Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States.
ACS Cent Sci. 2020 Apr 22;6(4):513-524. doi: 10.1021/acscentsci.0c00026. Epub 2020 Mar 11.
The accelerated discovery of materials for real-world applications requires meeting multiple design objectives simultaneously. The multidimensional nature of the search necessitates exploring multimillion-compound libraries, over which even density functional theory (DFT) screening is intractable. Machine learning models for this task (e.g., artificial neural networks, ANNs, or Gaussian processes, GPs) are limited by the availability of training data and by the difficulty of predictive uncertainty quantification (UQ). We overcome these limitations using efficient global optimization (EGO) with the multidimensional expected improvement (EI) criterion. EGO balances exploitation of a trained model with acquisition of new DFT data at the Pareto front, the region of chemical space that contains the optimal trade-offs between multiple design criteria. We demonstrate this approach by simultaneously optimizing the redox potential and solubility of candidate M(II)/M(III) redox couples for redox flow batteries (RFBs), drawn from a space of 2.8 million transition metal complexes designed for stability in practical RFB applications. We show that a multitask ANN with latent-distance-based UQ generalizes better than a GP in this space. With this approach, ANN prediction and EI scoring of the full space take only minutes. Starting from ca. 100 representative points, EGO improves both properties by more than 3 standard deviations in only five generations. Analysis of lookahead errors confirms rapid ANN model improvement during the EGO process, reaching accuracy suitable for predictive design in the space of transition metal complexes. The ANN-driven EI approach achieves at least a 500-fold acceleration over random search, identifying a Pareto-optimal design in around 5 weeks instead of 50 years.
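The abstract does not spell out the multidimensional EI formula, so the following is only a minimal sketch of how an EI-style acquisition can score unevaluated candidates against a two-objective Pareto front. It swaps in Monte Carlo expected hypervolume improvement, a related criterion that is not necessarily the one used in this work. It assumes both properties are standardized and treated as objectives to maximize, and that each candidate's prediction is an independent Gaussian per objective, with means and standard deviations that would in practice come from a surrogate such as the multitask ANN with latent-distance UQ. The helper names (`pareto_front`, `hypervolume_2d`, `expected_hv_improvement`), the reference point, and all numbers are hypothetical toy values.

```python
import numpy as np

# Illustrative sketch only: Monte Carlo expected hypervolume improvement (EHVI)
# stands in for a multidimensional EI criterion. Both objectives are assumed
# standardized and maximized; predictions are assumed to be independent
# Gaussians (mean, std) per objective. All names and data are hypothetical.


def pareto_front(Y):
    """Return the non-dominated rows of Y (every column maximized)."""
    Y = np.asarray(Y, dtype=float)
    keep = []
    for i, y in enumerate(Y):
        # y is dominated if another row is >= in all objectives and > in at least one.
        dominated = np.any(np.all(Y >= y, axis=1) & np.any(Y > y, axis=1))
        if not dominated:
            keep.append(i)
    return Y[keep]


def hypervolume_2d(points, ref):
    """Area dominated by 2-D points and bounded below by the reference point."""
    pts = points[np.all(points > ref, axis=1)]
    if len(pts) == 0:
        return 0.0
    pts = pts[np.argsort(-pts[:, 0])]  # sweep from the largest first objective
    area, prev_y = 0.0, ref[1]
    for x, y in pts:
        if y > prev_y:  # points that do not raise the staircase are dominated
            area += (x - ref[0]) * (y - prev_y)
            prev_y = y
    return area


def expected_hv_improvement(mu, sigma, front, ref, n_samples=2000, seed=0):
    """Monte Carlo estimate of E[HV(front + sample) - HV(front)]."""
    rng = np.random.default_rng(seed)
    base = hypervolume_2d(front, ref)
    samples = rng.normal(mu, sigma, size=(n_samples, 2))
    gains = [hypervolume_2d(np.vstack([front, s]), ref) - base for s in samples]
    return float(np.mean(gains))


# Toy data (made up): four evaluated designs and three unevaluated candidates.
observed = np.array([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [0.2, 0.2]])
front = pareto_front(observed)  # [0.2, 0.2] is dominated by [0.5, 0.5]
ref = np.array([-1.0, -1.0])    # assumed reference point below all designs
mu = np.array([[0.9, 0.9], [0.1, 0.1], [1.5, -0.5]])  # hypothetical surrogate means
sigma = np.full_like(mu, 0.2)                         # hypothetical surrogate std devs

scores = [expected_hv_improvement(m, s, front, ref) for m, s in zip(mu, sigma)]
best = int(np.argmax(scores))  # candidate that would be sent for DFT next
print(scores, best)
```

In an EGO loop, the top-scoring candidates would be evaluated with DFT, added to the training set, and the surrogate retrained before the next generation. Here the candidate whose mean lies deep inside the dominated region scores near zero, while the one predicted to push the front outward in both objectives scores highest.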