Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Daejeon 34141, Republic of Korea.
Saudi Aramco-KAIST CO2 Management Center, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Daejeon 34141, Republic of Korea.
J Chem Inf Model. 2020 Apr 27;60(4):1996-2003. doi: 10.1021/acs.jcim.0c00003. Epub 2020 Apr 6.
In recent years, computational high-throughput screening (HTS) has emerged as a significant tool in materials science for accelerating the discovery of new materials with target properties. Despite many cases in which HTS has led to novel discoveries, its major bottleneck remains the large computational cost of density functional theory (DFT) calculations, which scales cubically with system size and limits the chemical space that can be explored. The present work addresses this computational burden by presenting a machine learning (ML) framework that can explore the chemical space efficiently. Our model is built upon an existing crystal graph convolutional neural network (CGCNN) that predicts the formation energy of a crystal structure, but it is modified to allow uncertainty quantification for each prediction using the hyperbolic tangent activation function and the dropout algorithm (CGCNN-HD). Uncertainty quantification is particularly important because typical usage of CGCNN (owing to the lack of a gradient implementation) does not involve structural relaxation, which can cause substantial prediction errors. The proposed method is benchmarked against an existing application that identified a promising photoanode material among >7,000 hypothetical Mg-Mn-O ternary compounds using an all-DFT HTS. In our approach, we perform an approximate HTS using CGCNN-HD and refine the results for the selected candidates using full DFT (denoted ML/DFT-HTS). The proposed hybrid model reduces the number of required DFT calculations by a factor of >50 compared to the previous DFT-HTS while making the same discovery of MgMnO, an experimentally validated new photoanode material. Further analysis demonstrates that adding the HD components with uncertainty measures in the CGCNN-HD model increased the proportion of promising materials identified, relative to the all-DFT HTS, from 30% (CGCNN) to 68% (CGCNN-HD). The present ML/DFT-HTS with uncertainty quantification can thus be a fast alternative to DFT-HTS for efficient exploration of the vast chemical space.
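The two ideas in the abstract, dropout-based uncertainty on each prediction and an ML prefilter ahead of full DFT, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the tiny tanh network with random weights stands in for a trained CGCNN-HD, the names `forward`, `mc_dropout_predict`, and `select_for_dft` are hypothetical, and the selection rule (keep a candidate when its mean minus `k` standard deviations falls below a threshold) is one plausible way to use the uncertainty, not necessarily the rule used in the paper.

```python
import math
import random
import statistics


def forward(x, w1, w2, p_drop, rng):
    """One stochastic pass of a toy tanh network with dropout left ON."""
    hidden = []
    for row in w1:
        h = math.tanh(sum(wi * xi for wi, xi in zip(row, x)))
        # Inverted dropout: zero the unit with probability p_drop,
        # otherwise rescale it so the expected activation is unchanged.
        keep = rng.random() >= p_drop
        hidden.append(h / (1.0 - p_drop) if keep else 0.0)
    return sum(wi * hi for wi, hi in zip(w2, hidden))


def mc_dropout_predict(x, w1, w2, p_drop=0.2, n_passes=200, seed=0):
    """Return (mean, standard deviation) over repeated dropout passes."""
    rng = random.Random(seed)
    samples = [forward(x, w1, w2, p_drop, rng) for _ in range(n_passes)]
    return statistics.fmean(samples), statistics.stdev(samples)


def select_for_dft(predictions, threshold, k=1.0):
    """Assumed rule: keep candidates whose optimistic estimate
    (mean - k * std) does not exceed the energy threshold."""
    return [name for name, (mean, std) in predictions.items()
            if mean - k * std <= threshold]


# Toy weights and hypothetical descriptors; NOT a trained CGCNN.
init = random.Random(42)
w1 = [[init.uniform(-1, 1) for _ in range(3)] for _ in range(16)]
w2 = [init.uniform(-1, 1) for _ in range(16)]
candidates = {"cand_a": [0.1, 0.5, -0.3], "cand_b": [1.2, -0.7, 0.4]}

predictions = {name: mc_dropout_predict(x, w1, w2)
               for name, x in candidates.items()}
shortlist = select_for_dft(predictions, threshold=0.0, k=1.0)
```

Only the structures in `shortlist` would then be passed to full DFT. Under this assumed rule, a larger `k` keeps more uncertain candidates in play at the cost of more DFT calculations, while `k = 0` reduces to screening on the mean prediction alone.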