College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, China.
Key Laboratory of Biomass Chemical Engineering of Ministry of Education, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, China.
J Chem Inf Model. 2024 Nov 11;64(21):8131-8141. doi: 10.1021/acs.jcim.4c01358. Epub 2024 Oct 23.
The prediction of thermodynamic and kinetic properties of elementary reactions has improved rapidly with the adoption of deep learning (DL) methods. While various studies have reported success in predicting reaction properties, the quantification of prediction uncertainty has seldom been investigated, which undermines confidence in using these predicted properties in practical applications. Here, we integrated graph convolutional neural networks (GCNNs) with three uncertainty prediction techniques, namely deep ensemble, Monte Carlo (MC)-dropout, and evidential learning, to provide insights into uncertainty quantification and its utility. The deep ensemble model outperforms the others in accuracy and shows the highest reliability in estimating prediction uncertainty across all elementary reaction property data sets. We also verified that the deep ensemble model shows a satisfactory capability in recognizing epistemic and aleatoric uncertainties. Additionally, we adopted a Monte Carlo Tree Search method to extract explainable reaction substructures, providing a chemical explanation for the DL-predicted properties and their corresponding uncertainties. Finally, to demonstrate the utility of uncertainty quantification in practical applications, we performed an uncertainty-guided calibration of the DL-constructed kinetic model, which achieved a 25% higher hit ratio in identifying dominant reaction pathways than calibration without uncertainty guidance.
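As an illustration of how a deep ensemble can separate the two kinds of uncertainty mentioned above, the following is a minimal sketch, not the authors' implementation. It assumes each ensemble member outputs a predicted mean and a predicted variance for a reaction property, and applies the commonly used decomposition in which the average of the member variances is read as aleatoric uncertainty and the spread of the member means as epistemic uncertainty. The function name `ensemble_uncertainty` and the toy numbers are hypothetical.

```python
from statistics import fmean, pvariance


def ensemble_uncertainty(member_means, member_variances):
    """Combine per-member predictions for ONE reaction (illustrative only).

    member_means:     predicted property value from each ensemble member
    member_variances: predicted variance from each ensemble member
    Returns (mean, aleatoric, epistemic, total).
    """
    # Ensemble prediction: average of the member means.
    mean = fmean(member_means)
    # Aleatoric part: average of the variances the members predict (data noise).
    aleatoric = fmean(member_variances)
    # Epistemic part: disagreement between members (population variance of means).
    epistemic = pvariance(member_means)
    return mean, aleatoric, epistemic, aleatoric + epistemic


# Toy values (made up): three members, two hypothetical reactions.
# Reaction A: members agree closely, so epistemic uncertainty is small.
mean_a, alea_a, epi_a, total_a = ensemble_uncertainty([10.0, 12.0, 11.0], [1.0, 1.0, 1.0])
# Reaction B: members disagree strongly, so epistemic uncertainty is large.
mean_b, alea_b, epi_b, total_b = ensemble_uncertainty([52.0, 50.0, 40.0], [4.0, 4.0, 4.0])

print(f"A: mean={mean_a:.2f} aleatoric={alea_a:.2f} epistemic={epi_a:.2f}")
print(f"B: mean={mean_b:.2f} aleatoric={alea_b:.2f} epistemic={epi_b:.2f}")
```

Under these assumptions, a reaction with large epistemic uncertainty (such as reaction B in the toy data) would be a natural candidate for the kind of uncertainty-guided calibration the abstract describes, although the abstract does not specify the selection criterion actually used.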