Yang Chu-I, Li Yi-Pei
Department of Chemical Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Taipei, 10617, Taiwan.
Taiwan International Graduate Program (TIGP), Academia Sinica, No. 128, Sec. 2, Academia Road, Taipei, 11529, Taiwan.
J Cheminform. 2023 Feb 3;15(1):13. doi: 10.1186/s13321-023-00682-3.
Quantifying uncertainty in machine learning is important in new research areas with scarce high-quality data. In this work, we develop an explainable uncertainty quantification method for deep learning-based molecular property prediction. This method can capture aleatoric and epistemic uncertainties separately and attribute the uncertainties to atoms present in the molecule. The atom-based uncertainty method provides an extra layer of chemical insight to the estimated uncertainties, i.e., one can analyze individual atomic uncertainty values to diagnose the chemical component that introduces uncertainty to the prediction. Our experiments suggest that atomic uncertainty can detect unseen chemical structures and identify chemical species whose data are potentially associated with significant noise. Furthermore, we propose a post-hoc calibration method to refine the uncertainty quantified by ensemble models for better confidence interval estimates. This work improves uncertainty calibration and provides a framework for assessing whether and why a prediction should be considered unreliable.
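The abstract does not spell out how the two uncertainty types are separated or how the calibration is done. As a rough illustration only, the sketch below uses the decomposition commonly applied to ensembles of mean-variance models: aleatoric uncertainty is the average of the members' predicted variances, and epistemic uncertainty is the variance of their predicted means. It then fits a single scalar that rescales the total variance on a validation set. The function names `decompose_uncertainty` and `fit_variance_scale` are hypothetical, and the scalar rescaling is a generic stand-in, not necessarily the post-hoc calibration proposed by the authors. The atom-level attribution is not shown.

```python
from statistics import fmean, pvariance


def decompose_uncertainty(means, variances):
    """Split ensemble uncertainty for a single molecule (assumed setup).

    means:     predicted property value from each ensemble member
    variances: predicted noise variance from each ensemble member
    """
    prediction = fmean(means)
    aleatoric = fmean(variances)   # average of the members' noise estimates
    epistemic = pvariance(means)   # disagreement between the members
    return prediction, aleatoric, epistemic, aleatoric + epistemic


def fit_variance_scale(y_true, y_pred, total_var):
    """Scalar s such that the mean squared z-score is 1 after scaling.

    Generic recalibration on held-out data: multiply every predicted
    variance by s before building confidence intervals.
    """
    ratios = [(y - p) ** 2 / v for y, p, v in zip(y_true, y_pred, total_var)]
    return fmean(ratios)


if __name__ == "__main__":
    pred, alea, epis, total = decompose_uncertainty(
        means=[1.0, 2.0, 3.0], variances=[0.5, 0.5, 0.5]
    )
    print(pred, alea, epis, total)

    s = fit_variance_scale(
        y_true=[1.0, 2.0], y_pred=[0.0, 0.0], total_var=[1.0, 1.0]
    )
    print(s)
```

A scale above 1 suggests the ensemble is overconfident on the validation set, and a scale below 1 suggests it is underconfident.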