Department of Chemistry, Lomonosov Moscow State University, Moscow, 119991, Russia.
Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, Moscow, 119071, Russia.
Sci Rep. 2022 Sep 2;12(1):14931. doi: 10.1038/s41598-022-19205-5.
The materials informatics community has invested immense effort in improving the accuracy of machine learning (ML) models; however, uncertainty quantification (UQ) for state-of-the-art algorithms also requires further development. Most prominent UQ methods are either model-specific or rely on ensembles of models; therefore, there is a need for a universal technique that can be readily applied to a single model from a diverse set of ML algorithms. In this study, we propose a new UQ measure, the Δ-metric, to address this issue. The presented quantitative criterion was inspired by the k-nearest neighbor approach adopted for applicability domain estimation in chemoinformatics. It surpasses several UQ methods in accurately ranking the predictive errors and can be considered a low-cost alternative to the more advanced deep ensemble strategy. We also evaluated the performance of the presented UQ measure on various classes of materials, ML algorithms, and types of input features, thus demonstrating its universality.
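The abstract does not state the formula for the Δ-metric, so the following is only a minimal sketch of the general idea it describes: a k-nearest-neighbor-based uncertainty score that can be computed for a single fitted model. It assumes (this is not taken from the source) that the score of a query point is the inverse-distance-weighted mean of the absolute training errors of its k nearest training neighbors. The function name `knn_error_score`, the Euclidean distance, the weighting scheme, and the parameters `k` and `eps` are all illustrative choices rather than the authors' definitions.

```python
import numpy as np


def knn_error_score(X_train, abs_err_train, X_query, k=5, eps=1e-12):
    """Hypothetical kNN-style uncertainty score (an assumption, not the paper's formula).

    For each query point, returns the inverse-distance-weighted mean of the
    absolute training errors of its k nearest training points.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    abs_err_train = np.asarray(abs_err_train, dtype=float)
    if k < 1 or k > X_train.shape[0]:
        raise ValueError("k must be between 1 and the number of training points")

    # Pairwise Euclidean distances, shape (n_query, n_train).
    dist = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=-1)

    # Indices and distances of the k nearest training points per query.
    nn_idx = np.argsort(dist, axis=1)[:, :k]
    nn_dist = np.take_along_axis(dist, nn_idx, axis=1)

    # Closer neighbors get larger weights; eps avoids division by zero.
    weights = 1.0 / (nn_dist + eps)
    return (weights * abs_err_train[nn_idx]).sum(axis=1) / weights.sum(axis=1)


# Toy data: two clusters, the model is assumed to fit the second one poorly.
X_train = np.array([[0.0], [1.0], [10.0], [11.0]])
abs_err_train = np.array([0.1, 0.1, 2.0, 2.0])  # |y_true - y_pred| on training data
X_query = np.array([[0.5], [10.5]])

scores = knn_error_score(X_train, abs_err_train, X_query, k=2)
print(scores)
```

In this toy setting the query near the poorly fitted cluster receives the larger score, which is the kind of error ranking the abstract refers to. Because the score needs only features and residuals, it does not depend on the type of model that produced the predictions.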