Biophysics Program and Institute for Physical Science and Technology, University of Maryland, College Park, MD 20742, USA.
Department of Chemistry and Biochemistry and Institute for Physical Science and Technology, University of Maryland, College Park, MD 20742, USA.
Nat Commun. 2024 Sep 9;15(1):7859. doi: 10.1038/s41467-024-51970-x.
In recent years, predictive machine learning models have gained prominence across various scientific domains. However, their black-box nature necessitates establishing trust in them before accepting their predictions as accurate. One promising strategy is to employ explanation techniques that elucidate the rationale behind a model's predictions in a way that humans can understand. Yet assessing the degree of human interpretability of these explanations is itself a nontrivial challenge. In this work, we introduce interpretation entropy as a universal solution for evaluating the human interpretability of any linear model. Using this concept, and drawing inspiration from classical thermodynamics, we present Thermodynamics-inspired Explainable Representations of AI and other black-box Paradigms (TERP), a method for generating optimally human-interpretable explanations in a model-agnostic manner. We demonstrate the wide-ranging applicability of this method by explaining predictions from black-box models of various architectures across diverse domains, including molecular simulations, text, and image classification.
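For a concrete sense of the interpretation entropy mentioned above, the short Python sketch below illustrates the idea, under the assumption that an explanation takes the form of a linear surrogate model and that its interpretation entropy is the Shannon entropy of its normalized absolute feature weights. The function name and toy weight vectors here are illustrative only; the paper should be consulted for the authoritative definition and for the thermodynamics-inspired trade-off against explanation fidelity.

import numpy as np

def interpretation_entropy(weights):
    # Shannon entropy of the normalized absolute weights of a linear
    # surrogate model. Low entropy means a few features dominate the
    # explanation (easy for a human to parse); high entropy means the
    # weight is spread thinly over many features (hard to parse).
    w = np.abs(np.asarray(weights, dtype=float))
    p = w / w.sum()            # normalize |weights| to a distribution
    p = p[p > 0]               # use the convention 0 * log(0) = 0
    return float(-(p * np.log(p)).sum())

# A sparse explanation scores low; a diffuse one scores high.
print(interpretation_entropy([0.95, 0.03, 0.02, 0.00]))  # ~0.23 nats
print(interpretation_entropy([0.25, 0.25, 0.25, 0.25]))  # ln(4), ~1.39 nats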