Angelini Marco, Blasilli Graziano, Lenti Simone, Santucci Giuseppe
IEEE Trans Vis Comput Graph. 2024 Aug;30(8):4497-4513. doi: 10.1109/TVCG.2023.3263739. Epub 2024 Jul 1.
Machine learning techniques are a driving force for research in various fields, from credit card fraud detection to stock analysis. Recently, interest in increasing human involvement has grown, with the primary goal of improving the interpretability of machine learning models. Among the available techniques, Partial Dependence Plots (PDPs) are one of the main model-agnostic approaches for interpreting how features influence the prediction of a machine learning model. However, their limitations (i.e., visual interpretation, aggregation of heterogeneous effects, inaccuracy, and computability) can complicate or misdirect the analysis. Moreover, when the effects of several features are analyzed at the same time, the resulting combinatorial space can be challenging to explore both computationally and cognitively. This article proposes a conceptual framework that enables effective analysis workflows and mitigates the limitations of the state of the art. The framework allows users to explore and refine computed partial dependences, observe incrementally accurate results, and steer the computation of new partial dependences toward user-selected subspaces of the intractable combinatorial space. With this approach, the user saves both computational and cognitive costs, in contrast with the standard monolithic approach, which computes all possible combinations of features over their entire domains in batch. The framework is the result of a careful design process that involved experts' knowledge during its validation. It informed the development of a prototype, W4SP, which demonstrates the framework's applicability by traversing its different paths. A case study shows the advantages of the proposed approach.
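For readers unfamiliar with partial dependence, the following minimal sketch may help make the abstract's terms concrete. It is not the W4SP implementation and does not reproduce the article's framework. It only illustrates, under stated assumptions, the brute-force definition of a one-feature partial dependence, a coarse pass followed by a finer pass on a chosen subinterval, and how quickly the number of feature combinations grows. The names `toy_model`, `partial_dependence`, and `n_feature_subsets`, the grids, and the synthetic data are all hypothetical and chosen for illustration.

```python
import math

import numpy as np


def partial_dependence(predict, X, feature, grid):
    """Brute-force partial dependence of `predict` on one feature.

    For each grid value v, the `feature` column of every row is set to v
    and the model predictions are averaged over the whole dataset.
    """
    X = np.asarray(X, dtype=float)
    averages = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, feature] = v  # overwrite the feature of interest
        averages.append(predict(X_mod).mean())
    return np.array(averages)


def toy_model(X):
    # Hypothetical stand-in for a trained model: y = 2*x0 + x1^2.
    return 2.0 * X[:, 0] + X[:, 1] ** 2


def n_feature_subsets(d, max_size):
    # Number of feature subsets of size 1..max_size among d features.
    return sum(math.comb(d, k) for k in range(1, max_size + 1))


# Synthetic data (assumption): 200 rows, 3 features in [-1, 1].
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 3))

# Coarse pass over the whole domain of feature 0.
coarse_grid = np.linspace(-1.0, 1.0, 5)
coarse_pd = partial_dependence(toy_model, X, 0, coarse_grid)

# Finer pass restricted to a selected subinterval of the same feature.
fine_grid = np.linspace(0.0, 0.5, 11)
fine_pd = partial_dependence(toy_model, X, 0, fine_grid)

print(coarse_pd)
print(fine_pd)
print(n_feature_subsets(10, 2), n_feature_subsets(30, 3))
```

Because the toy model is linear in feature 0, both curves are straight lines with slope 2. With a real model, the coarse pass gives an inexpensive overview and the finer pass spends computation only where the analyst chooses to look. Each grid point costs one full prediction pass over the dataset, which is why evaluating every feature combination over every domain in batch becomes expensive.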