Paiva Pedro Yuri Arbs, Moreno Camila Castro, Smith-Miles Kate, Valeriano Maria Gabriela, Lorena Ana Carolina
Instituto Tecnológico de Aeronáutica (ITA), São José dos Campos, São Paulo, Brazil.
Universidade Federal de São Paulo (Unifesp), São José dos Campos, São Paulo, Brazil.
Mach Learn. 2022;111(8):3085-3123. doi: 10.1007/s10994-022-06205-9. Epub 2022 Jun 22.
Machine Learning studies often involve a series of computational experiments in which the predictive performance of multiple models is compared across one or more datasets. The results obtained are usually summarized through average statistics, either in numeric tables or simple plots. Such approaches fail to reveal interesting subtleties about algorithmic performance, including which observations an algorithm may find easy or hard to classify, and which observations within a dataset may present unique challenges. Recently, a methodology known as Instance Space Analysis (ISA) was proposed for visualizing algorithm performance across different datasets. This methodology relates predictive performance to estimated instance hardness measures extracted from the datasets. However, that analysis treated an instance as an entire classification dataset, and algorithm performance was reported for each dataset as an average error across all of its observations. In this paper, we develop a more fine-grained analysis by adapting the ISA methodology. The adapted version of ISA allows the analysis of an individual classification dataset through a 2-D hardness embedding, which visualizes the data according to the difficulty level of its individual observations. This allows deeper analyses of the relationships between instance hardness and the predictive performance of classifiers. We also provide an open-source Python package named PyHard, which encapsulates the adapted ISA and provides an interactive visualization interface. We illustrate through case studies how our tool can provide insights about data quality and algorithm performance in the presence of challenges such as noisy and biased data.
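The core idea described above — scoring each observation by how hard it is for a pool of classifiers, then inspecting the data through a 2-D embedding — can be sketched in plain scikit-learn. This is an illustrative approximation only, not the PyHard API: the classifier pool, the hardness estimate (fraction of models misclassifying an instance under cross-validation), and the use of PCA in place of ISA's tailored projection of hardness meta-features are all simplifying assumptions.

```python
# Illustrative sketch (NOT the PyHard API): per-instance hardness as the
# fraction of cross-validated classifiers that misclassify each observation,
# plus a simple 2-D projection for visual inspection.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Small algorithm pool; PyHard's actual portfolio may differ.
classifiers = [
    LogisticRegression(max_iter=1000),
    DecisionTreeClassifier(random_state=0),
    KNeighborsClassifier(n_neighbors=5),
]

# Hardness of each observation: share of models that get it wrong.
errors = np.stack(
    [cross_val_predict(clf, X, y, cv=5) != y for clf in classifiers]
)
hardness = errors.mean(axis=0)  # in [0, 1]; 1.0 = misclassified by every model

# Plain PCA stands in for ISA's hardness embedding here; coloring the
# projected points by `hardness` approximates the kind of view PyHard offers.
embedding = PCA(n_components=2).fit_transform(X)
print(embedding.shape, float(hardness.min()), float(hardness.max()))
```

Points with hardness near 1 are the observations the abstract refers to as uniquely challenging; in a tool like PyHard these would be the candidates for closer inspection (e.g., label noise or bias).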