Jin Zhihua, Wang Xingbo, Cheng Furui, Sun Chunhui, Liu Qun, Qu Huamin
IEEE Trans Vis Comput Graph. 2024 Jul;30(7):3594-3608. doi: 10.1109/TVCG.2023.3236380. Epub 2024 Jun 27.
Benchmark datasets play an important role in evaluating Natural Language Understanding (NLU) models. However, shortcuts (unwanted biases in the benchmark datasets) can undermine how well those datasets reveal models' real capabilities. Since shortcuts vary in coverage, productivity, and semantic meaning, it is challenging for NLU experts to systematically understand and avoid them when creating benchmark datasets. In this paper, we develop a visual analytics system, ShortcutLens, to help NLU experts explore shortcuts in NLU benchmark datasets. The system allows users to conduct multi-level exploration of shortcuts. Specifically, Statistics View helps users grasp statistics such as the coverage and productivity of shortcuts in the benchmark dataset. Template View employs hierarchical and interpretable templates to summarize different types of shortcuts. Instance View allows users to check the corresponding instances covered by the shortcuts. We conduct case studies and expert interviews to evaluate the effectiveness and usability of the system. The results demonstrate that ShortcutLens helps users better understand benchmark dataset issues through shortcuts, inspiring them to create challenging and pertinent benchmark datasets.
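The abstract names coverage and productivity without defining them. A minimal sketch of one common reading follows, assuming coverage is the fraction of instances that contain a candidate shortcut token and productivity is the share of those instances that carry the token's most frequent label. The function `shortcut_stats`, the whitespace tokenization, and the toy data are illustrative assumptions, not ShortcutLens code.

```python
from collections import Counter


def shortcut_stats(instances, token):
    """Return (coverage, productivity, majority_label) for one candidate token.

    instances: list of (text, label) pairs (hypothetical toy format).
    Assumed definitions, not taken from the paper:
      coverage     = share of instances whose text contains the token
      productivity = share of those instances carrying the most frequent label
    """
    # Labels of every instance whose lowercased, whitespace-split text has the token.
    labels = [label for text, label in instances if token in text.lower().split()]
    if not labels:
        return 0.0, 0.0, None
    coverage = len(labels) / len(instances)
    majority_label, majority_count = Counter(labels).most_common(1)[0]
    productivity = majority_count / len(labels)
    return coverage, productivity, majority_label


# Toy sentiment data, invented purely for illustration.
toy = [
    ("the movie was not good", "negative"),
    ("not worth the time", "negative"),
    ("i did not expect to love it", "positive"),
    ("a great film", "positive"),
]

coverage, productivity, label = shortcut_stats(toy, "not")
print(f"coverage={coverage:.2f} productivity={productivity:.2f} label={label}")
```

Under these assumed definitions, a token with both high coverage and high productivity is a strong shortcut candidate: it appears in many instances and almost always co-occurs with one label.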