IEEE Trans Vis Comput Graph. 2020 Jan;26(1):1012-1021. doi: 10.1109/TVCG.2019.2934786. Epub 2019 Aug 20.
Perceptual tasks in visualizations often involve comparisons. Given two sets of values depicted in two charts, which set has the highest values overall? Which has the widest range? Prior empirical work found that performance on different visual comparison tasks (e.g., "biggest delta", "biggest correlation") varied widely across different combinations of marks and spatial arrangements. In this paper, we expand upon these combinations in an empirical evaluation of two new comparison tasks: the "biggest mean" and "biggest range" between two sets of values. We used a staircase procedure to titrate the difficulty of the data comparison to assess which arrangements produced the most precise comparisons for each task. We find that visual comparisons of biggest mean and biggest range are supported by some chart arrangements more than others, and that this pattern is substantially different from the pattern for other tasks. To synthesize these dissonant findings, we argue that we must understand which features of a visualization are actually used by the human visual system to solve a given task. We call these perceptual proxies. For example, when comparing the means of two bar charts, the visual system might use a "Mean length" proxy that isolates the actual lengths of the bars and then constructs a true average across these lengths. Alternatively, it might use a "Hull Area" proxy that perceives an implied hull bounded by the bars of each chart and then compares the areas of these hulls. We propose a series of potential proxies across different tasks, marks, and spatial arrangements. Simple models of these proxies can be empirically evaluated for their explanatory power by matching their performance to human performance across these marks, arrangements, and tasks. We use this process to highlight candidates for perceptual proxies that might scale more broadly to explain performance in visual comparison.
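The staircase procedure mentioned above can be sketched as a simple adaptive loop. The following is a minimal, illustrative sketch of a 2-down/1-up staircase (a common variant, assumed here; the paper's exact parameters are not given in the abstract), where the difficulty knob is the delta between the two charted quantities being compared:

```python
# Hypothetical sketch of a 2-down/1-up adaptive staircase for titrating
# comparison difficulty. Step size, starting delta, and trial count are
# illustrative assumptions, not the study's actual settings.

def run_staircase(respond_correctly, start=10.0, step=1.0, trials=40):
    """Adjust a difficulty parameter (e.g. the difference between two
    means): two correct responses in a row make the task harder
    (smaller delta); one error makes it easier (larger delta)."""
    delta = start
    correct_streak = 0
    history = []
    for _ in range(trials):
        history.append(delta)
        if respond_correctly(delta):
            correct_streak += 1
            if correct_streak == 2:      # 2-down rule: harder
                delta = max(delta - step, step)
                correct_streak = 0
        else:                            # 1-up rule: easier
            delta += step
            correct_streak = 0
    return history

# A deterministic stand-in observer: always correct when the delta
# exceeds a fixed internal threshold. A real participant is noisy.
threshold = 4.0
hist = run_staircase(lambda d: d > threshold)
```

With the deterministic observer, the recorded deltas descend from the easy starting value and then oscillate around the observer's threshold; with a real participant, the staircase converges toward the delta at a fixed accuracy level, giving a precision estimate per chart arrangement.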
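The two candidate proxies for the "biggest mean" task can be made concrete with a toy model. Below is a minimal sketch, assuming bars on a common baseline with unit spacing and approximating the implied hull over the bar tops with trapezoids; the function names and chart values are illustrative, not the paper's actual model code:

```python
# Two hypothetical perceptual-proxy models for comparing two bar charts
# on the "biggest mean" task. Both are illustrative sketches.

def mean_length_proxy(bars):
    """'Mean length' proxy: a true average of the bar lengths."""
    return sum(bars) / len(bars)

def hull_area_proxy(bars, bar_spacing=1.0):
    """'Hull Area' proxy: area of the implied hull bounded by the bar
    tops, approximated as trapezoids between adjacent bars."""
    return sum((a + b) / 2.0 * bar_spacing for a, b in zip(bars, bars[1:]))

# Toy charts chosen so the two proxies disagree: chart A has the larger
# mean, but chart B's tall interior bars give it the larger hull area.
chart_a = [8.0, 2.4, 2.4, 8.0]   # mean 5.2, hull area 12.8
chart_b = [2.0, 8.0, 8.0, 2.0]   # mean 5.0, hull area 18.0

answer_mean = "A" if mean_length_proxy(chart_a) > mean_length_proxy(chart_b) else "B"
answer_hull = "A" if hull_area_proxy(chart_a) > hull_area_proxy(chart_b) else "B"
# answer_mean == "A", answer_hull == "B"
```

Because such proxy models can disagree on constructed stimuli, matching each model's choices against human responses across marks, arrangements, and tasks is what lets the study identify which proxy best explains observed performance.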