Centre for Vision and Cognition, Psychology, University of Southampton, Southampton, UK.
Centre for Vision Research, Department of Psychology, Department of Electrical Engineering and Computer Science, York University, Toronto, Ontario, Canada.
J Vis. 2021 Feb 3;21(2):8. doi: 10.1167/jov.21.2.8.
Categorization performance is a popular metric of scene recognition and understanding in behavioral and computational research. However, categorical constructs and their labels can be somewhat arbitrary. Derived from exhaustive vocabularies of place names (e.g., Deng et al., 2009), or the judgements of small groups of researchers (e.g., Fei-Fei, Iyer, Koch, & Perona, 2007), these categories may not correspond with human-preferred taxonomies. Here, we propose clustering by increasing the Rand index via coordinate ascent (CIRCA): an unsupervised, data-driven clustering method for deriving ground-truth scene categories. In Experiment 1, human participants organized 80 stereoscopic images of outdoor scenes from the Southampton-York Natural Scenes (SYNS) dataset (Adams et al., 2016) into discrete categories. In separate tasks, images were grouped according to (i) semantic content, (ii) three-dimensional spatial structure, or (iii) two-dimensional image appearance. Participants provided text labels for each group. Using the CIRCA method, we determined the most representative category structure and then derived category labels for each task/dimension. In Experiment 2, we found that these categories generalized well to a larger set of SYNS images and to new observers. In Experiment 3, we tested the relationship between our category systems and the spatial envelope model (Oliva & Torralba, 2001). Finally, in Experiment 4, we validated CIRCA on a larger, independent dataset of same-different category judgements. The derived category systems outperformed the SUN taxonomy (Xiao, Hays, Ehinger, Oliva, & Torralba, 2010) and an alternative clustering method (Greene, 2019). In summary, we believe this novel categorization method can be applied to a wide range of datasets to derive optimal categorical groupings and labels from psychophysical judgements of stimulus similarity.
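The abstract names the ingredients of CIRCA (the Rand index and coordinate ascent) but does not spell out the procedure, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes that the objective is the mean Rand index between a candidate partition and each participant's partition, that one image's cluster label is updated at a time, and that labels are integers. The function names `rand_index`, `mean_rand`, and `consensus_by_coordinate_ascent` are hypothetical.

```python
from itertools import combinations
import random


def rand_index(a, b):
    """Fraction of item pairs on which two partitions agree
    (both place the pair together, or both place it apart).
    Requires at least two items."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def mean_rand(candidate, partitions):
    """Mean Rand index between a candidate and every observed partition."""
    return sum(rand_index(candidate, p) for p in partitions) / len(partitions)


def consensus_by_coordinate_ascent(partitions, seed=0, max_sweeps=100):
    """Greedy search for a partition with a high mean Rand index.

    Assumption: each 'coordinate' is the integer cluster label of one item.
    Starting from the first participant's grouping is also an assumption.
    The result is a local optimum, not a guaranteed global one.
    """
    rng = random.Random(seed)
    n_items = len(partitions[0])
    current = list(partitions[0])
    best = mean_rand(current, partitions)

    for _ in range(max_sweeps):
        improved = False
        order = list(range(n_items))
        rng.shuffle(order)
        for i in order:
            best_label = current[i]
            # Try every existing label plus one fresh label (a new cluster).
            candidates = set(current) | {max(current) + 1}
            for label in candidates:
                if label == best_label:
                    continue
                current[i] = label
                score = mean_rand(current, partitions)
                if score > best + 1e-12:
                    best, best_label, improved = score, label, True
            current[i] = best_label  # keep the best label found for item i
        if not improved:
            break  # a full sweep changed nothing: local optimum reached
    return current, best


if __name__ == "__main__":
    # Toy data: three observers each sort four images into groups.
    observed = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1]]
    consensus, score = consensus_by_coordinate_ascent(observed)
    print(consensus, round(score, 3))
```

Because each accepted move strictly increases the objective, the search never scores below its starting partition, but the result can depend on the starting point and on the order in which items are visited. Running it from several starts and keeping the best result is a common safeguard for this kind of greedy search.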