IEEE Trans Pattern Anal Mach Intell. 2022 Jan;44(1):228-241. doi: 10.1109/TPAMI.2020.3008107. Epub 2021 Dec 7.
Achieving human-like visual abilities is a holy grail for machine vision, yet precisely how insights from human vision can improve machines has remained unclear. Here, we demonstrate two key conceptual advances: First, we show that most machine vision models are systematically different from human object perception. To do so, we collected a large dataset of perceptual distances between isolated objects in humans and asked whether these perceptual data can be predicted by many common machine vision algorithms. We found that while the best algorithms explain ∼ 70 percent of the variance in the perceptual data, all the algorithms we tested make systematic errors on several types of objects. In particular, machine algorithms underestimated distances between symmetric objects compared to human perception. Second, we show that fixing these systematic biases can lead to substantial gains in classification performance. In particular, augmenting a state-of-the-art convolutional neural network with planar/reflection symmetry scores along multiple axes produced significant improvements in classification accuracy (1-10 percent) across categories. These results show that machine vision can be improved by discovering and fixing systematic differences from human vision.
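As a rough illustration of the symmetry-based augmentation described above: the abstract does not specify how the planar/reflection symmetry scores are computed or how they are fused with the network, so the sketch below is an assumption, not the authors' method. It uses a simple pixel-correlation reflection-symmetry score along several axes through the image centre and appends those scores to a precomputed CNN feature vector before classification; all function names and parameters are illustrative.

```python
import numpy as np

def reflection_symmetry_score(img: np.ndarray, angle_deg: float) -> float:
    """Illustrative planar/reflection symmetry score: correlation between the
    image and its mirror image about an axis at `angle_deg` through the centre.
    (Assumed measure; the paper's exact symmetry score may differ.)"""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = np.deg2rad(angle_deg)
    ux, uy = np.cos(theta), np.sin(theta)          # unit vector along the axis
    ys, xs = np.mgrid[0:h, 0:w]
    dx, dy = xs - cx, ys - cy
    # Reflect each pixel coordinate about the axis: p' = 2(p.u)u - p
    dot = dx * ux + dy * uy
    rx = np.clip(np.rint(2 * dot * ux - dx + cx), 0, w - 1).astype(int)
    ry = np.clip(np.rint(2 * dot * uy - dy + cy), 0, h - 1).astype(int)
    mirrored = img[ry, rx]
    return float(np.corrcoef(img.ravel(), mirrored.ravel())[0, 1])

def augment_features(cnn_features: np.ndarray, img: np.ndarray,
                     axes_deg=(0, 45, 90, 135)) -> np.ndarray:
    """Append reflection-symmetry scores along several axes to a CNN feature
    vector; the augmented vector would then be fed to the classifier."""
    scores = np.array([reflection_symmetry_score(img, a) for a in axes_deg])
    return np.concatenate([cnn_features, scores])

# Usage with random stand-ins for a grayscale object image and CNN features.
rng = np.random.default_rng(0)
img = rng.random((64, 64))
feats = rng.random(512)            # e.g., penultimate-layer activations
augmented = augment_features(feats, img)
print(augmented.shape)             # (516,) = 512 CNN features + 4 symmetry scores
```

In this sketch the symmetry scores simply extend the feature vector; where exactly the scores enter the network (input, intermediate layer, or classifier) is a design choice the abstract does not pin down.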