Department of Mathematics and Computer Science, Weizmann Institute of Science, Rehovot 76100, Israel.
Proc Natl Acad Sci U S A. 2012 Oct 30;109(44):18215-20. doi: 10.1073/pnas.1207690109. Epub 2012 Sep 24.
Early in development, infants learn to solve visual problems that are highly challenging for current computational methods. We present a model that addresses two fundamental problems in which the gap between computational difficulty and infant learning is particularly striking: learning to recognize hands and learning to recognize gaze direction. The model is shown a stream of natural videos and learns, without any supervision, to detect human hands by appearance and by context, as well as direction of gaze, in complex natural scenes. The algorithm is guided by an empirically motivated innate mechanism: the detection of "mover" events in dynamic images, that is, events in which a moving image region causes a stationary region to move or change after contact. Mover events provide an internal teaching signal, which is shown to be more effective than alternative cues and sufficient for the efficient acquisition of hand and gaze representations. The implications go beyond the specific tasks, showing how domain-specific "proto concepts" can guide the system to acquire meaningful concepts that are significant to the observer but statistically inconspicuous in the sensory input.
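The abstract defines a "mover" event only in words: a moving image region makes contact with a stationary region, which then moves or changes. The following is a minimal sketch of one way such a cue could be computed. It is not the authors' implementation. The grid of cells, the frame-differencing motion test, the thresholds, and the function names (`cell_means`, `neighbor_moving`, `detect_mover_events`) are all illustrative assumptions. A cell is flagged when it was still, motion then entered it from a neighboring cell, motion left it again into a neighboring cell, and the cell looks different afterwards.

```python
import numpy as np


def cell_means(img, cell):
    """Average an (H, W) array over non-overlapping cell x cell blocks.

    Assumes H and W are multiples of `cell` (a simplification for this sketch).
    """
    h, w = img.shape
    return img.reshape(h // cell, cell, w // cell, cell).mean(axis=(1, 3))


def neighbor_moving(moving_k, r, c):
    """Return True if any of the 8 neighbors of cell (r, c) is moving."""
    rows, cols = moving_k.shape
    r0, r1 = max(r - 1, 0), min(r + 2, rows)
    c0, c1 = max(c - 1, 0), min(c + 2, cols)
    patch = moving_k[r0:r1, c0:c1].copy()
    patch[r - r0, c - c0] = False  # ignore the cell itself
    return bool(patch.any())


def detect_mover_events(frames, cell=8, motion_thresh=10.0, change_thresh=10.0):
    """Toy mover-event detector (hypothetical, for illustration only).

    frames: array of shape (T, H, W) with grayscale intensities.
    Returns a list of (contact_frame, cell_row, cell_col) tuples.
    """
    frames = np.asarray(frames, dtype=float)
    n_frames = frames.shape[0]
    # moving[k, r, c]: cell (r, c) changed between frame k and frame k + 1.
    moving = np.stack([
        cell_means(np.abs(frames[k + 1] - frames[k]), cell) > motion_thresh
        for k in range(n_frames - 1)
    ])
    n_steps, rows, cols = moving.shape
    events = []
    for r in range(rows):
        for c in range(cols):
            k = 1
            while k < n_steps:
                # Motion onset in a cell that was still one step earlier.
                if moving[k, r, c] and not moving[k - 1, r, c]:
                    s = k
                    e = s
                    while e < n_steps and moving[e, r, c]:
                        e += 1
                    if e < n_steps:  # motion in this cell ended within the clip
                        # Motion arrived from, and departed into, a neighbor.
                        incoming = neighbor_moving(moving[s - 1], r, c)
                        outgoing = neighbor_moving(moving[e], r, c)
                        rs = slice(r * cell, (r + 1) * cell)
                        cs = slice(c * cell, (c + 1) * cell)
                        before = frames[s, rs, cs]  # last still frame
                        after = frames[e, rs, cs]   # first still frame again
                        changed = np.abs(after - before).mean() > change_thresh
                        if incoming and outgoing and changed:
                            events.append((s + 1, r, c))
                    k = e
                else:
                    k += 1
    return events
```

In this sketch, a region that merely passes through a cell produces no event because the cell looks the same before and after, and a region that comes to rest in a cell produces no event because no motion leaves the cell. Only a cell whose content changes as a result of the visit is flagged, which is the property that would make such events usable as an internal teaching signal for the moving region.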