OpticArray Technologies, Rockville, MD 20850.
Department of Molecular & Cell Biology, Helen Wills Neuroscience Institute, Berkeley, CA 94704.
Proc Natl Acad Sci U S A. 2022 Oct 11;119(41):e2204248119. doi: 10.1073/pnas.2204248119. Epub 2022 Oct 6.
The world is composed of objects, the ground, and the sky. Visual perception of objects requires solving two fundamental challenges: 1) segmenting visual input into discrete units and 2) tracking the identities of these units despite appearance changes due to object deformation, changing perspective, and dynamic occlusion. Current computer vision methods for segmentation and tracking that approach human performance all require learning, raising the question: can objects be segmented and tracked without learning? Here, we show that the mathematical structure of light rays reflected from environment surfaces yields a natural representation of persistent surfaces, and this surface representation provides a solution to both the segmentation and tracking problems. We describe how to generate this surface representation from continuous visual input and demonstrate that our approach can segment and invariantly track objects in cluttered synthetic video despite severe appearance changes, without requiring learning.