Li Shuai, He Shufang, Dong Yuanrui, Dai Caihong, Liu Jinyuan, Wang Yanfei, Shigemasu Hiroaki
Division of Optical Metrology, National Institute of Metrology, Beijing 100029, China.
Academy of Artificial Intelligence, Beijing Institute of Petrochemical Technology, Beijing 102627, China.
Sensors (Basel). 2025 May 17;25(10):3171. doi: 10.3390/s25103171.
Depth perception by the human visual system in three-dimensional (3D) space plays an important role in human-computer interaction and artificial intelligence (AI). It relies mainly on two cues: binocular disparity and motion parallax. This study aims to systematically summarize research on depth perception specified by these two cues.
We surveyed the literature and summarized the related studies in terms of motivations, research trends, mechanisms, and interaction models of depth perception specified by these two cues.
Development trends show that depth perception research has gradually evolved from early studies based on a single cue to quantitative studies of the interaction between the two cues. The mechanisms of the two cues reveal that depth perception specified by binocular disparity is mainly influenced by factors such as spatial variation in disparity, viewing distance, the position of the visual field (or retinal image) used, and interaction with other cues, whereas depth perception specified by motion parallax is affected by head movement and retinal image motion, interaction with other cues, and the observer's age. For the integration of the two cues, several types of depth perception models are summarized: the weak fusion (WF) model, the modified weak fusion (MWF) model, the strong fusion (SF) model, and the intrinsic constraint (IC) model. The merits and limitations of each model are analyzed and compared.
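As a rough illustration of the cue-combination idea behind the fusion models named above, the following minimal sketch combines independent per-cue depth estimates by a weighted linear average, the scheme commonly associated with weak fusion. The function name `weak_fusion`, the inverse-variance weighting, and the example numbers are assumptions made for illustration; they are not taken from the review and may differ from the formulations it analyzes.

```python
from typing import Sequence, Tuple


def weak_fusion(estimates: Sequence[float],
                variances: Sequence[float]) -> Tuple[float, float]:
    """Combine independent per-cue depth estimates by a weighted average.

    Hypothetical helper: each cue (e.g. binocular disparity, motion
    parallax) is assumed to yield its own depth estimate and a variance
    describing its reliability. Weights are proportional to 1/variance,
    which is an illustrative assumption, not the review's definition.
    """
    if len(estimates) == 0 or len(estimates) != len(variances):
        raise ValueError("need one variance per estimate, and at least one cue")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")

    # More reliable cues (smaller variance) receive larger weights.
    weights = [1.0 / v for v in variances]
    total = sum(weights)

    # Weighted linear combination of the per-cue estimates.
    depth = sum(w * d for w, d in zip(weights, estimates)) / total

    # Variance of the combined estimate under the independence assumption.
    combined_variance = 1.0 / total
    return depth, combined_variance


if __name__ == "__main__":
    # Made-up example: disparity suggests 10 cm (variance 1),
    # motion parallax suggests 20 cm (variance 4).
    print(weak_fusion([10.0, 20.0], [1.0, 4.0]))
```

With these made-up inputs the combined estimate lies closer to the more reliable disparity cue. Strong fusion and intrinsic constraint models would instead let the cues interact before a depth estimate is formed, which this sketch does not attempt to capture.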
This review gives a clear picture of research on depth perception specified by binocular disparity and motion parallax cues. Open research challenges and future directions are presented. Future work should explore methods that make depth cue signals in stereoscopic images easier to manipulate, and should adopt deep learning-related methods to construct models and predict depth, in order to meet the increasing demand for human-computer interaction in complex 3D scenarios.