Su Che-Chun, Cormack Lawrence K, Bovik Alan C
Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX, USA
Department of Psychology, The University of Texas at Austin, Austin, TX, USA
J Vis. 2017 May 1;17(5):22. doi: 10.1167/17.5.22.
Estimating an accurate and naturalistic dense depth map from a single monocular photographic image is a difficult problem. Nevertheless, human observers have little difficulty understanding the depth structure implied by photographs. Two-dimensional (2D) images of the real-world environment contain significant statistical information regarding the three-dimensional (3D) structure of the world that the vision system likely exploits to compute perceived depth, monocularly as well as binocularly. Toward understanding how this might be accomplished, we propose a Bayesian model of monocular depth computation that recovers detailed 3D scene structures by extracting reliable, robust, depth-sensitive statistical features from single natural images. These features are derived using well-accepted univariate natural scene statistics (NSS) models and recent bivariate/correlation NSS models that describe the relationships between 2D photographic images and their associated depth maps. This is accomplished by building a dictionary of canonical local depth patterns from which NSS features are extracted as prior information. The dictionary is used to create a multivariate Gaussian mixture (MGM) likelihood model that associates local image features with depth patterns. A simple Bayesian predictor is then used to form spatial depth estimates. The depth results produced by the model, despite its simplicity, correlate well with ground-truth depths measured by a current-generation terrestrial light detection and ranging (LIDAR) scanner. Such a strong form of statistical depth information could be used by the visual system when creating overall estimated depth maps incorporating stereopsis, accommodation, and other conditions. Indeed, even in isolation, the Bayesian predictor delivers depth estimates that are competitive with state-of-the-art "computer vision" methods that utilize highly engineered image features and sophisticated machine learning algorithms.
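To make the pipeline described in the abstract concrete, below is a minimal, hypothetical sketch of its main steps: cluster depth patches into a dictionary of canonical local depth patterns (the prior), model the likelihood of image features under each pattern with Gaussians, and form a posterior-weighted depth estimate. This is not the authors' code: random arrays stand in for co-registered image/depth patches, simple per-pattern Gaussians stand in for the paper's multivariate Gaussian mixture likelihood, and no actual NSS features are computed.

```python
# Hypothetical sketch of the abstract's pipeline, not the authors' implementation.
# Stand-ins: random patches replace real image/LIDAR data; generic feature vectors
# replace the paper's NSS (natural scene statistics) features.
import numpy as np
from sklearn.cluster import KMeans
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)

# Training data (placeholder): co-registered image-feature / depth-patch pairs
n_patches, patch_dim, feat_dim, K = 2000, 64, 8, 16
depth_patches = rng.normal(size=(n_patches, patch_dim))   # stand-in for LIDAR depth patches
image_feats   = rng.normal(size=(n_patches, feat_dim))    # stand-in for NSS image features

# 1. Dictionary of canonical local depth patterns (prior over patterns)
km = KMeans(n_clusters=K, n_init=10, random_state=0).fit(depth_patches)
canonical_patterns = km.cluster_centers_                   # K canonical depth patterns
labels = km.labels_
prior = np.bincount(labels, minlength=K) / n_patches       # P(pattern k)

# 2. Likelihood of image features given each depth pattern
#    (one Gaussian per pattern here; the paper uses a multivariate Gaussian mixture)
means = np.array([image_feats[labels == k].mean(axis=0) for k in range(K)])
covs  = np.array([np.cov(image_feats[labels == k].T) + 1e-3 * np.eye(feat_dim)
                  for k in range(K)])

def predict_depth_patch(feat):
    """Posterior-weighted combination of canonical depth patterns (Bayesian estimate)."""
    lik = np.array([multivariate_normal.pdf(feat, means[k], covs[k]) for k in range(K)])
    post = lik * prior
    post /= post.sum()
    return post @ canonical_patterns                        # E[depth patch | image features]

# Usage: estimate the local depth structure for one unseen patch
test_feat = rng.normal(size=feat_dim)
depth_estimate = predict_depth_patch(test_feat)
print(depth_estimate.shape)                                 # (64,), i.e., an 8x8 depth patch
```

In this sketch the estimate is the posterior mean over the dictionary, so each predicted patch is a blend of canonical depth patterns weighted by how well the local image features match each pattern's likelihood.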