Sizintsev Mikhail, Mithun Niluthpol Chowdhury, Chiu Han-Pang, Samarasekera Supun, Kumar Rakesh
IEEE Trans Vis Comput Graph. 2021 Nov;27(11):4236-4244. doi: 10.1109/TVCG.2021.3106434. Epub 2021 Oct 27.
Proper occlusion-based rendering is very important for achieving realism in all indoor and outdoor Augmented Reality (AR) applications. This paper addresses the problem of fast and accurate dynamic occlusion reasoning by real objects in the scene for large-scale outdoor AR applications. Conceptually, proper occlusion reasoning requires an estimate of depth for every point in the augmented scene, which is technically hard to achieve for outdoor scenarios, especially in the presence of moving objects. We propose a method to detect and automatically infer the depth of real objects in the scene without explicit detailed scene modeling and without depth sensing (e.g., without sensors such as 3D LiDAR). Specifically, we employ instance segmentation of color image data to detect real dynamic objects in the scene and use either a top-down terrain elevation model or a deep-learning-based monocular depth estimation model to infer their metric distance from the camera for proper occlusion reasoning in real time. The realized solution is implemented in a low-latency real-time framework for video-see-through AR and is directly extendable to optical-see-through AR. We minimize latency in depth reasoning and occlusion rendering by performing semantic object tracking and prediction in video frames.
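The core occlusion test the abstract describes (once each segmented real object has an inferred metric distance) can be sketched as a per-pixel depth comparison between the virtual render and the nearest real object. The following is a minimal illustrative sketch, not the authors' implementation; the function name, array layout, and the assumption of one scalar depth per instance are all mine:

```python
import numpy as np

def composite_with_occlusion(real_rgb, virtual_rgb, virtual_depth,
                             instance_masks, instance_depths):
    """Per-pixel occlusion test: draw a virtual pixel only where the virtual
    content is closer to the camera than any real object covering that pixel.

    real_rgb        : (H, W, 3) camera frame
    virtual_rgb     : (H, W, 3) rendered virtual content
    virtual_depth   : (H, W) depth buffer of the virtual render (np.inf = empty)
    instance_masks  : list of (H, W) boolean masks from instance segmentation
    instance_depths : per-instance metric distances, e.g., inferred from a
                      terrain elevation model or a monocular depth estimate
    """
    h, w = virtual_depth.shape

    # Depth of the nearest real object at each pixel (inf where none detected).
    real_depth = np.full((h, w), np.inf)
    for mask, d in zip(instance_masks, instance_depths):
        real_depth[mask] = np.minimum(real_depth[mask], d)

    # Virtual content is visible where it exists and is nearer than real objects;
    # elsewhere the real object correctly occludes it.
    visible = (virtual_depth < np.inf) & (virtual_depth < real_depth)
    out = real_rgb.copy()
    out[visible] = virtual_rgb[visible]
    return out
```

Assigning a single scalar depth per instance is the simplifying choice that avoids dense per-pixel depth sensing; a per-pixel monocular depth map could be substituted for `real_depth` without changing the compositing logic.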