IEEE Trans Vis Comput Graph. 2018 Nov;24(11):2993-3004. doi: 10.1109/TVCG.2018.2868527. Epub 2018 Sep 10.
We propose a new approach for 3D reconstruction of dynamic indoor and outdoor scenes in everyday environments, leveraging only cameras worn by a user. This approach enables 3D reconstruction of experiences at any location and virtual tours from anywhere. The key innovation of the proposed egocentric reconstruction system is to capture the wearer's body pose and facial expression from near-body views, e.g., cameras on the user's glasses, and to capture the surrounding environment using outward-facing views. The main challenge of egocentric reconstruction, however, is the poor coverage of the near-body views: the user's body and face are observed from vantage points that are convenient to wear but inconvenient for capture. To overcome this challenge, we propose a parametric-model-based approach to user motion estimation. This approach uses convolutional neural networks (CNNs) for near-view body pose estimation, and we introduce a CNN-based approach to facial expression estimation that combines audio and video. For each time point during capture, the intermediate model-based reconstructions from these systems are used to retarget a high-fidelity pre-scanned model of the user. We demonstrate that the proposed self-sufficient head-worn capture system can reconstruct the wearer's movements and surrounding environment in both indoor and outdoor settings without any additional views. As a proof of concept, we show how the resulting 3D-plus-time reconstruction can be experienced immersively within a virtual reality system (e.g., the HTC Vive). We expect that the proposed egocentric capture-and-reconstruction system will eventually shrink to fit within future AR glasses, and will be widely useful for immersive 3D telepresence, virtual tours, and general use-anywhere 3D content creation.
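The abstract outlines a per-frame pipeline: estimate body pose from near-body views with a CNN, estimate facial expression from combined audio and video with a second CNN, then retarget a pre-scanned model of the user with both estimates. The following is a minimal Python sketch of that control flow only; every name in it (FrameInput, estimate_body_pose, estimate_expression, retarget) and the placeholder output dimensions are hypothetical stand-ins, not the paper's implementation, and the network internals are stubbed out with dummy outputs.

```python
# Hypothetical sketch of the per-frame capture-to-retarget loop described in
# the abstract. Names and dimensions are illustrative assumptions; real
# implementations would replace the stubs with trained CNNs.
from dataclasses import dataclass
import numpy as np

@dataclass
class FrameInput:
    body_views: np.ndarray   # near-body camera images, e.g. (n_views, H, W, 3)
    face_view: np.ndarray    # near-face camera image, (H, W, 3)
    audio: np.ndarray        # audio window time-aligned with this frame
    env_views: np.ndarray    # outward-facing images for environment capture

def estimate_body_pose(body_views: np.ndarray) -> np.ndarray:
    """Stand-in for the near-view body-pose CNN; returns joint parameters."""
    return np.zeros(63)      # placeholder: e.g. 21 joints x 3 rotation params

def estimate_expression(face_view: np.ndarray, audio: np.ndarray) -> np.ndarray:
    """Stand-in for the audio+video facial-expression CNN."""
    return np.zeros(32)      # placeholder: expression coefficients

def retarget(prescan_model: dict, pose: np.ndarray, expr: np.ndarray) -> dict:
    """Drive the high-fidelity pre-scanned user model with the estimates."""
    return {**prescan_model, "pose": pose, "expression": expr}

def reconstruct_sequence(frames: list[FrameInput], prescan_model: dict) -> list[dict]:
    """Produce one retargeted model state per captured time point."""
    reconstruction = []
    for frame in frames:
        pose = estimate_body_pose(frame.body_views)
        expr = estimate_expression(frame.face_view, frame.audio)
        reconstruction.append(retarget(prescan_model, pose, expr))
    return reconstruction
```

The sketch only illustrates the division of labor the abstract names: near-body views feed pose estimation, audio and video jointly feed expression estimation, and both intermediate results drive the pre-scanned model at each time point.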