Fathy Ghada M, Hassan Hanan A, Sheta Walaa, Omara Fatma A, Nabil Emad
Informatics Research Institute, City for Scientific Research and Technological Applications, SRTA-City, Alexandria, Egypt.
Department of Computer Science, Faculty of Computers and Artificial Intelligence, Cairo University, Giza, Egypt.
PeerJ Comput Sci. 2021 May 12;7:e529. doi: 10.7717/peerj-cs.529. eCollection 2021.
Occlusion awareness is one of the most challenging problems in several fields, such as multimedia, remote sensing, computer vision, and computer graphics. Realistic interaction applications struggle with occlusion and collision problems in dynamic environments. Dense 3D reconstruction is the most effective way to address this issue; however, existing methods perform poorly in practical applications because accurate depth, camera pose, and object motion are unavailable. This paper proposes a new framework that builds a full 3D model reconstruction and overcomes the occlusion problem in complex dynamic scenes without using sensor data. Widely available devices such as a monocular camera are used to generate a model suitable for video-streaming applications. The main objective is to create a smooth and accurate 3D point cloud of a dynamic environment from the cumulative information of a sequence of RGB video frames. The framework is composed of two main phases. The first uses an unsupervised learning technique to predict scene depth, camera pose, and object motion from monocular RGB videos. The second performs frame-wise point-cloud fusion to reconstruct a 3D model from the video frame sequence. Several evaluation metrics are measured: localization error, RMSE, and fitness between the ground truth (KITTI's sparse LiDAR points) and the predicted point cloud. Moreover, the framework was compared with other methods using widely adopted state-of-the-art evaluation metrics such as MRE and Chamfer Distance. Experimental results show that the proposed framework surpasses the other methods and is a strong candidate for 3D model reconstruction.
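To make the second phase concrete, the following is a minimal sketch (not the authors' code) of frame-wise point-cloud fusion: each frame's predicted depth map is back-projected through the camera intrinsics K, transformed into world coordinates with the predicted camera-to-world pose, and accumulated into one cloud. The array shapes and the 4x4 pose convention are assumptions for illustration.

```python
import numpy as np

def backproject(depth, K):
    """Back-project an HxW depth map into an (H*W)x3 camera-space cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.reshape(-1)
    x = (u.reshape(-1) - K[0, 2]) * z / K[0, 0]
    y = (v.reshape(-1) - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=1)

def fuse_frames(depths, poses, K):
    """Accumulate per-frame clouds into a single world-space point cloud.

    depths: list of HxW depth maps (as predicted by the depth network)
    poses:  list of 4x4 camera-to-world matrices (as predicted by the pose network)
    """
    clouds = []
    for depth, T in zip(depths, poses):
        pts = backproject(depth, K)                       # camera coordinates
        pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # homogeneous coordinates
        clouds.append((pts_h @ T.T)[:, :3])               # world coordinates
    return np.concatenate(clouds, axis=0)
```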
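The evaluation metrics named above can likewise be illustrated. The sketch below, assuming both clouds are Nx3 NumPy arrays in the same coordinate frame, computes a nearest-neighbour RMSE and a symmetric Chamfer Distance between a predicted cloud and sparse LiDAR ground truth; it is illustrative only, not the paper's exact evaluation code.

```python
import numpy as np
from scipy.spatial import cKDTree

def rmse_to_ground_truth(pred, gt):
    """RMSE of each predicted point to its nearest ground-truth LiDAR point."""
    d, _ = cKDTree(gt).query(pred)
    return float(np.sqrt(np.mean(d ** 2)))

def chamfer_distance(pred, gt):
    """Symmetric Chamfer Distance: mean nearest-neighbour distance in both directions."""
    d_pg, _ = cKDTree(gt).query(pred)
    d_gp, _ = cKDTree(pred).query(gt)
    return float(d_pg.mean() + d_gp.mean())
```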