FusionVision: A Comprehensive Approach of 3D Object Reconstruction and Segmentation from RGB-D Cameras Using YOLO and Fast Segment Anything.

Authors

El Ghazouali Safouane, Mhirit Youssef, Oukhrid Ali, Michelucci Umberto, Nouira Hichem

Affiliations

TOELT LLC, AI Lab, 8406 Winterthur, Switzerland.

Independent Researcher, 75000 Paris, France.

Publication

Sensors (Basel). 2024 Apr 30;24(9):2889. doi: 10.3390/s24092889.

Abstract

In the realm of computer vision, the integration of advanced techniques into the pre-processing of RGB-D camera inputs poses a significant challenge, given the inherent complexities arising from diverse environmental conditions and varying object appearances. Therefore, this paper introduces FusionVision, an exhaustive pipeline adapted for the robust 3D segmentation of objects in RGB-D imagery. Traditional computer vision systems face limitations in simultaneously capturing precise object boundaries and achieving high-precision object detection on depth maps, as they are mainly proposed for RGB cameras. To address this challenge, FusionVision adopts an integrated approach by merging state-of-the-art object detection techniques with advanced instance segmentation methods. The integration of these components enables a holistic interpretation of RGB-D data, that is, a unified analysis of information obtained from both color and depth channels. This facilitates the extraction of comprehensive and accurate object information in order to improve post-processes such as object 6D pose estimation, Simultaneous Localization and Mapping (SLAM) operations, and accurate 3D dataset extraction. The proposed FusionVision pipeline employs YOLO for identifying objects within the RGB image domain. Subsequently, FastSAM, an innovative semantic segmentation model, is applied to delineate object boundaries, yielding refined segmentation masks. The synergy between these components and their integration into 3D scene understanding ensures a cohesive fusion of object detection and segmentation, enhancing overall precision in 3D object segmentation.
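
The abstract does not spell out how the 2D segmentation masks are carried into 3D. One common way to do this with an aligned depth map is pinhole back-projection, sketched below as an illustration only and not as the authors' implementation. The function name `backproject_mask`, the toy depth values, the hand-written mask (standing in for a segmentation model's output), and the intrinsics `fx`, `fy`, `cx`, `cy` are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def backproject_mask(
    depth: Sequence[Sequence[float]],
    mask: Sequence[Sequence[bool]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3D]:
    """Lift masked depth pixels to 3D camera coordinates (pinhole model)."""
    points: List[Point3D] = []
    for v, (depth_row, mask_row) in enumerate(zip(depth, mask)):
        for u, (z, keep) in enumerate(zip(depth_row, mask_row)):
            # Skip pixels outside the mask and pixels with no valid depth.
            if not keep or z <= 0:
                continue
            # Pinhole back-projection: pixel (u, v) at depth z -> (x, y, z).
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Toy 2x3 depth map in metres; 0.0 marks a missing depth reading.
depth = [[1.0, 2.0, 0.0],
         [1.0, 2.0, 4.0]]
# Hypothetical binary mask, standing in for a segmentation model's output.
mask = [[False, True, True],
        [False, True, True]]

# Hypothetical intrinsics chosen so the arithmetic is easy to follow.
cloud = backproject_mask(depth, mask, fx=2.0, fy=2.0, cx=1.0, cy=0.0)
print(cloud)
```

Only pixels that are both inside the mask and have a valid depth contribute points, so the resulting cloud contains the segmented object without the background. With a real RGB-D camera, the intrinsics would come from the device calibration, and the depth map would need to be aligned to the RGB frame first.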

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2026/11086350/00cb3b006083/sensors-24-02889-g001.jpg
