


Multimodal scene recognition using semantic segmentation and deep learning integration.

Authors

Naseer Aysha, Alnusayri Mohammed, Alhasson Haifa F, Alatiyyah Mohammed, AlHammadi Dina Abdulaziz, Jalal Ahmad, Park Jeongmin

Affiliations

Department of Computer Science, Air University, Islamabad, Pakistan.

Department of Computer Science, Jouf University, Sakaka, Saudi Arabia.

Publication

PeerJ Comput Sci. 2025 May 14;11:e2858. doi: 10.7717/peerj-cs.2858. eCollection 2025.

DOI: 10.7717/peerj-cs.2858
PMID: 40567764
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12192964/
Abstract

Semantic modeling and recognition of indoor scenes is a significant challenge because generic scenes have a complex composition, containing a variety of features that range from overall themes to individual objects. The gap between high-level scene interpretation and low-level visual features further increases the complexity of scene recognition. To overcome these obstacles, this study presents a novel multimodal deep learning technique that enhances scene recognition accuracy and robustness by combining depth information with conventional red-green-blue (RGB) image data. A depth-aware segmentation methodology first identifies the objects in an image, which are then analyzed with convolutional neural networks (CNNs) and spatial pyramid pooling (SPP), allowing for more precise image classification. Experimental findings demonstrate the effectiveness of this method, showing 91.73% accuracy on the RGB-D scene dataset and 90.53% accuracy on the NYU Depth v2 dataset. These results demonstrate how the multimodal approach can improve scene detection and classification, with potential uses in fields including robotics, sports analysis, and security systems.
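The abstract's use of spatial pyramid pooling is what lets CNN features from segmented regions of arbitrary size feed a fixed-size classifier: the feature map is pooled over a pyramid of grids (commonly 1×1, 2×2, 4×4), and each cell contributes one value per channel. The following NumPy sketch illustrates that idea only; it is not the authors' implementation, and the pyramid levels and max pooling are assumed defaults.

```python
import numpy as np

def spatial_pyramid_pool(fmap, levels=(1, 2, 4)):
    """Pool a (C, H, W) feature map over a pyramid of grids.

    For each level n, the map is divided into an n x n grid and each
    cell is max-pooled per channel, yielding a descriptor of length
    C * sum(n*n for n in levels) regardless of H and W.
    """
    out = []
    for n in levels:
        # array_split covers the whole map even when H or W is not
        # divisible by n (cells then differ in size by one pixel).
        for band in np.array_split(fmap, n, axis=1):      # split height
            for cell in np.array_split(band, n, axis=2):  # split width
                out.append(cell.max(axis=(1, 2)))         # (C,) per cell
    return np.concatenate(out)

# Two regions of different sizes map to descriptors of identical length:
a = spatial_pyramid_pool(np.random.rand(64, 30, 45))  # shape (64 * 21,)
b = spatial_pyramid_pool(np.random.rand(64, 12, 80))  # shape (64 * 21,)
```

With levels (1, 2, 4) the descriptor has 1 + 4 + 16 = 21 cells per channel, so variably sized object regions from the segmentation stage can all be classified by the same fully connected head.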


Figures 1–7 (full-size images on PMC):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4627/12192964/12ee4ccb6766/peerj-cs-11-2858-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4627/12192964/0489457614d0/peerj-cs-11-2858-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4627/12192964/d28f3b7be4b7/peerj-cs-11-2858-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4627/12192964/cc2874bd1303/peerj-cs-11-2858-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4627/12192964/ce1dfbad06b1/peerj-cs-11-2858-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4627/12192964/1612ae098338/peerj-cs-11-2858-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4627/12192964/7775188d992b/peerj-cs-11-2858-g007.jpg

Similar articles

1. Multimodal scene recognition using semantic segmentation and deep learning integration.
PeerJ Comput Sci. 2025 May 14;11:e2858. doi: 10.7717/peerj-cs.2858. eCollection 2025.
2. A deep learning approach to direct immunofluorescence pattern recognition in autoimmune bullous diseases.
Br J Dermatol. 2024 Jul 16;191(2):261-266. doi: 10.1093/bjd/ljae142.
3. Skin-CAD: Explainable deep learning classification of skin cancer from dermoscopic images by feature selection of dual high-level CNNs features and transfer learning.
Comput Biol Med. 2024 Aug;178:108798. doi: 10.1016/j.compbiomed.2024.108798. Epub 2024 Jun 25.
4. Exploring the Potential of Electroencephalography Signal-Based Image Generation Using Diffusion Models: Integrative Framework Combining Mixed Methods and Multimodal Analysis.
JMIR Med Inform. 2025 Jun 25;13:e72027. doi: 10.2196/72027.
5. Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19.
Cochrane Database Syst Rev. 2022 May 20;5(5):CD013665. doi: 10.1002/14651858.CD013665.pub3.
6. Stigma Management Strategies of Autistic Social Media Users.
Autism Adulthood. 2025 May 28;7(3):273-282. doi: 10.1089/aut.2023.0095. eCollection 2025 Jun.
7. SODU2-NET: a novel deep learning-based approach for salient object detection utilizing U-NET.
PeerJ Comput Sci. 2025 May 19;11:e2623. doi: 10.7717/peerj-cs.2623. eCollection 2025.
8. Scene complexity and the detail trace of human long-term visual memory.
Vision Res. 2025 Feb;227:108525. doi: 10.1016/j.visres.2024.108525. Epub 2024 Dec 6.
9. Assessing the comparative effects of interventions in COPD: a tutorial on network meta-analysis for clinicians.
Respir Res. 2024 Dec 21;25(1):438. doi: 10.1186/s12931-024-03056-x.
10. Molecular feature-based classification of retroperitoneal liposarcoma: a prospective cohort study.
Elife. 2025 May 23;14:RP100887. doi: 10.7554/eLife.100887.
