


Multimodal scene recognition using semantic segmentation and deep learning integration.

Authors

Naseer Aysha, Alnusayri Mohammed, Alhasson Haifa F, Alatiyyah Mohammed, AlHammadi Dina Abdulaziz, Jalal Ahmad, Park Jeongmin

Affiliations

Department of Computer Science, Air University, Islamabad, Pakistan.

Department of Computer Science, Jouf University, Sakaka, Saudi Arabia.

Publication

PeerJ Comput Sci. 2025 May 14;11:e2858. doi: 10.7717/peerj-cs.2858. eCollection 2025.

DOI: 10.7717/peerj-cs.2858
PMID: 40567764
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12192964/
Abstract

Semantic modeling and recognition of indoor scenes is a significant challenge because generic scenes have a complex composition, containing a variety of features that range from overall themes to individual objects. The gap between high-level scene interpretation and low-level visual features further increases the complexity of scene recognition. To overcome these obstacles, this study presents a novel multimodal deep learning technique that enhances scene recognition accuracy and robustness by combining depth information with conventional red-green-blue (RGB) image data. A depth-aware segmentation methodology first identifies the objects in an image, which are then analyzed with convolutional neural networks (CNNs) and spatial pyramid pooling (SPP), allowing for more precise image classification. Experimental findings demonstrate the effectiveness of this method, showing 91.73% accuracy on the RGB-D scene dataset and 90.53% accuracy on the NYU Depth v2 dataset. These results demonstrate how the multimodal approach can improve scene detection and classification, with potential uses in fields including robotics, sports analysis, and security systems.
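The abstract's use of spatial pyramid pooling is what lets CNN features from segmented regions of arbitrary size feed a fixed-size classifier: the feature map is pooled over a pyramid of grids (commonly 1×1, 2×2, 4×4), and each cell contributes one value per channel. The following NumPy sketch illustrates that idea only; it is not the authors' implementation, and the pyramid levels and max pooling are assumed defaults.

```python
import numpy as np

def spatial_pyramid_pool(fmap, levels=(1, 2, 4)):
    """Pool a (C, H, W) feature map over a pyramid of grids.

    For each level n, the map is divided into an n x n grid and each
    cell is max-pooled per channel, yielding a descriptor of length
    C * sum(n*n for n in levels) regardless of H and W.
    """
    out = []
    for n in levels:
        # array_split covers the whole map even when H or W is not
        # divisible by n (cells then differ in size by one pixel).
        for band in np.array_split(fmap, n, axis=1):      # split height
            for cell in np.array_split(band, n, axis=2):  # split width
                out.append(cell.max(axis=(1, 2)))         # (C,) per cell
    return np.concatenate(out)

# Two regions of different sizes map to descriptors of identical length:
a = spatial_pyramid_pool(np.random.rand(64, 30, 45))  # shape (64 * 21,)
b = spatial_pyramid_pool(np.random.rand(64, 12, 80))  # shape (64 * 21,)
```

With levels (1, 2, 4) the descriptor has 1 + 4 + 16 = 21 cells per channel, so variably sized object regions from the segmentation stage can all be classified by the same fully connected head.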


Figures 1–7 (full-size images on PMC):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4627/12192964/12ee4ccb6766/peerj-cs-11-2858-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4627/12192964/0489457614d0/peerj-cs-11-2858-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4627/12192964/d28f3b7be4b7/peerj-cs-11-2858-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4627/12192964/cc2874bd1303/peerj-cs-11-2858-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4627/12192964/ce1dfbad06b1/peerj-cs-11-2858-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4627/12192964/1612ae098338/peerj-cs-11-2858-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4627/12192964/7775188d992b/peerj-cs-11-2858-g007.jpg

Similar articles

1. Multimodal scene recognition using semantic segmentation and deep learning integration.
PeerJ Comput Sci. 2025 May 14;11:e2858. doi: 10.7717/peerj-cs.2858. eCollection 2025.
2. A deep learning approach to direct immunofluorescence pattern recognition in autoimmune bullous diseases.
Br J Dermatol. 2024 Jul 16;191(2):261-266. doi: 10.1093/bjd/ljae142.
3. Skin-CAD: Explainable deep learning classification of skin cancer from dermoscopic images by feature selection of dual high-level CNNs features and transfer learning.
Comput Biol Med. 2024 Aug;178:108798. doi: 10.1016/j.compbiomed.2024.108798. Epub 2024 Jun 25.
4. Exploring the Potential of Electroencephalography Signal-Based Image Generation Using Diffusion Models: Integrative Framework Combining Mixed Methods and Multimodal Analysis.
JMIR Med Inform. 2025 Jun 25;13:e72027. doi: 10.2196/72027.
5. Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19.
Cochrane Database Syst Rev. 2022 May 20;5(5):CD013665. doi: 10.1002/14651858.CD013665.pub3.
6. Stigma Management Strategies of Autistic Social Media Users.
Autism Adulthood. 2025 May 28;7(3):273-282. doi: 10.1089/aut.2023.0095. eCollection 2025 Jun.
7. SODU2-NET: a novel deep learning-based approach for salient object detection utilizing U-NET.
PeerJ Comput Sci. 2025 May 19;11:e2623. doi: 10.7717/peerj-cs.2623. eCollection 2025.
8. Scene complexity and the detail trace of human long-term visual memory.
Vision Res. 2025 Feb;227:108525. doi: 10.1016/j.visres.2024.108525. Epub 2024 Dec 6.
9. Assessing the comparative effects of interventions in COPD: a tutorial on network meta-analysis for clinicians.
Respir Res. 2024 Dec 21;25(1):438. doi: 10.1186/s12931-024-03056-x.
10. Molecular feature-based classification of retroperitoneal liposarcoma: a prospective cohort study.
Elife. 2025 May 23;14:RP100887. doi: 10.7554/eLife.100887.
