A Comprehensive Hybrid Approach for Indoor Scene Recognition Combining CNNs and Text-Based Features.

Authors

Uckan Taner, Aslan Cengiz, Hark Cengiz

Affiliations

Department of Computer Engineering, Faculty of Engineering, Van Yuzuncu Yıl University, Van 65080, Turkey.

Department of Artificial Intelligence and Robotics, Van Yuzuncu Yıl University, Van 65080, Turkey.

Publication

Sensors (Basel). 2025 Aug 29;25(17):5350. doi: 10.3390/s25175350.

DOI: 10.3390/s25175350

PMID: 40942779

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12430972/

Abstract

Indoor scene recognition is a computer vision task that identifies indoor environments such as offices, libraries, kitchens, and restaurants. The area is particularly important for applications in robotics, security, and assistive technology for people with disabilities, because it allows spaces to be categorized and contextual information to be provided. Convolutional Neural Networks (CNNs) are commonly used for this task. While CNNs perform well in outdoor scene recognition, where global features such as mountains and skies dominate, they often struggle with indoor scenes, where local features such as furniture and objects are more informative. In this study, the "MIT 67 Indoor Scene" dataset is used to build a two-channel hybrid model: one channel extracts features with a CNN, the other with a text-based model that operates on object recognition outputs, and the two feature sets are combined. Experimental results show that this hybrid approach, which integrates natural language processing and image processing techniques, improves test accuracy by 8.3% over the image processing model alone. Furthermore, this study contributes to new application areas in remote sensing, particularly indoor scene understanding and indoor mapping.
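The abstract describes the two-channel model only at a high level. The following minimal Python sketch shows one common way such a fused input could be built, under stated assumptions: the object detector returns a list of class labels, the text channel is a simple bag-of-words vector (a stand-in; the authors' text model may differ), and fusion is concatenation of L2-normalized channel vectors. All function names, the vocabulary, and the CNN vector are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def l2_normalize(vec):
    """Scale a vector to unit length so neither channel dominates by magnitude."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def labels_to_text(labels):
    """Turn hypothetical object-detector labels into a short scene 'description'."""
    # Multi-word labels such as "dining table" become single tokens.
    return " ".join(label.lower().replace(" ", "_") for label in labels)


def bag_of_words(text, vocabulary):
    """Count vocabulary terms in the text (a simple stand-in for a text model)."""
    counts = Counter(text.split())
    return [float(counts[term]) for term in vocabulary]


def fuse_features(cnn_features, text_features):
    """Assumed two-channel fusion: normalize each channel, then concatenate."""
    return l2_normalize(cnn_features) + l2_normalize(text_features)


if __name__ == "__main__":
    vocabulary = ["bed", "book", "oven", "shelf", "table"]  # hypothetical vocabulary
    detected = ["Book", "Shelf", "Book", "Table"]  # hypothetical detector output
    text_vec = bag_of_words(labels_to_text(detected), vocabulary)
    cnn_vec = [0.2, 0.0, 0.9]  # placeholder for a real CNN embedding
    fused = fuse_features(cnn_vec, text_vec)
    print(text_vec)     # [0.0, 2.0, 0.0, 1.0, 1.0]
    print(len(fused))   # 8
```

Normalizing each channel before concatenation is a design choice that keeps a high-dimensional CNN embedding from swamping a short object-count vector; in practice the fused vector would be passed to a trained classifier over the 67 scene categories.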
