

Multimodal Material Classification Using Visual Attention

Authors

Maleki Mohadeseh, Rouhafzay Ghazal, Cretu Ana-Maria

Affiliations

Department of Computer Science and Engineering, Université du Québec en Outaouais, Gatineau, QC J8X 3X7, Canada.

Department of Computer Science, Université du Moncton, Moncton, NB E1A 3E9, Canada.

Publication

Sensors (Basel). 2024 Nov 29;24(23):7664. doi: 10.3390/s24237664.

Abstract

The material of an object is an inherent property that can be perceived through various sensory modalities, yet the integration of multisensory information substantially improves the accuracy of these perceptions. For example, differentiating between a ceramic and a plastic cup with similar visual properties may be difficult when relying solely on visual cues. However, the integration of touch and audio feedback when interacting with these objects can significantly clarify these distinctions. Similarly, combining audio and touch exploration with visual guidance can optimize the sensory examination process. In this study, we introduce a multisensory approach for categorizing object materials by integrating visual, audio, and touch perceptions. The main contribution of this paper is the exploration of a computational model of visual attention that directs the sampling of touch and audio data. We conducted experiments using a subset of 63 household objects from a publicly available dataset, the ObjectFolder dataset. Our findings indicate that incorporating a visual attention model enhances the ability to generalize material classifications to new objects and achieves superior performance compared to a baseline approach, where data are gathered through random interactions with an object's surface.
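The core idea described above, using a visual attention (saliency) map to select where touch and audio samples are collected, then fusing the per-modality features for classification, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function names (`top_salient_points`, `fuse_features`), the use of simple feature concatenation for fusion, and the toy random features are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def top_salient_points(saliency, k=3):
    """Return indices of the k most salient surface points, most salient first.

    In the paper's setting, a computational visual attention model would
    produce these scores; here they are just a toy array.
    """
    return np.argsort(saliency)[::-1][:k]

def fuse_features(visual, touch, audio):
    """Late fusion by concatenating per-modality feature vectors.

    The fused vector would then be fed to a classifier (e.g., an SVM or
    a small neural network) to predict the material class.
    """
    return np.concatenate([visual, touch, audio])

# Toy example: 10 candidate contact points on an object's surface,
# each with a scalar saliency score from the attention model.
saliency = rng.random(10)
picks = top_salient_points(saliency, k=3)

# Hypothetical per-modality features: touch and audio are sampled only at
# the attention-selected points (instead of random surface interactions),
# then pooled; the visual feature describes the whole object.
visual = rng.random(4)
touch = rng.random((10, 4))[picks].mean(axis=0)
audio = rng.random((10, 4))[picks].mean(axis=0)

fused = fuse_features(visual, touch, audio)
print(fused.shape)  # (12,)
```

The baseline the paper compares against would correspond to replacing `top_salient_points` with a uniform random choice of contact points.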


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e42/11644879/00fccc56e1fe/sensors-24-07664-g001.jpg
