Gong Biao, Yan Chenggang, Bai Junjie, Zou Changqing, Gao Yue
IEEE Trans Image Process. 2020 Aug 5;PP. doi: 10.1109/TIP.2020.3013138.
Three-dimensional multi-modal data represent real-world 3D objects in different ways. Features extracted separately from each modality are often poorly correlated. Recent solutions that leverage the attention mechanism to learn a joint network for fusing multi-modality features generalize poorly. In this paper, we propose a Hamming embedding sensitivity network to address the problem of effectively fusing multi-modality features. The proposed network, called HamNet, is the first end-to-end framework that can, in principle, integrate data from all modalities within a unified architecture for 3D shape representation, and it can be used for 3D shape retrieval and recognition. HamNet uses a feature concealment module to achieve effective deep feature fusion. The basic idea of the concealment module is to re-weight the features from each modality at an early stage using the Hamming embeddings of these modalities. The Hamming embedding also provides an effective solution for fast retrieval on large-scale datasets. We evaluated the proposed method on the large-scale ModelNet40 dataset for 3D shape classification, single-modality retrieval, and cross-modality retrieval. Comprehensive experiments and comparisons with state-of-the-art methods demonstrate that the proposed approach achieves superior performance.
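To make the re-weighting idea concrete, the following is a minimal sketch, not the paper's actual architecture: it assumes each modality's feature is binarized into a Hamming code via a shared random projection, and that the Hamming agreement between modality codes drives a softmax weight used to fuse the features. All names (`hamming_embed`, `feat_view`, `feat_point`) and the specific weighting rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def hamming_embed(feat, proj):
    """Binarize a feature via a linear projection: sign -> {0, 1} code (assumed scheme)."""
    return (feat @ proj > 0).astype(np.uint8)

def hamming_similarity(a, b):
    """Fraction of matching bits between two binary codes."""
    return 1.0 - np.mean(a != b)

# Two modality features for one 3D shape (e.g. view-based and point-based).
d, bits = 128, 64
feat_view = rng.normal(size=d)
feat_point = rng.normal(size=d)

proj = rng.normal(size=(d, bits))   # shared projection so codes are comparable
code_view = hamming_embed(feat_view, proj)
code_point = hamming_embed(feat_point, proj)

# Re-weight each modality by its Hamming agreement with the other modality,
# so poorly correlated features contribute less to the fused representation.
sim = hamming_similarity(code_view, code_point)
w = np.exp([sim, 1.0 - sim])
w = w / w.sum()                     # softmax over the two modality weights

fused = w[0] * feat_view + w[1] * feat_point
print(fused.shape)
```

The compact binary codes also hint at why Hamming embeddings help large-scale retrieval: comparing 64-bit codes is far cheaper than comparing dense float features.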