Cheng Shuli, Wang Liejun, Du Anyu
College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China.
Key Laboratory of Signal Detection and Processing, Xinjiang Uygur Autonomous Region, Xinjiang University, Urumqi 830046, China.
Entropy (Basel). 2020 Nov 7;22(11):1266. doi: 10.3390/e22111266.
Deep hashing is the mainstream approach to large-scale cross-modal retrieval because of its high retrieval speed and low storage cost, but reconstructing modal semantic information remains very challenging. To address semantic reconstruction in unsupervised cross-modal retrieval, we propose a novel deep semantic-preserving reconstruction hashing (DSPRH) algorithm. The algorithm combines spatial and channel semantic information, and mines modal semantic information through adaptive auto-encoding and a joint semantic reconstruction loss. The main contributions are as follows: (1) We introduce a new spatial pooling network module, based on tensor canonical polyadic decomposition theory, that generates rank-1 tensors to capture high-order context semantics, which helps the backbone network capture important contextual modal semantic information. (2) From an optimization perspective, we use global covariance pooling to capture channel semantic information and accelerate network convergence. In the feature reconstruction layer, we use two bottleneck auto-encoders to achieve visual-text modal interaction. (3) For metric learning, we design a new loss function to optimize the model parameters, which preserves the correlation between the image and text modalities. DSPRH is evaluated on MIRFlickr-25K and NUS-WIDE. The experimental results show that DSPRH achieves better performance on retrieval tasks.
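The speed and storage argument for hashing-based retrieval can be illustrated with a minimal sketch. It is not the DSPRH network: it assumes that some model has already produced real-valued outputs for a text query and for a few images, and the feature values, the 4-bit code length, and the helper names `binarize`, `hamming`, and `rank_by_hamming` are all hypothetical. Each output is binarized by sign and packed into an integer, so one item costs only as many bits as the code length and a comparison reduces to an XOR plus a bit count.

```python
from typing import List, Sequence


def binarize(features: Sequence[float]) -> int:
    """Pack sign(features) into an integer: bit i is 1 when features[i] > 0."""
    code = 0
    for i, value in enumerate(features):
        if value > 0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two packed codes (XOR, then count ones)."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, database: Sequence[int]) -> List[int]:
    """Database indices sorted by Hamming distance to the query.

    sorted() is stable, so ties keep their original index order.
    """
    return sorted(range(len(database)), key=lambda i: hamming(query, database[i]))


# Toy, made-up outputs of a hypothetical text branch and image branch.
text_query = binarize([0.9, -0.2, 0.4, -0.7])
image_codes = [
    binarize([0.8, -0.1, 0.3, -0.5]),   # same sign pattern as the query
    binarize([-0.6, 0.7, -0.2, 0.9]),   # opposite sign in every position
    binarize([0.5, 0.3, 0.2, -0.4]),    # differs from the query in one bit
]

# Text-to-image retrieval: rank the images by distance to the text code.
ranking = rank_by_hamming(text_query, image_codes)
print(ranking)
```

In this toy setting the image whose sign pattern matches the query ranks first and the one with the opposite pattern ranks last. A learned method would differ in how the codes are produced, not in how they are compared.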