Alignment-Enhanced Interactive Fusion Model for Complete and Incomplete Multimodal Hand Gesture Recognition.

Hand gesture recognition (HGR) based on surface electromyogram (sEMG) and accelerometer (ACC) signals is attracting growing interest, and the fusion strategy is crucial to its performance yet remains challenging. Neural network-based fusion methods have achieved superior performance. However, these methods typically fuse sEMG and ACC at either an early or a late stage and overlook the cross-modal hierarchical information available within each hidden layer, which makes inter-modal fusion inefficient. To this end, we propose a novel Alignment-Enhanced Interactive Fusion (AiFusion) model, which achieves effective fusion through a progressive hierarchical fusion strategy. Notably, AiFusion can flexibly perform both complete and incomplete multimodal HGR. Specifically, AiFusion contains two unimodal branches and a cascaded transformer-based multimodal fusion branch. The fusion branch is first designed to characterize modality-interactive knowledge by adaptively capturing inter-modal similarity and fusing hierarchical features from all branches layer by layer. The modality-interactive knowledge is then aligned with the unimodal knowledge using cross-modal supervised contrastive learning in the embedding space and online distillation in the probability space. These alignments further improve fusion quality and refine modality-specific representations. Finally, the recognition outcome is determined by whichever modalities are available, which helps handle the incomplete multimodal HGR problem frequently encountered in real-world scenarios. Experimental results on five public datasets demonstrate that AiFusion outperforms most state-of-the-art benchmarks in complete multimodal HGR. It also surpasses the unimodal baselines in the challenging incomplete multimodal HGR setting. AiFusion thus provides a promising solution for effective and robust multimodal HGR-based interfaces.
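Two of the ideas in the abstract, probability-space alignment by distillation and prediction from whichever modalities are available, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`distillation_loss`, `predict`), the temperature value, the use of KL divergence, and the rule "use the fusion branch when both modalities are present, otherwise the surviving unimodal branch" are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(fusion_logits, unimodal_logits, temperature=2.0):
    """Assumed form of probability-space alignment: the softened fusion
    output acts as the target that a unimodal branch is pulled toward."""
    target = softmax(fusion_logits, temperature)
    student = softmax(unimodal_logits, temperature)
    return kl_divergence(target, student)


def predict(semg_logits=None, acc_logits=None, fusion_logits=None):
    """Assumed selection rule: pass None for a missing modality.
    Both present -> fusion branch; one present -> that unimodal branch."""
    if semg_logits is not None and acc_logits is not None:
        if fusion_logits is None:
            raise ValueError("fusion logits are required when both modalities are present")
        chosen = fusion_logits
    elif semg_logits is not None:
        chosen = semg_logits
    elif acc_logits is not None:
        chosen = acc_logits
    else:
        raise ValueError("at least one modality is required")
    # Index of the largest logit is the predicted gesture class.
    return max(range(len(chosen)), key=chosen.__getitem__)
```

Under these assumptions, `predict(semg_logits=[0.1, 2.0, 0.3])` falls back to the sEMG branch when ACC is missing, while supplying all three arguments returns the fusion branch's decision. The distillation term is zero when the two branches already agree and grows as their softened distributions diverge.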


