Liu Wenxuan, Zhong Xian, Zhou Zhuo, Jiang Kui, Wang Zheng, Lin Chia-Wen
IEEE Trans Image Process. 2023;32:2719-2733. doi: 10.1109/TIP.2023.3273459. Epub 2023 May 16.
Multi-view action recognition aims to identify action categories from given clues. Existing studies overlook the negative influence of fuzzy views when disentangling view and action, which commonly leads to incorrect recognition results. To address this, we regard the observed image as a composition of view and action components and make full use of multiple views through an adaptive cooperative representation between these two components, forming a Dual-Recommendation Disentanglement Network (DRDN) for multi-view action recognition. Specifically, (1) for the action, we leverage a multi-level Specific Information Recommendation (SIR) module to enhance the interaction between intricate activities and views. SIR offers a more comprehensive representation of activities by weighing the trade-off between global and local information. (2) For the view, we use a Pyramid Dynamic Recommendation (PDR) module to learn a complete and detailed global representation by transferring features across views. PDR is explicitly constrained to resist the influence of fuzzy noise, focusing on positive knowledge from other views. DRDN aims for complete action and view representations: PDR directly guides the disentanglement of action from view features, and SIR accounts for the mutual exclusivity of view and action clues. Extensive experiments show that DRDN achieves state-of-the-art performance against strong competitors on several standard benchmarks. The code will be available at https://github.com/51cloud/DRDN.
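The abstract's starting assumption is that an observed feature is a composition of a view component and an action component whose clues are mutually exclusive. That idea can be illustrated with a toy orthogonal decomposition. This is a minimal sketch under stated assumptions and is not DRDN's SIR or PDR module. The view direction `view_dir` is a hypothetical input (imagine it produced by some view branch), and exact orthogonal projection is swapped in as a simple stand-in for the mutual-exclusivity constraint.

```python
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def disentangle(feature: Vector, view_dir: Vector) -> Tuple[Vector, Vector]:
    """Toy split of `feature` into a view part and an action part.

    Assumption (not from the paper): `view_dir` is a hypothetical view
    representation; the view part is the projection of `feature` onto it,
    and the action part is the orthogonal residual, so the two parts sum
    to the input and share no direction (a stand-in for mutual exclusivity).
    """
    norm_sq = dot(view_dir, view_dir)
    if norm_sq == 0.0:
        # No view information: everything is attributed to the action.
        return [0.0] * len(feature), list(feature)
    scale = dot(feature, view_dir) / norm_sq
    view_part = [scale * v for v in view_dir]
    action_part = [f - v for f, v in zip(feature, view_part)]
    return view_part, action_part


if __name__ == "__main__":
    view_part, action_part = disentangle([2.0, 3.0, 0.5], [1.0, 0.0, 0.0])
    print(view_part, action_part)
```

Because the action part is the residual after removing the projection, the two parts reconstruct the input exactly and their dot product is zero. A learned network would more plausibly encourage such separation softly through a training objective rather than by exact projection.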