IEEE Trans Pattern Anal Mach Intell. 2023 Jun;45(6):6605-6617. doi: 10.1109/TPAMI.2020.3015894. Epub 2023 May 5.
In this paper, we propose to tackle egocentric action recognition by suppressing background distractors and enhancing action-relevant interactions. Existing approaches usually use two independent branches to recognize egocentric actions, i.e., a verb branch and a noun branch. However, a mechanism to suppress distracting objects and exploit local human-object correlations is missing. To this end, we introduce two extra sources of information, i.e., the spatial locations of candidate objects and their discriminative features, to enable concentration on the occurring interactions. We design a Symbiotic Attention with Object-centric feature Alignment framework (SAOA) to provide meticulous reasoning between the actor and the environment. First, we introduce an object-centric feature alignment method to inject the local object features into the verb branch and the noun branch. Second, we propose a symbiotic attention mechanism to encourage mutual interaction between the two branches and select the most action-relevant candidates for classification. The framework benefits from the communication among the verb branch, the noun branch, and the local object information. Experiments based on different backbones and modalities demonstrate the effectiveness of our method. Notably, our framework achieves state-of-the-art performance on the largest egocentric video dataset.
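The two steps described above (aligning local object features with each branch, then letting each branch attend over the other branch's candidates) can be illustrated with a toy sketch. This is not the SAOA implementation: the abstract does not give the formulas, so the additive alignment, the scaled dot-product scoring, and all names (`align`, `attend`, `symbiotic_attention`, the example vectors) are assumptions chosen only to make the data flow concrete.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def align(global_feat, object_feats):
    """ASSUMED alignment: add the branch's global feature to each local
    object feature, giving one candidate vector per detected object."""
    return [[g + o for g, o in zip(global_feat, obj)] for obj in object_feats]


def attend(query, candidates):
    """Scaled dot-product attention of one query over the candidates.
    Returns the weighted sum of candidates and the attention weights."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, c) / scale for c in candidates])
    fused = [
        sum(w * c[i] for w, c in zip(weights, candidates))
        for i in range(len(query))
    ]
    return fused, weights


def symbiotic_attention(verb_feat, noun_feat, object_feats):
    """ASSUMED cross-branch interaction: the noun feature selects among the
    verb candidates, and the verb feature selects among the noun candidates."""
    verb_cands = align(verb_feat, object_feats)
    noun_cands = align(noun_feat, object_feats)
    verb_out, verb_w = attend(noun_feat, verb_cands)
    noun_out, noun_w = attend(verb_feat, noun_cands)
    return verb_out, noun_out, verb_w, noun_w


# Toy inputs (hypothetical): 4-dimensional features, 3 candidate objects.
verb_feat = [1.0, 0.0, 0.0, 0.0]
noun_feat = [0.0, 1.0, 0.0, 0.0]
object_feats = [
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.0],
    [0.0, 0.0, 0.0, 2.0],
]
verb_out, noun_out, verb_w, noun_w = symbiotic_attention(
    verb_feat, noun_feat, object_feats
)
```

In this toy setup the first object is the one most similar to the noun feature, so it receives the largest weight when the noun branch attends over the verb candidates. In a real model the features would come from video backbones and an object detector, and the fused vectors would feed the verb and noun classifiers.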