

Multi-Modality Adaptive Feature Fusion Graph Convolutional Network for Skeleton-Based Action Recognition.

Affiliations

School of Computer Science, Hangzhou Dianzi University, Hangzhou 310005, China.

School of Information Engineering, Hangzhou Dianzi University, Hangzhou 310005, China.

Publication

Sensors (Basel). 2023 Jun 7;23(12):5414. doi: 10.3390/s23125414.

Abstract

Graph convolutional networks are widely used in skeleton-based action recognition because of their strong ability to model non-Euclidean data. Conventional multi-scale temporal convolution uses several fixed-size convolution kernels or dilation rates at each layer of the network; we argue that different layers and datasets require different receptive fields. We optimize traditional multi-scale temporal convolution with multi-scale adaptive convolution kernels and dilation rates driven by a simple and effective self-attention mechanism, allowing each network layer to adaptively select convolution kernels of different sizes and dilation rates rather than keeping them fixed. Moreover, the effective receptive field of a simple residual connection is small, and deep residual networks contain considerable redundancy, which leads to a loss of context when aggregating spatio-temporal information. This article introduces a feature fusion mechanism that replaces the residual connection between initial features and temporal module outputs, effectively addressing context aggregation and initial feature fusion. We propose a multi-modality adaptive feature fusion framework (MMAFF) that simultaneously enlarges the receptive field in both the spatial and temporal dimensions. Concretely, features extracted by the spatial module are fed into the adaptive temporal fusion module to extract multi-scale skeleton features in both the spatial and temporal parts. In addition, building on the current multi-stream approach, we use a limb stream to uniformly process correlated data from multiple modalities. Extensive experiments show that our model achieves results competitive with state-of-the-art methods on the NTU-RGB+D 60 and NTU-RGB+D 120 datasets.
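The core idea described above — several temporal-convolution branches with different dilation rates, mixed by attention weights derived from globally pooled features so each layer can pick its own receptive field — can be sketched as follows. This is a minimal numpy illustration of the branch-weighting principle only, not the paper's implementation; the kernel values, the single-matrix attention parameterization, and all shapes are illustrative assumptions.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """Depthwise 1D temporal convolution with 'same' zero padding.

    x: (C, T) feature map (channels x frames); kernel: (K,) taps shared
    across channels. Dilation spaces the taps to enlarge the receptive field.
    """
    C, T = x.shape
    K = len(kernel)
    pad = dilation * (K - 1) // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    out = np.zeros((C, T))
    for k in range(K):
        out += kernel[k] * xp[:, k * dilation : k * dilation + T]
    return out

def adaptive_multiscale_temporal(x, kernels, dilations, w_attn):
    """Mix multi-scale temporal branches with input-dependent weights.

    Each branch sees a different temporal receptive field (via dilation);
    a softmax over scores computed from the globally pooled input decides
    how strongly each scale contributes, instead of a fixed combination.
    w_attn: (num_branches, C) is an assumed learned projection.
    """
    branches = [dilated_conv1d(x, k, d) for k, d in zip(kernels, dilations)]
    pooled = x.mean(axis=1)                 # global temporal pooling -> (C,)
    scores = w_attn @ pooled                # one score per branch
    weights = np.exp(scores - scores.max())  # stable softmax
    weights /= weights.sum()
    fused = sum(w * b for w, b in zip(weights, branches))
    return fused, weights
```

Because the weights come from the input features, different layers (with their own `w_attn`) and different datasets end up emphasizing different dilation rates, which is the adaptive-receptive-field behavior the abstract argues for.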


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/16be/10303820/6fc63e4ef769/sensors-23-05414-g001.jpg
