Spatial Attention-Based 3D Graph Convolutional Neural Network for Sign Language Recognition

Affiliations

Centre of Smart Robotics Research (CS2R), King Saud University, Riyadh 11543, Saudi Arabia.

Department of Civil and Environmental Engineering, Faculty of Engineering, Norwegian University of Science and Technology, Høgskoleringen 1, 7034 Trondheim, Norway.

Publication details

Sensors (Basel). 2022 Jun 16;22(12):4558. doi: 10.3390/s22124558.

DOI: 10.3390/s22124558

PMID: 35746341

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9227856/

Abstract

Sign language is the main channel through which hearing-impaired people communicate with others. It is a visual language that conveys highly structured components of manual and non-manual parameters, so hearing people need considerable effort to master it. Sign language recognition aims to ease this difficulty and bridge the communication gap between hearing-impaired people and others. This study presents an efficient architecture for sign language recognition based on a graph convolutional network (GCN). The architecture consists of a few separable 3DGCN layers enhanced by a spatial attention mechanism. Keeping the number of layers small lets the architecture avoid the over-smoothing problem common in deep graph neural networks, and the attention mechanism enhances the spatial context representation of the gestures. The proposed architecture is evaluated on different datasets and shows outstanding results.
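The abstract combines three ingredients: graph convolution over a skeleton graph, a spatial attention mechanism, and a deliberately shallow stack that avoids over-smoothing. The NumPy sketch that follows illustrates these ideas generically. It is not the authors' layer. Several details are assumptions made for illustration: reading "separable" as a spatial step followed by a temporal step, the attention scoring vector `w_att`, the residual `1 + att` rescaling, and the 5-joint toy chain. The last two helpers, `propagate` and `mean_pairwise_cosine`, show over-smoothing. Repeated propagation with the normalized adjacency makes every joint's feature vector point in the same direction, so the joints become hard to tell apart.

```python
import numpy as np


def normalized_adjacency(edges, num_joints):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 of an undirected skeleton graph."""
    a = np.eye(num_joints)  # self-loops keep each joint's own features
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()


def spatial_attention(x, w_att):
    """Hypothetical joint-level attention.

    x: (T, V, C) features; w_att: (C,) scoring vector (assumed learned).
    Returns (V,) non-negative weights that sum to 1 over the joints.
    """
    pooled = x.mean(axis=0)  # (V, C): average each joint over all frames
    return softmax(pooled @ w_att)


def attention_gcn_layer(x, a_hat, w, w_att, temporal_kernel):
    """Hypothetical factorized spatio-temporal GCN layer with spatial attention.

    1. spatial step: aggregate neighbouring joints via a_hat, then project with w
    2. attention: rescale each joint by (1 + weight) so no joint is zeroed out
    3. temporal step: one odd-length kernel applied along the frame axis,
       shared by every joint and channel, with zero padding
    """
    t = x.shape[0]
    h = np.einsum("uv,tvc->tuc", a_hat, x) @ w  # (T, V, C_out)
    h = h * (1.0 + spatial_attention(x, w_att))[None, :, None]
    k = len(temporal_kernel)
    pad = k // 2
    hp = np.pad(h, ((pad, pad), (0, 0), (0, 0)))
    out = sum(temporal_kernel[i] * hp[i:i + t] for i in range(k))
    return np.maximum(out, 0.0)  # ReLU


def propagate(a_hat, h, steps):
    """Weight-free propagation: `steps` rounds of normalized neighbour averaging."""
    for _ in range(steps):
        h = a_hat @ h
    return h


def mean_pairwise_cosine(h):
    """Mean cosine similarity over all pairs of node feature rows in h (V, C)."""
    unit = h / np.linalg.norm(h, axis=1, keepdims=True)
    v = h.shape[0]
    return float(((unit @ unit.T).sum() - v) / (v * (v - 1)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    edges = [(0, 1), (1, 2), (2, 3), (3, 4)]  # toy 5-joint chain, not a real hand model
    a_hat = normalized_adjacency(edges, 5)
    x = rng.normal(size=(8, 5, 3))  # 8 frames, 5 joints, 3 coordinates
    y = attention_gcn_layer(x, a_hat, rng.normal(size=(3, 16)), rng.normal(size=3), [0.25, 0.5, 0.25])
    print(y.shape)  # (8, 5, 16)
    x0 = rng.normal(size=(5, 3))
    for steps in (0, 2, 8, 100):
        print(steps, round(mean_pairwise_cosine(propagate(a_hat, x0, steps)), 3))
```

In this sketch each layer performs one round of neighbour aggregation. A deep stack therefore behaves like `propagate` with many steps, and the mean cosine similarity between joints climbs toward 1. Keeping only a few layers, as the abstract proposes, limits that effect.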
