Department of Computer Science and Engineering, Incheon National University (INU), Incheon 22012, Republic of Korea.
Sensors (Basel). 2023 Mar 17;23(6):3231. doi: 10.3390/s23063231.
While machine translation for spoken language has advanced significantly, research on sign language translation (SLT) for deaf individuals remains limited. Obtaining annotations, such as gloss, can be expensive and time-consuming. To address these challenges, we propose a new sign language video-processing method for SLT without gloss annotations. Our approach leverages the signer's skeleton points to identify their movements and helps build a robust model resilient to background noise. We also introduce a keypoint normalization process that preserves the signer's movements while accounting for variations in body length. Furthermore, we propose a stochastic frame selection technique that prioritizes frames to minimize video information loss. Built on an attention-based model, our approach demonstrates its effectiveness through quantitative experiments on multiple metrics using German and Korean sign language datasets without gloss annotations.
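The keypoint normalization step described above can be illustrated with a minimal sketch. The abstract does not specify the exact reference joints, so this example assumes a simple scheme: center each frame's keypoints on a reference joint (e.g., the neck) and rescale by the distance to a second reference joint (e.g., a shoulder), so that signers with different body lengths map to a comparable coordinate range while their relative movements are preserved.

```python
import numpy as np

def normalize_keypoints(keypoints, center_idx=0, scale_idx=1):
    """Center keypoints on a reference joint and rescale by a
    reference bone length, making poses comparable across signers
    of different body sizes.

    keypoints: (T, J, 2) array of J 2-D joints over T frames.
    center_idx: joint used as the origin (hypothetical choice: neck).
    scale_idx: joint whose distance from the center defines the
               reference length (hypothetical choice: shoulder).
    """
    kp = np.asarray(keypoints, dtype=np.float64)
    center = kp[:, center_idx:center_idx + 1, :]   # (T, 1, 2) origin per frame
    centered = kp - center                          # translate to origin
    ref_len = np.linalg.norm(centered[:, scale_idx, :], axis=-1)  # (T,)
    ref_len = np.maximum(ref_len, 1e-6)             # guard against division by zero
    return centered / ref_len[:, None, None]        # rescale to unit bone length

# Two "signers" with the same pose at different scales normalize identically:
pose_a = np.array([[[0.0, 0.0], [1.0, 0.0], [2.0, 3.0]]])
pose_b = pose_a * 2.5
```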
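The stochastic frame selection technique can likewise be sketched. The abstract does not give the sampling scheme, so this example assumes a common segment-based approach: divide the video into equal segments and draw one frame at random from each, which covers the whole video (minimizing information loss) while keeping the sample stochastic across training epochs.

```python
import numpy as np

def stochastic_frame_selection(num_frames, num_samples, rng=None):
    """Split a video of num_frames frames into num_samples equal
    segments and draw one frame index uniformly from each segment.

    Returns a non-decreasing array of num_samples frame indices that
    spans the full video.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Segment boundaries over [0, num_frames)
    edges = np.linspace(0, num_frames, num_samples + 1)
    # One continuous position per segment, floored to a frame index
    pos = rng.uniform(edges[:-1], edges[1:])
    idx = np.minimum(pos.astype(int), num_frames - 1)
    return idx
```

Because each sampled position lies inside its own segment, the returned indices are non-decreasing and every part of the video contributes one frame per draw.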