A Pyramid Semi-Autoregressive Transformer with Rich Semantics for Sign Language Production.

Affiliations

Hebei Machine Vision Engineering Research Center, School of Cyber Security and Computer, Hebei University, Baoding 071002, China.

Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China.

Publication information

Sensors (Basel). 2022 Dec 8;22(24):9606. doi: 10.3390/s22249606.

DOI:10.3390/s22249606
PMID:36559975
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9785616/
Abstract

As a typical sequence-to-sequence task, sign language production (SLP) aims to automatically translate spoken language sentences into the corresponding sign language sequences. Existing SLP methods fall into two categories: autoregressive and non-autoregressive. Autoregressive methods suffer from high latency and error accumulation caused by the long-term dependence of the current output on the previous poses, while non-autoregressive methods suffer from repetition and omission during parallel decoding. To remedy these issues, we propose a novel method named Pyramid Semi-Autoregressive Transformer with Rich Semantics (PSAT-RS). In PSAT-RS, we first introduce a pyramid semi-autoregressive mechanism that divides the target sequence into groups in a coarse-to-fine manner, keeping the autoregressive property globally while generating target frames locally within each group. Meanwhile, a relaxed masked attention mechanism lets the decoder attend not only to the pose sequences in the previous groups but also to the current group. Finally, considering the importance of spatial-temporal information, we design a Rich Semantics (RS) embedding module that encodes sequential information along both the time dimension and spatial displacement into the same high-dimensional space. This significantly improves the coordination of joint motion, making the generated sign language videos more natural. Experiments on the RWTH-PHOENIX-Weather-2014T and CSL datasets show that the proposed PSAT-RS is competitive with state-of-the-art autoregressive and non-autoregressive SLP models, achieving a better trade-off between speed and accuracy.
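The group-wise decoding and relaxed masked attention described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the fixed-size grouping, and the group sizes 4, 2, 1 are all assumptions made for illustration, since the abstract does not specify how the pyramid schedule is chosen. The sketch only shows the general idea that a frame may attend to every frame in earlier groups and in its own group, and that larger groups mean fewer sequential decoding steps if each group is emitted in one step.

```python
from math import ceil


def group_ids(seq_len, group_size):
    """Map each target frame index to the index of its group (assumed fixed-size groups)."""
    return [t // group_size for t in range(seq_len)]


def relaxed_mask(seq_len, group_size):
    """Return mask[q][k], True when query frame q may attend to key frame k.

    A frame sees every frame in earlier groups and every frame in its own group.
    """
    g = group_ids(seq_len, group_size)
    return [[g[k] <= g[q] for k in range(seq_len)] for q in range(seq_len)]


def decoding_steps(seq_len, group_size):
    """Number of sequential steps if each group is emitted in a single step."""
    return ceil(seq_len / group_size)


if __name__ == "__main__":
    # Hypothetical coarse-to-fine schedule; the abstract does not give actual sizes.
    for size in (4, 2, 1):
        print(f"group_size={size}, steps={decoding_steps(8, size)}")
        for row in relaxed_mask(8, size):
            print("".join("x" if allowed else "." for allowed in row))
```

With a group size of 1 the relaxed mask reduces to the ordinary lower-triangular causal mask of an autoregressive decoder, while a group size equal to the sequence length allows full attention, as in a non-autoregressive decoder. Intermediate sizes sit between the two, which is one way to read the speed and accuracy trade-off the abstract refers to.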

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5f0c/9785616/119dbe7e379f/sensors-22-09606-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5f0c/9785616/637b24ffd256/sensors-22-09606-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5f0c/9785616/c9110f910c01/sensors-22-09606-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5f0c/9785616/850d3045c4bb/sensors-22-09606-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5f0c/9785616/25029eb41109/sensors-22-09606-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5f0c/9785616/cae281170495/sensors-22-09606-g006.jpg
