Efficient spatio-temporal modeling for sign language recognition using CNN and RNN architectures.

Authors

Myagila Kasian, Nyambo Devotha Godfrey, Dida Mussa Ally

Affiliations

School of Computation and Communication Science and Engineering, The Nelson Mandela African Institution of Science and Technology, Arusha, Tanzania.

Faculty of Science and Technology, Mzumbe University, Morogoro, Tanzania.

Publication information

Front Artif Intell. 2025 Aug 25;8:1630743. doi: 10.3389/frai.2025.1630743. eCollection 2025.

DOI:10.3389/frai.2025.1630743
PMID:40927705
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12415044/
Abstract

Computer vision has been identified as one of the solutions for bridging communication barriers between speech-impaired populations and those without impairment, as most people are unaware of the sign language used by speech-impaired individuals. Numerous studies have been conducted to address this challenge. However, recognizing word signs, which are usually dynamic and involve more than one frame per sign, remains a challenge. This study used Tanzania Sign Language datasets collected with mobile phone selfie cameras to investigate the performance of deep learning algorithms that capture the spatial and temporal relationship features of video frames. The study used CNN-LSTM and CNN-GRU architectures, and proposes a CNN-GRU with an ELU activation function to enhance learning efficiency and performance. The findings indicate that the proposed CNN-GRU model with ELU activation achieved an accuracy of 94%, compared with 93% for both the standard CNN-GRU and the CNN-LSTM models. In addition, the study evaluated the performance of the proposed model in a signer-independent setting, where the results varied significantly across individual signers, with the highest accuracy reaching 66%. These results show that more effort is required to improve signer-independent performance, including addressing the challenges of hand dominance by optimizing spatial features.
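The abstract describes a hybrid architecture in which a CNN extracts spatial features from each video frame and a recurrent layer (LSTM or GRU) models how those features evolve across frames. The following is a minimal, pure-Python sketch of such a CNN-GRU pipeline for one clip. It is an illustration only, not the authors' implementation. Several assumptions apply. Frames are flattened to 1D vectors and convolved with a few small 1D kernels, standing in for real 2D convolutional layers. The abstract does not say where ELU is applied, so it is assumed here to be the activation of the convolutional stage, while the GRU keeps its usual sigmoid/tanh gates. All helper names (`conv_features`, `gru_step`, `classify_clip`) and sizes are hypothetical, and the weights are random rather than trained.

```python
import math
import random

def elu(x, alpha=1.0):
    """Exponential Linear Unit: x if x > 0, else alpha * (exp(x) - 1)."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def conv_features(frame, kernels):
    """Spatial stage (toy stand-in for a CNN): valid 1D convolution of a
    flattened frame with each kernel, ELU activation (assumed placement),
    then global average pooling -> one feature per kernel."""
    feats = []
    for k in kernels:
        n = len(frame) - len(k) + 1
        acts = [elu(sum(frame[i + j] * k[j] for j in range(len(k)))) for i in range(n)]
        feats.append(sum(acts) / n)
    return feats

def init_gru(input_size, hidden_size, rng):
    """Random weights for the update (z), reset (r) and candidate (h) gates."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    params = {}
    for g in ("z", "r", "h"):
        params["W" + g] = mat(hidden_size, input_size)
        params["U" + g] = mat(hidden_size, hidden_size)
        params["b" + g] = [0.0] * hidden_size
    return params

def gru_step(p, x, h):
    """One GRU time step, using the convention h' = (1 - z) * h + z * candidate."""
    z = [sigmoid(a + b + c) for a, b, c in zip(matvec(p["Wz"], x), matvec(p["Uz"], h), p["bz"])]
    r = [sigmoid(a + b + c) for a, b, c in zip(matvec(p["Wr"], x), matvec(p["Ur"], h), p["br"])]
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b + c) for a, b, c in zip(matvec(p["Wh"], x), matvec(p["Uh"], rh), p["bh"])]
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def classify_clip(frames, kernels, gru, W_out, b_out):
    """Temporal stage: run the GRU over per-frame features, then apply softmax."""
    h = [0.0] * len(gru["bz"])
    for frame in frames:
        h = gru_step(gru, conv_features(frame, kernels), h)
    logits = [a + b for a, b in zip(matvec(W_out, h), b_out)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Usage with random, untrained weights (hypothetical sizes).
rng = random.Random(0)
NUM_CLASSES, HIDDEN = 3, 4
kernels = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(5)]  # 5 filters, width 3
gru = init_gru(len(kernels), HIDDEN, rng)
W_out = [[rng.uniform(-0.5, 0.5) for _ in range(HIDDEN)] for _ in range(NUM_CLASSES)]
b_out = [0.0] * NUM_CLASSES
clip = [[rng.random() for _ in range(16)] for _ in range(10)]  # 10 frames x 16 values
probs = classify_clip(clip, kernels, gru, W_out, b_out)
print(probs)  # one probability per sign class
```

Replacing `gru_step` with an LSTM cell would give the CNN-LSTM variant that the study compares against. In practice, both stages would be built from 2D convolutional and recurrent layers in a deep learning framework and trained end to end on labelled sign videos.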
