


Recognizing American Sign Language gestures efficiently and accurately using a hybrid transformer model.

Authors

Aly Mohammed, Fathi Islam S

Affiliations

Department of Artificial Intelligence, Faculty of Artificial Intelligence, Egyptian Russian University, Badr City, 11829, Egypt.

Department of Computer Science, Faculty of Information Technology, Ajloun National University, P. O. 43, Ajloun, 26810, Jordan.

Publication

Sci Rep. 2025 Jun 23;15(1):20253. doi: 10.1038/s41598-025-06344-8.

DOI: 10.1038/s41598-025-06344-8
PMID: 40550837
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12185765/
Abstract

Gesture recognition plays a vital role in computer vision, especially for interpreting sign language and enabling human-computer interaction. Many existing methods struggle with challenges like heavy computational demands, difficulty in understanding long-range relationships, sensitivity to background noise, and poor performance in varied environments. While CNNs excel at capturing local details, they often miss the bigger picture. Vision Transformers, on the other hand, are better at modeling global context but usually require significantly more computational resources, limiting their use in real-time systems. To tackle these issues, we propose a Hybrid Transformer-CNN model that combines the strengths of both architectures. Our approach begins with CNN layers that extract detailed local features from both the overall hand and specific hand regions. These CNN features are then refined by a Vision Transformer module, which captures long-range dependencies and global contextual information within the gesture. This integration allows the model to effectively recognize subtle hand movements while maintaining computational efficiency. Tested on the ASL Alphabet dataset, our model achieves a high accuracy of 99.97%, runs at 110 frames per second, and requires only 5.0 GFLOPs-much less than traditional Vision Transformer models, which need over twice the computational power. Central to this success is our feature fusion strategy using element-wise multiplication, which helps the model focus on important gesture details while suppressing background noise. Additionally, we employ advanced data augmentation techniques and a training approach incorporating contrastive learning and domain adaptation to boost robustness. Overall, this work offers a practical and powerful solution for gesture recognition, striking an optimal balance between accuracy, speed, and efficiency-an important step toward real-world applications.
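The fusion step described above — element-wise multiplication of whole-hand and hand-region CNN feature maps before the transformer stage — can be illustrated with a minimal numpy sketch. The shapes, variable names, and the flattening into a token sequence below are assumptions for illustration, not the paper's actual dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: the paper describes CNN branches for the whole hand
# and for specific hand regions; these dimensions are assumptions.
B, C, H, W = 2, 64, 14, 14            # batch, channels, feature-map height/width

global_feats = rng.random((B, C, H, W))   # whole-hand CNN features
region_feats = rng.random((B, C, H, W))   # hand-region CNN features

# Element-wise multiplicative fusion: a position survives only if both
# branches respond there, which suppresses background activations while
# keeping locations the branches agree on.
fused = global_feats * region_feats

# The fused map is then flattened into a token sequence for the Vision
# Transformer stage: each of the H*W spatial positions becomes one
# C-dimensional token.
tokens = fused.reshape(B, C, H * W).transpose(0, 2, 1)   # (B, H*W, C)

print(tokens.shape)   # → (2, 196, 64)
```

Because the inputs here lie in [0, 1), the product at every position is no larger than either branch's activation, which is what gives multiplicative fusion its gating/suppression behavior compared with additive fusion or concatenation.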


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3674/12185765/867d79062d4c/41598_2025_6344_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3674/12185765/a3f822d045bd/41598_2025_6344_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3674/12185765/14ab31ce74ad/41598_2025_6344_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3674/12185765/fbe3aaa6d955/41598_2025_6344_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3674/12185765/67bac182a90e/41598_2025_6344_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3674/12185765/033563345d22/41598_2025_6344_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3674/12185765/d1f4d2a33b81/41598_2025_6344_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3674/12185765/26c754c0c397/41598_2025_6344_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3674/12185765/2704323c06b5/41598_2025_6344_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3674/12185765/2c84a9da1aaa/41598_2025_6344_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3674/12185765/1b553580275d/41598_2025_6344_Fig11_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3674/12185765/0b906a349dd0/41598_2025_6344_Fig12_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3674/12185765/42948e5c8b27/41598_2025_6344_Fig13_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3674/12185765/4fb8d2d3ab80/41598_2025_6344_Fig14_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3674/12185765/e0c2b2c53908/41598_2025_6344_Fig15_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3674/12185765/d1ff3f73d894/41598_2025_6344_Fig16_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3674/12185765/a4a7b53412aa/41598_2025_6344_Fig17_HTML.jpg

Similar Articles

1
Recognizing American Sign Language gestures efficiently and accurately using a hybrid transformer model.
Sci Rep. 2025 Jun 23;15(1):20253. doi: 10.1038/s41598-025-06344-8.
2
Trajectory-Ordered Objectives for Self-Supervised Representation Learning of Temporal Healthcare Data Using Transformers: Model Development and Evaluation Study.
JMIR Med Inform. 2025 Jun 4;13:e68138. doi: 10.2196/68138.
3
Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19.
Cochrane Database Syst Rev. 2022 May 20;5(5):CD013665. doi: 10.1002/14651858.CD013665.pub3.
4
Skin-CAD: Explainable deep learning classification of skin cancer from dermoscopic images by feature selection of dual high-level CNNs features and transfer learning.
Comput Biol Med. 2024 Aug;178:108798. doi: 10.1016/j.compbiomed.2024.108798. Epub 2024 Jun 25.
5
A deep learning approach to direct immunofluorescence pattern recognition in autoimmune bullous diseases.
Br J Dermatol. 2024 Jul 16;191(2):261-266. doi: 10.1093/bjd/ljae142.
6
A rapid and systematic review of the clinical effectiveness and cost-effectiveness of paclitaxel, docetaxel, gemcitabine and vinorelbine in non-small-cell lung cancer.
Health Technol Assess. 2001;5(32):1-195. doi: 10.3310/hta5320.
7
Systemic pharmacological treatments for chronic plaque psoriasis: a network meta-analysis.
Cochrane Database Syst Rev. 2021 Apr 19;4(4):CD011535. doi: 10.1002/14651858.CD011535.pub4.
8
Intravenous magnesium sulphate and sotalol for prevention of atrial fibrillation after coronary artery bypass surgery: a systematic review and economic evaluation.
Health Technol Assess. 2008 Jun;12(28):iii-iv, ix-95. doi: 10.3310/hta12280.
9
Magnetic resonance perfusion for differentiating low-grade from high-grade gliomas at first presentation.
Cochrane Database Syst Rev. 2018 Jan 22;1(1):CD011551. doi: 10.1002/14651858.CD011551.pub2.
10
Community views on mass drug administration for soil-transmitted helminths: a qualitative evidence synthesis.
Cochrane Database Syst Rev. 2025 Jun 20;6:CD015794. doi: 10.1002/14651858.CD015794.pub2.

References Cited by This Article

1
Survey on Context-Aware Radio Frequency-Based Sensing.
Sensors (Basel). 2025 Jan 21;25(3):602. doi: 10.3390/s25030602.
2
Systematic Review of Hybrid Vision Transformer Architectures for Radiological Image Analysis.
J Imaging Inform Med. 2025 Jan 27. doi: 10.1007/s10278-024-01322-4.
3
Weakly-supervised thyroid ultrasound segmentation: Leveraging multi-scale consistency, contextual features, and bounding box supervision for accurate target delineation.
Comput Biol Med. 2025 Mar;186:109669. doi: 10.1016/j.compbiomed.2025.109669. Epub 2025 Jan 13.
4
Efficient image classification through collaborative knowledge distillation: A novel AlexNet modification approach.
Heliyon. 2024 Jul 14;10(14):e34376. doi: 10.1016/j.heliyon.2024.e34376. eCollection 2024 Jul 30.
5
Sign language recognition based on dual-path background erasure convolutional neural network.
Sci Rep. 2024 May 18;14(1):11360. doi: 10.1038/s41598-024-62008-z.
6
The sound of safety: exploring the determinants of prevention intention in noisy industrial workplaces.
BMC Public Health. 2024 Jan 4;24(1):90. doi: 10.1186/s12889-023-17618-z.
7
Deep Learning Technology to Recognize American Sign Language Alphabet.
Sensors (Basel). 2023 Sep 19;23(18):7970. doi: 10.3390/s23187970.
8
Molecular Property Prediction of Modified Gedunin Using Machine Learning.
Molecules. 2023 Jan 23;28(3):1125. doi: 10.3390/molecules28031125.
9
Two-Stream Mixed Convolutional Neural Network for American Sign Language Recognition.
Sensors (Basel). 2022 Aug 9;22(16):5959. doi: 10.3390/s22165959.
10
A novel deep learning model to detect COVID-19 based on wavelet features extracted from Mel-scale spectrogram of patients' cough and breathing sounds.
Inform Med Unlocked. 2022;32:101049. doi: 10.1016/j.imu.2022.101049. Epub 2022 Aug 13.