SSTA-ResT: Soft Spatiotemporal Attention ResNet Transformer for Argentine Sign Language Recognition.

Author information

Liu Xianru, Zhou Zeru, Xia E, Yin Xin

Affiliations

School of Automation, Central South University, Changsha 410083, China.

Information and Network Center, Central South University, Changsha 410083, China.

Publication information

Sensors (Basel). 2025 Sep 5;25(17):5543. doi: 10.3390/s25175543.

Abstract

Sign language recognition technology serves as a crucial bridge between deaf and hearing individuals and plays a substantial role in promoting social inclusivity. Conventional sign language recognition methods that rely on static images cannot adequately capture the dynamic characteristics and temporal information inherent in sign language, which restricts their applicability in real-world scenarios. To address this limitation, the proposed framework, SSTA-ResT, integrates ResNet, soft spatiotemporal attention (SSTA), and Transformer encoders. The framework uses ResNet to extract robust spatial feature representations, employs the lightweight SSTA module for dual-path complementary representation enhancement to strengthen spatiotemporal associations, and leverages the Transformer encoder to capture long-range temporal dependencies. Experimental results on the LSA64 Argentine Sign Language (ASL) dataset show that the proposed method achieves an accuracy of 96.25%, a precision of 97.18%, and an F1 score of 0.9671. These results surpass the performance of existing methods across all metrics while maintaining a relatively low model parameter count of 11.66 M, demonstrating the framework's effectiveness and practicality for sign language video recognition tasks.
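The abstract reports accuracy, precision, and F1 score but does not say how the per-class values are aggregated across sign classes. As a minimal sketch, assuming macro-averaging (an assumption, not confirmed by the abstract), these metrics could be computed from predicted and true class labels as follows. The function name `macro_metrics` and the toy labels are hypothetical and are not results from LSA64.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall, macro F1).

    Assumption: per-class scores are averaged with equal weight (macro
    averaging). The abstract does not state which averaging it uses.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # class p was predicted wrongly
            fn[t] += 1          # class t was missed

    precisions, recalls, f1s = [], [], []
    for c in labels:
        predicted = tp[c] + fp[c]
        actual = tp[c] + fn[c]
        prec = tp[c] / predicted if predicted else 0.0
        rec = tp[c] / actual if actual else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy example with three hypothetical sign classes (not LSA64 data).
acc, prec, rec, f1 = macro_metrics([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2])
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f} f1={f1:.4f}")
```

Under macro-averaging every class counts equally, so precision can exceed accuracy, as it does in the figures reported above. Whether the paper computes its metrics this way is not stated in the abstract.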

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b533/12431434/379adcb039d5/sensors-25-05543-g001.jpg
