Local pattern aware 3D video swin transformer with masked autoencoding for realtime augmented reality gesture interaction.

Author information

Wang Suli

Affiliations

Faculty of Data Science, City University of Macau, Taipa, 999078, Macau, China.

School of Computer Engineering, Guangzhou City University of Technology, Guangzhou, 510800, China.

Publication information

Sci Rep. 2025 Jul 1;15(1):21318. doi: 10.1038/s41598-025-05935-9.

DOI:10.1038/s41598-025-05935-9
PMID:40594635
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12218122/
Abstract

This study proposes a real-time augmented reality gesture interaction algorithm based on the Swin Transformer and a masked autoencoder. The algorithm addresses the limitations of the traditional Transformer model in spatio-temporal feature extraction and real-time performance. During data preprocessing, a synthetic data annotation method automatically generates 3D gesture images and annotates joint information, significantly improving annotation efficiency. Using weighted Euclidean distance and structural similarity optimization, the paper proposes an image denoising model based on maximum a posteriori (MAP) probability that effectively reduces noise interference in gesture image analysis. The gesture detection and segmentation module combines EfficientNet and Transformer models: it fuses shallow and deep features through skip connections to achieve multi-scale feature extraction, and strengthens attention to the target area through a triplet attention module. The paper also introduces a local texture feature prior (RTHLBP) to improve gesture recognition and segmentation accuracy. For gesture classification, the paper proposes a ViT architecture based on a masked autoencoder, which aligns features at different levels through a dynamic weight fusion strategy and uses the relative total variation map as a self-supervised element; this significantly improves classification performance. Experimental results show that the proposed model's accuracy, F1 score, and MIoU on the four GTEA sub-datasets surpass those of traditional CNN, Transformer, MobileNet, and DenseNet models, particularly on small datasets. The paper further optimizes real-time performance through a multi-core parallel computing strategy: experiments show that as the number of DSP cores increases, computation time falls significantly while computational efficiency remains high.
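The masked-autoencoding idea mentioned in the abstract can be illustrated with a minimal sketch of the patch-masking step. The abstract does not say how patches are sampled or what mask ratio is used, so everything below is an assumption for illustration: the function name `random_patch_mask`, uniform random sampling, the 0.75 ratio, and the 8 x 7 x 7 token grid are all hypothetical and not taken from the paper.

```python
import random


def random_patch_mask(num_patches, mask_ratio, seed=None):
    """Split patch indices into (visible, masked) lists, MAE-style.

    Hypothetical helper: uniform random masking is an assumption,
    not the sampling scheme described in the paper.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)          # local RNG so results are reproducible
    indices = list(range(num_patches))
    rng.shuffle(indices)               # random permutation of patch indices
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(indices[:num_masked])    # hidden from the encoder
    visible = sorted(indices[num_masked:])   # fed to the encoder
    return visible, masked


# Assumed token grid: 8 temporal x 7 x 7 spatial patches of a video clip.
visible, masked = random_patch_mask(num_patches=8 * 7 * 7, mask_ratio=0.75, seed=0)
print(len(visible), len(masked))  # 98 294
```

In a setup like this, the encoder would process only the `visible` tokens and a decoder would be trained to reconstruct the `masked` ones, which is what makes the pretraining self-supervised.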

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6c8a/12218122/481ff7f8be81/41598_2025_5935_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6c8a/12218122/07b2afc27241/41598_2025_5935_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6c8a/12218122/358dd9945f6a/41598_2025_5935_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6c8a/12218122/126af355fe28/41598_2025_5935_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6c8a/12218122/24093f871f69/41598_2025_5935_Fig5_HTML.jpg