


HAVIT: research on vision-language gesture interaction mechanism for smart furniture.

Authors

Chen Hong, Mahdzir Hasnul Azwan Azizan, Li Xuekun, Sayuti Nurul Ayn Ahmad

Affiliations

School of Art and Design, Jiangxi University of Technology, Nanchang, China.

Faculty of Art & Design, Universiti Teknologi MARA, Shah Alam, Malaysia.

Publication

Sci Rep. 2025 Jul 28;15(1):27423. doi: 10.1038/s41598-025-10758-9.

DOI: 10.1038/s41598-025-10758-9
PMID: 40721440
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12304170/
Abstract

With the rapid development of smart furniture, gesture recognition has gained increasing attention as a natural and intuitive interaction method. However, in practical applications, issues such as limited data resources and insufficient semantic understanding have significantly constrained the effectiveness of gesture recognition technology. To address these challenges, this study proposes HAVIT, a hybrid deep learning model based on Vision Transformer and ALBEF, aimed at enhancing the performance of gesture recognition systems under data-scarce conditions. The model achieves efficient feature extraction and accurate recognition of gesture characteristics through the organic integration of Vision Transformer's feature extraction capabilities and ALBEF's semantic understanding mechanism. Experimental results demonstrate that on a fully labeled dataset, the HAVIT model achieved a classification accuracy of 91.83% and an AUC value of 0.92; under 20% label deficiency conditions, the model maintained an accuracy of 86.89% and an AUC value of 0.88, exhibiting strong robustness. The research findings provide new solutions for the development of smart furniture interaction technology and hold significant implications for advancing practical applications in this field.
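The abstract describes HAVIT only at a high level: ViT-style patch-based visual feature extraction combined with ALBEF-style vision-language alignment for classification. A toy numpy sketch of that general idea follows. This is not the authors' implementation: the patch size, embedding dimension, random weights, and the two hypothetical gesture labels are all invented for illustration, and the ALBEF component is reduced to a simple image-text cosine-similarity match.

```python
import numpy as np

rng = np.random.default_rng(0)

def patchify(image, patch=8):
    """Split an HxW image into flattened non-overlapping patches (ViT-style tokenization)."""
    h, w = image.shape
    patches = image.reshape(h // patch, patch, w // patch, patch)
    return patches.transpose(0, 2, 1, 3).reshape(-1, patch * patch)

# Toy 32x32 "gesture image" and two hypothetical gesture-label text embeddings
# (e.g. "swipe left", "swipe right") already projected into a shared space.
image = rng.standard_normal((32, 32))
W_img = rng.standard_normal((64, 16))      # patch features -> shared embedding space
text_emb = rng.standard_normal((2, 16))    # one row per candidate gesture label

img_tokens = patchify(image) @ W_img       # (16 tokens, 16-dim embeddings)
img_vec = img_tokens.mean(axis=0)          # pooled image representation

# ALBEF-flavored image-text matching, reduced to cosine similarity per label
sims = text_emb @ img_vec / (
    np.linalg.norm(text_emb, axis=1) * np.linalg.norm(img_vec))
pred = int(np.argmax(sims))                # predicted gesture class index
print(pred)
```

In the paper's setting the pooled vector and label embeddings would come from trained ViT and text encoders with cross-modal attention; the sketch only shows where the visual tokens and the semantic label space meet.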


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6578/12304170/5b6786b8a3ca/41598_2025_10758_Fig12_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6578/12304170/e3db825a53b1/41598_2025_10758_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6578/12304170/08b4f9c780bc/41598_2025_10758_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6578/12304170/7e8b4a96c5ed/41598_2025_10758_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6578/12304170/dbab60c87334/41598_2025_10758_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6578/12304170/3726253d58b6/41598_2025_10758_Figa_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6578/12304170/c30da1cd8fd2/41598_2025_10758_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6578/12304170/f272636fb410/41598_2025_10758_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6578/12304170/68d0579982d8/41598_2025_10758_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6578/12304170/874dcfe0e0da/41598_2025_10758_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6578/12304170/21483ea93f1a/41598_2025_10758_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6578/12304170/97ee03738869/41598_2025_10758_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6578/12304170/f855b04b4b67/41598_2025_10758_Fig11_HTML.jpg

Similar Articles

1
HAVIT: research on vision-language gesture interaction mechanism for smart furniture.
Sci Rep. 2025 Jul 28;15(1):27423. doi: 10.1038/s41598-025-10758-9.
2
Gesture recognition and response system for special education using computer vision and human-computer interaction technology.
Disabil Rehabil Assist Technol. 2025 Jul 8:1-18. doi: 10.1080/17483107.2025.2527226.
3
Short-Term Memory Impairment
4
Gesture recognition for hearing impaired people using an ensemble of deep learning models with improving beluga whale optimization-based hyperparameter tuning.
Sci Rep. 2025 Jul 1;15(1):21441. doi: 10.1038/s41598-025-06680-9.
5
Recognizing American Sign Language gestures efficiently and accurately using a hybrid transformer model.
Sci Rep. 2025 Jun 23;15(1):20253. doi: 10.1038/s41598-025-06344-8.
6
Local pattern aware 3D video swin transformer with masked autoencoding for realtime augmented reality gesture interaction.
Sci Rep. 2025 Jul 1;15(1):21318. doi: 10.1038/s41598-025-05935-9.
7
A hybrid model for detecting motion artifacts in ballistocardiogram signals.
Biomed Eng Online. 2025 Jul 23;24(1):92. doi: 10.1186/s12938-025-01426-0.
8
Improving Transformer Performance for French Clinical Notes Classification Using Mixture of Experts on a Limited Dataset.
IEEE J Transl Eng Health Med. 2025 Jun 4;13:261-274. doi: 10.1109/JTEHM.2025.3576570. eCollection 2025.
9
Enabling by voice: an exploratory study on how interactive smart agents (ISAs) can change the design of environmental control (EC) equipment and service.
Disabil Rehabil Assist Technol. 2025 Jul 23:1-30. doi: 10.1080/17483107.2025.2530195.
10
Computer and mobile technology interventions for self-management in chronic obstructive pulmonary disease.
Cochrane Database Syst Rev. 2017 May 23;5(5):CD011425. doi: 10.1002/14651858.CD011425.pub2.
