


HAVIT: research on vision-language gesture interaction mechanism for smart furniture.

Authors

Chen Hong, Mahdzir Hasnul Azwan Azizan, Li Xuekun, Sayuti Nurul Ayn Ahmad

Affiliations

School of Art and Design, Jiangxi University of Technology, Nanchang, China.

Faculty of Art & Design, Universiti Teknologi MARA, Shah Alam, Malaysia.

Publication

Sci Rep. 2025 Jul 28;15(1):27423. doi: 10.1038/s41598-025-10758-9.

DOI: 10.1038/s41598-025-10758-9
PMID: 40721440
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12304170/
Abstract

With the rapid development of smart furniture, gesture recognition has gained increasing attention as a natural and intuitive interaction method. However, in practical applications, issues such as limited data resources and insufficient semantic understanding have significantly constrained the effectiveness of gesture recognition technology. To address these challenges, this study proposes HAVIT, a hybrid deep learning model based on Vision Transformer and ALBEF, aimed at enhancing the performance of gesture recognition systems under data-scarce conditions. The model achieves efficient feature extraction and accurate recognition of gesture characteristics through the organic integration of Vision Transformer's feature extraction capabilities and ALBEF's semantic understanding mechanism. Experimental results demonstrate that on a fully labeled dataset, the HAVIT model achieved a classification accuracy of 91.83% and an AUC value of 0.92; under 20% label deficiency conditions, the model maintained an accuracy of 86.89% and an AUC value of 0.88, exhibiting strong robustness. The research findings provide new solutions for the development of smart furniture interaction technology and hold significant implications for advancing practical applications in this field.
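The abstract describes HAVIT only at a high level: ViT-style patch-based visual feature extraction combined with ALBEF-style vision-language alignment for classification. A toy numpy sketch of that general idea follows. This is not the authors' implementation: the patch size, embedding dimension, random weights, and the two hypothetical gesture labels are all invented for illustration, and the ALBEF component is reduced to a simple image-text cosine-similarity match.

```python
import numpy as np

rng = np.random.default_rng(0)

def patchify(image, patch=8):
    """Split an HxW image into flattened non-overlapping patches (ViT-style tokenization)."""
    h, w = image.shape
    patches = image.reshape(h // patch, patch, w // patch, patch)
    return patches.transpose(0, 2, 1, 3).reshape(-1, patch * patch)

# Toy 32x32 "gesture image" and two hypothetical gesture-label text embeddings
# (e.g. "swipe left", "swipe right") already projected into a shared space.
image = rng.standard_normal((32, 32))
W_img = rng.standard_normal((64, 16))      # patch features -> shared embedding space
text_emb = rng.standard_normal((2, 16))    # one row per candidate gesture label

img_tokens = patchify(image) @ W_img       # (16 tokens, 16-dim embeddings)
img_vec = img_tokens.mean(axis=0)          # pooled image representation

# ALBEF-flavored image-text matching, reduced to cosine similarity per label
sims = text_emb @ img_vec / (
    np.linalg.norm(text_emb, axis=1) * np.linalg.norm(img_vec))
pred = int(np.argmax(sims))                # predicted gesture class index
print(pred)
```

In the paper's setting the pooled vector and label embeddings would come from trained ViT and text encoders with cross-modal attention; the sketch only shows where the visual tokens and the semantic label space meet.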


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6578/12304170/5b6786b8a3ca/41598_2025_10758_Fig12_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6578/12304170/e3db825a53b1/41598_2025_10758_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6578/12304170/08b4f9c780bc/41598_2025_10758_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6578/12304170/7e8b4a96c5ed/41598_2025_10758_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6578/12304170/dbab60c87334/41598_2025_10758_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6578/12304170/3726253d58b6/41598_2025_10758_Figa_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6578/12304170/c30da1cd8fd2/41598_2025_10758_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6578/12304170/f272636fb410/41598_2025_10758_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6578/12304170/68d0579982d8/41598_2025_10758_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6578/12304170/874dcfe0e0da/41598_2025_10758_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6578/12304170/21483ea93f1a/41598_2025_10758_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6578/12304170/97ee03738869/41598_2025_10758_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6578/12304170/f855b04b4b67/41598_2025_10758_Fig11_HTML.jpg

Similar Articles

1
HAVIT: research on vision-language gesture interaction mechanism for smart furniture.
Sci Rep. 2025 Jul 28;15(1):27423. doi: 10.1038/s41598-025-10758-9.
2
Gesture recognition and response system for special education using computer vision and human-computer interaction technology.
Disabil Rehabil Assist Technol. 2025 Jul 8:1-18. doi: 10.1080/17483107.2025.2527226.
3
Short-Term Memory Impairment
4
Gesture recognition for hearing impaired people using an ensemble of deep learning models with improving beluga whale optimization-based hyperparameter tuning.
Sci Rep. 2025 Jul 1;15(1):21441. doi: 10.1038/s41598-025-06680-9.
5
Recognizing American Sign Language gestures efficiently and accurately using a hybrid transformer model.
Sci Rep. 2025 Jun 23;15(1):20253. doi: 10.1038/s41598-025-06344-8.
6
Local pattern aware 3D video swin transformer with masked autoencoding for realtime augmented reality gesture interaction.
Sci Rep. 2025 Jul 1;15(1):21318. doi: 10.1038/s41598-025-05935-9.
7
A hybrid model for detecting motion artifacts in ballistocardiogram signals.
Biomed Eng Online. 2025 Jul 23;24(1):92. doi: 10.1186/s12938-025-01426-0.
8
Improving Transformer Performance for French Clinical Notes Classification Using Mixture of Experts on a Limited Dataset.
IEEE J Transl Eng Health Med. 2025 Jun 4;13:261-274. doi: 10.1109/JTEHM.2025.3576570. eCollection 2025.
9
Enabling by voice: an exploratory study on how interactive smart agents (ISAs) can change the design of environmental control (EC) equipment and service.
Disabil Rehabil Assist Technol. 2025 Jul 23:1-30. doi: 10.1080/17483107.2025.2530195.
10
Computer and mobile technology interventions for self-management in chronic obstructive pulmonary disease.
Cochrane Database Syst Rev. 2017 May 23;5(5):CD011425. doi: 10.1002/14651858.CD011425.pub2.
