Quan Bui Hong, Anh Nguyen Dinh Tuan, Phi Hoang Van, Thanh Bui Trung
Faculty of Information Technology, VNU-University of Engineering and Technology (VNU-UET), Hanoi 10000, Vietnam.
Faculty of Mechanical Engineering, Hung Yen University of Technology and Education, Hungyen 16000, Vietnam.
Sensors (Basel). 2025 Sep 2;25(17):5411. doi: 10.3390/s25175411.
Hands-free computer interaction is a key topic in assistive technology, with camera-based and voice-based systems being the most common methods. Recent camera-based solutions leverage facial expressions or head movements to simulate mouse clicks or key presses, while voice-based systems enable control via speech commands, wake-word detection, and vocal gestures. However, existing systems often suffer from limitations in responsiveness and accuracy, especially under real-world conditions. In this paper, we present 3-Modal Human-Computer Interaction (3M-HCI), a novel interaction system that dynamically integrates facial, vocal, and eye-based inputs through a new signal processing pipeline and a cross-modal coordination mechanism. This approach not only enhances recognition accuracy but also reduces interaction latency. Experimental results demonstrate that 3M-HCI outperforms several recent hands-free interaction solutions in both speed and precision, highlighting its potential as a robust assistive interface.
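The abstract does not specify how the cross-modal coordination mechanism is implemented; as a purely illustrative sketch, one simple way to fuse facial, vocal, and eye-based events is to pick the highest-confidence detection above a threshold. All names and the threshold value below are assumptions for illustration, not the 3M-HCI design.

```python
# Illustrative sketch only: confidence-based fusion of three input
# modalities. ModalEvent, coordinate(), and threshold=0.6 are
# hypothetical names, not the paper's actual pipeline or API.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModalEvent:
    modality: str      # "face", "voice", or "eye"
    action: str        # e.g. "click", "scroll"
    confidence: float  # detector confidence in [0, 1]

def coordinate(events: list[ModalEvent], threshold: float = 0.6) -> Optional[str]:
    """Return the action of the highest-confidence event above threshold, or None."""
    candidates = [e for e in events if e.confidence >= threshold]
    if not candidates:
        return None
    return max(candidates, key=lambda e: e.confidence).action

if __name__ == "__main__":
    events = [
        ModalEvent("face", "click", 0.55),
        ModalEvent("voice", "scroll", 0.72),
        ModalEvent("eye", "click", 0.81),
    ]
    print(coordinate(events))  # → click
```

A real system would also account for per-modality latency and temporal alignment of events; this sketch shows only the confidence-arbitration idea.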