
Enhancing Human-Robot Collaboration through a Multi-Module Interaction Framework with Sensor Fusion: Object Recognition, Verbal Communication, User of Interest Detection, Gesture and Gaze Recognition.

Affiliations

Department of Computer Science and Engineering, University of Nevada, Reno, 1664 N Virginia St, Reno, NV 89557, USA.

Publication information

Sensors (Basel). 2023 Jun 21;23(13):5798. doi: 10.3390/s23135798.

Abstract

With the increasing presence of robots in our daily lives, it is crucial to design interaction interfaces that are natural, easy to use, and meaningful for robotic tasks. This is important not only to enhance the user experience but also to increase task reliability by providing supplementary information. Motivated by this, we propose a multi-modal framework consisting of multiple independent modules. These modules take advantage of multiple sensors (e.g., image, sound, depth) and can be used separately or in combination for effective human-robot collaborative interaction. We identified and implemented four key components of an effective human-robot collaborative setting: determining object location and pose, extracting intricate information from verbal instructions, resolving the user(s) of interest (UOI), and recognizing gestures and estimating gaze to facilitate natural and intuitive interactions. The system uses a feature-detector-descriptor approach for object recognition, a homography-based technique for planar pose estimation, and a deep multi-task learning model to extract intricate task parameters from verbal communication. The user of interest (UOI) is detected by estimating the facing state and identifying active speakers. The framework also includes gesture detection and gaze estimation modules, which are combined with the verbal instruction component to form structured commands for robotic entities. Experiments were conducted to assess the performance of these interaction interfaces, and the results demonstrate the effectiveness of the approach.
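As a rough illustration of the feature-detector-descriptor and homography-based planar pose pipeline named in the abstract, the sketch below uses OpenCV with an ORB detector-descriptor, brute-force matching, RANSAC homography estimation, and assumed camera intrinsics; the paper's exact detector, descriptor, and parameters may differ.

```python
import cv2
import numpy as np

# Reference image of the planar object and a scene frame from the camera.
# File names are placeholders, not from the paper.
ref = cv2.imread("object_reference.png", cv2.IMREAD_GRAYSCALE)
scene = cv2.imread("scene_frame.png", cv2.IMREAD_GRAYSCALE)

# Detect keypoints and compute descriptors (ORB as a stand-in detector-descriptor).
orb = cv2.ORB_create(nfeatures=2000)
kp_ref, des_ref = orb.detectAndCompute(ref, None)
kp_scene, des_scene = orb.detectAndCompute(scene, None)

# Match descriptors (Hamming distance for binary descriptors) and keep the best.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des_ref, des_scene), key=lambda m: m.distance)[:100]

# Estimate the planar homography between the matched point sets with RANSAC.
src = np.float32([kp_ref[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp_scene[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# Project the reference corners into the scene to localize the object.
h, w = ref.shape
corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
object_outline = cv2.perspectiveTransform(corners, H)

# With (assumed) camera intrinsics K, the homography decomposes into candidate
# rotation/translation pairs, yielding the planar pose of the object.
K = np.array([[800.0, 0.0, w / 2], [0.0, 800.0, h / 2], [0.0, 0.0, 1.0]])
num_solutions, rotations, translations, normals = cv2.decomposeHomographyMat(H, K)
```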

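The final fusion step, combining gesture, gaze, verbal, and UOI outputs into a structured command for the robot, might look roughly like the following; the class name and fields are illustrative assumptions, not the paper's actual interface.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class StructuredCommand:
    """Hypothetical container fusing the framework's modalities into one command."""
    action: str                                       # verb parsed from the verbal instruction
    target_object: Optional[str] = None               # label from the object-recognition module
    object_pose: Optional[Tuple[float, float, float]] = None  # planar pose (x, y, theta)
    user_id: Optional[int] = None                     # resolved user of interest (UOI)
    gesture: Optional[str] = None                     # e.g., "point_left" from gesture detection
    gaze_point: Optional[Tuple[float, float]] = None  # gaze intersection with the workspace

# Example: "pick up the mug" spoken by user 1 while pointing and looking at it.
cmd = StructuredCommand(action="pick_up", target_object="mug", user_id=1,
                        gesture="point_left", gaze_point=(0.42, 0.17))
```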

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c62f/10347030/515ee92fda8c/sensors-23-05798-g001.jpg
