
Enhancing Human-Robot Collaboration through a Multi-Module Interaction Framework with Sensor Fusion: Object Recognition, Verbal Communication, User of Interest Detection, Gesture and Gaze Recognition.

Affiliations

Department of Computer Science and Engineering, University of Nevada, Reno, 1664 N Virginia St, Reno, NV 89557, USA.

Publication Information

Sensors (Basel). 2023 Jun 21;23(13):5798. doi: 10.3390/s23135798.

DOI: 10.3390/s23135798
PMID: 37447647
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10347030/
Abstract

With the increasing presence of robots in our daily lives, it is crucial to design interaction interfaces that are natural, easy to use, and meaningful for robotic tasks. This is important not only to enhance the user experience but also to increase task reliability by providing supplementary information. Motivated by this, we propose a multi-modal framework consisting of multiple independent modules. These modules take advantage of multiple sensors (e.g., image, sound, depth) and can be used separately or in combination for effective human-robot collaborative interaction. We identified and implemented four key components of an effective human-robot collaborative setting: determining object location and pose, extracting intricate information from verbal instructions, resolving the user(s) of interest (UOI), and gesture recognition and gaze estimation to facilitate natural and intuitive interactions. The system uses a feature-detector-descriptor approach for object recognition, a homography-based technique for planar pose estimation, and a deep multi-task learning model to extract intricate task parameters from verbal communication. The UOI is detected by estimating facing state and active speakers. The framework also includes gesture detection and gaze estimation modules, which are combined with the verbal instruction component to form structured commands for robotic entities. Experiments were conducted to assess the performance of these interaction interfaces, and the results demonstrated the effectiveness of the approach.
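As a concrete illustration of the object-recognition component the abstract describes, the following is a minimal sketch of a feature-detector-descriptor pipeline with homography-based planar pose estimation, written in Python against OpenCV. It is not the authors' implementation: the detector choice (ORB), the ratio-test threshold, the inlier minimum, and the function name locate_planar_object are assumptions for illustration only.

import cv2
import numpy as np

def locate_planar_object(template, scene, K):
    # Feature-detector-descriptor step: ORB keypoints and binary
    # descriptors in both the object template and the scene image.
    orb = cv2.ORB_create(nfeatures=1000)
    kp_t, des_t = orb.detectAndCompute(template, None)
    kp_s, des_s = orb.detectAndCompute(scene, None)
    if des_t is None or des_s is None:
        return None

    # Brute-force Hamming matching with Lowe's ratio test.
    matches = cv2.BFMatcher(cv2.NORM_HAMMING).knnMatch(des_t, des_s, k=2)
    good = [p[0] for p in matches
            if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
    if len(good) < 10:
        return None  # too few correspondences to trust a homography

    src = np.float32([kp_t[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_s[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

    # Robust (RANSAC) homography from the template plane to the scene.
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if H is None:
        return None

    # Decompose H, given camera intrinsics K, into candidate (R, t) poses
    # of the plane; selecting the physically valid solution (all points in
    # front of the camera) is omitted here for brevity.
    _, Rs, ts, _ = cv2.decomposeHomographyMat(H, K)
    return H, Rs, ts

The decomposition yields up to four (R, t) candidates; in practice one candidate is kept by checking that the reconstructed plane points lie in front of the camera.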

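The abstract also states that the gesture, gaze, and verbal-instruction modules are combined into structured commands for robotic entities. The sketch below shows one plausible shape such a fusion step could take; every class, field, and priority rule here is a hypothetical illustration, not taken from the paper.

from dataclasses import dataclass
from typing import Optional

@dataclass
class VerbalSlots:
    # Output of the multi-task language module: an action plus an optional
    # object slot ("pick up the red cup" vs. the under-specified "pick that up").
    action: str
    object_name: Optional[str]

@dataclass
class Command:
    # Structured command handed to the robotic entity.
    user_id: int
    action: str
    target: str

def fuse(user_id: int, verbal: VerbalSlots,
         pointed_at: Optional[str], gazed_at: Optional[str]) -> Optional[Command]:
    # Resolve the target: explicit speech wins; deictic gesture, then gaze,
    # fills in when speech under-specifies it. This priority order is an
    # assumption, not the paper's stated policy.
    target = verbal.object_name or pointed_at or gazed_at
    if target is None:
        return None  # under-specified; the system would ask the UOI to clarify
    return Command(user_id=user_id, action=verbal.action, target=target)

# Example: UOI #3 says "pick that up" while pointing at the red cup.
cmd = fuse(3, VerbalSlots("pick", None), pointed_at="red cup", gazed_at="bowl")
assert cmd == Command(3, "pick", "red cup")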

Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c62f/10347030/515ee92fda8c/sensors-23-05798-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c62f/10347030/0a20f9ed4a5f/sensors-23-05798-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c62f/10347030/c2fa1cd09fa9/sensors-23-05798-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c62f/10347030/7338a25ea5b4/sensors-23-05798-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c62f/10347030/738e694488ef/sensors-23-05798-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c62f/10347030/1c565334b0ac/sensors-23-05798-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c62f/10347030/10b8f779f044/sensors-23-05798-g007a.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c62f/10347030/c93477e43fb6/sensors-23-05798-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c62f/10347030/51ff91992d71/sensors-23-05798-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c62f/10347030/e46d40790ee4/sensors-23-05798-g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c62f/10347030/74e87c28ff47/sensors-23-05798-g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c62f/10347030/33e9429a95c7/sensors-23-05798-g012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c62f/10347030/d33e94eaf494/sensors-23-05798-g013.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c62f/10347030/7bf26f65a7f1/sensors-23-05798-g014.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c62f/10347030/9d804cfc23db/sensors-23-05798-g015.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c62f/10347030/412797c1982d/sensors-23-05798-g016.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c62f/10347030/48ff790cbb4c/sensors-23-05798-g017.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c62f/10347030/c87f4b406c62/sensors-23-05798-g018.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c62f/10347030/eaacf03fd9ae/sensors-23-05798-g019.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c62f/10347030/f2b8be5d7a7d/sensors-23-05798-g020.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c62f/10347030/f30b4999c4cf/sensors-23-05798-g021.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c62f/10347030/3757a8275ba5/sensors-23-05798-g022.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c62f/10347030/7171a788af4d/sensors-23-05798-g023.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c62f/10347030/0914699e0e7d/sensors-23-05798-g024.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c62f/10347030/3eb205483079/sensors-23-05798-g025.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c62f/10347030/61ee366c67aa/sensors-23-05798-g026a.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c62f/10347030/b478fa60ec35/sensors-23-05798-g027.jpg

Similar Articles

1. Enhancing Human-Robot Collaboration through a Multi-Module Interaction Framework with Sensor Fusion: Object Recognition, Verbal Communication, User of Interest Detection, Gesture and Gaze Recognition. Sensors (Basel). 2023 Jun 21;23(13):5798. doi: 10.3390/s23135798.
2. Integration of Tracking, Re-Identification, and Gesture Recognition for Facilitating Human-Robot Interaction. Sensors (Basel). 2024 Jul 25;24(15):4850. doi: 10.3390/s24154850.
3. Improving gesture-based interaction between an assistive bathing robot and older adults via user training on the gestural commands. Arch Gerontol Geriatr. 2020 Mar-Apr;87:103996. doi: 10.1016/j.archger.2019.103996. Epub 2019 Dec 13.
4. Robot Learning of Assistive Manipulation Tasks by Demonstration via Head Gesture-based Interface. IEEE Int Conf Rehabil Robot. 2019 Jun;2019:1139-1146. doi: 10.1109/ICORR.2019.8779379.
5. Egocentric Gesture Recognition Using 3D Convolutional Neural Networks for the Spatiotemporal Adaptation of Collaborative Robots. Front Neurorobot. 2021 Nov 23;15:703545. doi: 10.3389/fnbot.2021.703545. eCollection 2021.
6. TMMF: Temporal Multi-Modal Fusion for Single-Stage Continuous Gesture Recognition. IEEE Trans Image Process. 2021;30:7689-7701. doi: 10.1109/TIP.2021.3108349. Epub 2021 Sep 10.
7. Hand Gesture Interface for Robot Path Definition in Collaborative Applications: Implementation and Comparative Study. Sensors (Basel). 2023 Apr 23;23(9):4219. doi: 10.3390/s23094219.
8. Intention Understanding in Human-Robot Interaction Based on Visual-NLP Semantics. Front Neurorobot. 2021 Feb 2;14:610139. doi: 10.3389/fnbot.2020.610139. eCollection 2020.
9. Motion Estimation and Hand Gesture Recognition-Based Human-UAV Interaction Approach in Real Time. Sensors (Basel). 2022 Mar 25;22(7):2513. doi: 10.3390/s22072513.
10. Robot System Assistant (RoSA): Towards Intuitive Multi-Modal and Multi-Device Human-Robot Interaction. Sensors (Basel). 2022 Jan 25;22(3):923. doi: 10.3390/s22030923.
