Integration of Industrially-Oriented Human-Robot Speech Communication and Vision-Based Object Recognition.

Affiliation

Department of Automation and Metal Cutting, Warsaw University of Technology, 02-524 Warsaw, Poland.

Publication Information

Sensors (Basel). 2020 Dec 18;20(24):7287. doi: 10.3390/s20247287.

DOI: 10.3390/s20247287

PMID: 33353038

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7767307/

Abstract

This paper presents a novel method for integrating industrially-oriented human-robot speech communication with vision-based object recognition. Such integration is necessary to provide context for task-oriented voice commands. Context-based speech communication is easier: the commands are shorter, and hence their recognition rate is higher. In recent years, significant research has been devoted to integrating speech and gesture recognition. However, little attention has been paid to the vision-based identification of objects in industrial environments (such as workpieces or tools) that are referred to by general terms in voice commands, and no methods facilitating such integration have been reported. Image and speech recognition systems usually operate on different data structures that describe reality at different levels of abstraction; hence, developing context-based voice control systems is a laborious and time-consuming task. The aim of our research was to solve this problem. The core of our method is an extension of the Voice Command Description (VCD) format, which describes the syntax and semantics of task-oriented commands, and its integration with Flexible Editable Contour Templates (FECT), which are used to classify contours derived from image recognition systems. To the best of our knowledge, this is the first solution that facilitates the development of customized vision-based voice control applications for industrial robots.
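
The abstract does not describe the internals of VCD or FECT, so the following is only a hedged illustration of the general idea of giving a voice command visual context, not the authors' method. It sketches how a general object term in a short command (e.g. "washer") might be resolved against objects that a vision system has already labelled. The names `DetectedObject`, `TERM_TO_LABELS`, `resolve_object`, and all class labels are hypothetical and are not part of the VCD or FECT formats.

```python
from dataclasses import dataclass


# Hypothetical output of a vision system: each contour has already been
# assigned a class label by some contour classifier.
@dataclass(frozen=True)
class DetectedObject:
    label: str
    x_mm: float
    y_mm: float


# Hypothetical vocabulary linking general terms used in voice commands
# to the class labels produced by the vision system.
TERM_TO_LABELS = {
    "washer": {"washer"},
    "bolt": {"hex_bolt", "socket_bolt"},
    "tool": {"wrench", "screwdriver"},
}


def resolve_object(command: str, scene: list[DetectedObject]) -> list[DetectedObject]:
    """Return the detected objects that the command's first known object term can refer to."""
    for raw_word in command.lower().split():
        word = raw_word.strip(".,!?")  # ignore trailing punctuation
        labels = TERM_TO_LABELS.get(word)
        if labels is not None:
            # Keep only the objects in the current scene whose class matches the term.
            return [obj for obj in scene if obj.label in labels]
    return []  # the command mentions no known object term


# Hypothetical scene as seen by the camera.
scene = [
    DetectedObject("hex_bolt", 120.0, 40.0),
    DetectedObject("socket_bolt", 150.0, 42.0),
    DetectedObject("washer", 80.0, 55.0),
]

print(resolve_object("Pick up the washer.", scene))
print(resolve_object("Pick up the bolt.", scene))
```

When exactly one detected object matches, the short command is unambiguous in the current scene; when several match (as with the two bolts), a system would need further context or a clarifying question before acting.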
