Tada Yuuki, Hagiwara Yoshinobu, Tanaka Hiroki, Taniguchi Tadahiro
Emergent Systems Laboratory, College of Information Science and Engineering, Ritsumeikan University, Shiga, Japan.
Front Robot AI. 2020 Jan 14;6:144. doi: 10.3389/frobt.2019.00144. eCollection 2019.
This paper describes a new method that enables a service robot to understand spoken commands robustly using off-the-shelf automatic speech recognition (ASR) systems and an encoder-decoder neural network with noise injection. In many cases, the understanding of spoken commands in service robotics is modeled as a mapping from speech signals to a sequence of commands that the robot can understand and perform. In the conventional approach, speech signals are first recognized, and semantic parsing is then applied to infer the command sequence from the utterance. However, if errors occur during speech recognition, a conventional semantic parsing method cannot be applied appropriately, because most natural language processing methods do not account for such errors. We propose the use of encoder-decoder neural networks, e.g., sequence-to-sequence models, with noise injection. The noise is injected into phoneme sequences during the training phase of the encoder-decoder neural network-based semantic parsing system. We demonstrate that neural networks trained with noise injection can mitigate the negative effects of speech recognition errors on the understanding of robot-directed speech commands, i.e., increase the performance of semantic parsing. We implemented the method and evaluated it using commands given during a general-purpose service robot (GPSR) task, such as those used in RoboCup@Home, a standard competition for testing service robots. The experimental results show that the proposed method, sequence-to-sequence with noise injection (Seq2Seq-NI), outperforms the baseline methods. In addition, Seq2Seq-NI enables a robot to understand a spoken command even when the output of an off-the-shelf ASR system contains recognition errors. Moreover, in this paper we describe an experiment conducted to evaluate the influence of the injected noise and provide a discussion of the results.
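The training-time noise injection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the phoneme inventory, error probabilities, and the choice of uniform substitution/deletion/insertion noise are all assumptions made here for illustration; the paper's actual noise model may instead be derived from observed ASR error statistics.

```python
import random

# Hypothetical phoneme inventory (assumption; the paper's inventory differs).
PHONEMES = ["a", "i", "u", "e", "o", "k", "s", "t", "n", "h", "m", "r"]

def inject_noise(phonemes, p_sub=0.05, p_del=0.03, p_ins=0.03, rng=random):
    """Simulate ASR errors on a phoneme sequence via random substitution,
    deletion, and insertion. Applied to encoder inputs during training only;
    the target command sequence is left untouched."""
    noisy = []
    for ph in phonemes:
        r = rng.random()
        if r < p_del:
            continue  # deletion error: drop this phoneme
        elif r < p_del + p_sub:
            noisy.append(rng.choice(PHONEMES))  # substitution error
        else:
            noisy.append(ph)  # phoneme survives unchanged
        if rng.random() < p_ins:
            noisy.append(rng.choice(PHONEMES))  # insertion error
    return noisy

# Example: corrupt a clean phoneme sequence before feeding it to the encoder.
clean = ["t", "a", "k", "e", "k", "a", "p"]
print(inject_noise(clean, rng=random.Random(0)))
```

At training time, each (phoneme sequence, command sequence) pair would pass the noisy phonemes to the encoder while keeping the clean command sequence as the decoder target, so the model learns to parse despite recognition errors.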