Yamada Tatsuro, Murata Shingo, Arie Hiroaki, Ogata Tetsuya
Department of Intermedia Art and Science, Waseda University, Tokyo, Japan.
Department of Modern Mechanical Engineering, Waseda University, Tokyo, Japan.
Front Neurorobot. 2017 Dec 22;11:70. doi: 10.3389/fnbot.2017.00070. eCollection 2017.
An important characteristic of human language is compositionality. We can efficiently express a wide variety of real-world situations, events, and behaviors by compositionally constructing the meaning of a complex expression from a finite number of elements. Previous studies have analyzed how machine-learning models, particularly neural networks, can learn from experience to represent compositional relationships between language and robot actions, with the aim of understanding the symbol grounding structure and achieving intelligent communicative agents. Such studies have mainly dealt with words (nouns, adjectives, and verbs) that directly refer to real-world matters. In addition to these words, the current study simultaneously deals with logic words such as "not," "and," and "or." These words do not refer directly to the real world; rather, they are logical operators that contribute to the construction of sentence meaning. In human-robot communication, such words are likely to be used frequently. The current study builds a recurrent neural network model with long short-term memory units and trains it to translate sentences containing logic words into robot actions. We investigate what kind of compositional representations, mediating between sentences and robot actions, emerge as the network's internal states through the learning process. Analysis after learning shows that referential words are merged with visual information and the robot's own current state, whereas the logic words are represented by the model in accordance with their functions as logical operators. Words such as "true," "false," and "not" worked as non-linear transformations that encode orthogonal phrases into the same area of the memory cell state space. The word "and," which required the robot to lift both of its hands, worked as if it were a universal quantifier. The word "or," which required apparently random action generation, was represented as an unstable region of the network's dynamical system.
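The abstract describes an LSTM-based recurrent network that receives a word sequence together with visual and proprioceptive input and outputs robot actions, with the memory cell states analyzed afterward. The sketch below is a minimal illustration of that kind of architecture, not the authors' implementation: the class name LogicWordActionNet, the one-hot word encoding, the concatenated input design, and all layer sizes are assumptions made for illustration.

```python
# Minimal sketch (assumed architecture, not the paper's code) of an LSTM that
# maps a sentence plus visual features and current joint angles to the next
# joint command, exposing the memory cell state that the paper analyzes.
import torch
import torch.nn as nn

class LogicWordActionNet(nn.Module):
    def __init__(self, vocab_size, vision_dim, joint_dim, hidden_dim=64):
        super().__init__()
        # Input at each time step: one-hot word + visual features + current joints.
        self.lstm = nn.LSTM(vocab_size + vision_dim + joint_dim,
                            hidden_dim, batch_first=True)
        self.readout = nn.Linear(hidden_dim, joint_dim)

    def forward(self, words, vision, joints):
        # words:  (batch, T, vocab_size)  one-hot encoded sentence, zero-padded
        # vision: (batch, T, vision_dim)  visual features per step
        # joints: (batch, T, joint_dim)   current joint angles per step
        x = torch.cat([words, vision, joints], dim=-1)
        h, (h_n, c_n) = self.lstm(x)
        # c_n is the memory cell state in which logic-word representations
        # could be inspected; the readout predicts the next joint angles.
        return self.readout(h), c_n

# Hypothetical usage with toy dimensions and a toy vocabulary.
vocab = ["lift", "left", "right", "not", "and", "or", "true", "false"]
net = LogicWordActionNet(vocab_size=len(vocab), vision_dim=10, joint_dim=8)
words = torch.zeros(1, 5, len(vocab))
words[0, 0, vocab.index("and")] = 1.0
vision = torch.randn(1, 5, 10)
joints = torch.zeros(1, 5, 8)
pred_joints, cell_state = net(words, vision, joints)
```

In such a setup, training would minimize the error between predicted and demonstrated joint trajectories, and the emergent structure of cell_state across sentences could then be examined, e.g., by principal component analysis.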