Deichler Anna, Wang Siyang, Alexanderson Simon, Beskow Jonas
Division of Speech, Music and Hearing, KTH Royal Institute of Technology, Stockholm, Sweden.
Front Robot AI. 2023 Mar 30;10:1110534. doi: 10.3389/frobt.2023.1110534. eCollection 2023.
One of the main goals of robotics and intelligent agent research is to enable robots and agents to communicate with humans in physically situated settings. Human communication consists of both verbal and non-verbal modes. Recent studies on enabling communication for intelligent agents have focused on the verbal modes, i.e., language and speech. However, in a situated setting the non-verbal mode is crucial for an agent to adopt flexible communication strategies. In this work, we focus on learning to generate non-verbal communicative expressions in situated embodied interactive agents. Specifically, we show that an agent can learn pointing gestures in a physically simulated environment through a combination of imitation and reinforcement learning that achieves high motion naturalness and high referential accuracy. We compared our proposed system against several baselines in both subjective and objective evaluations. The subjective evaluation was conducted in a virtual reality setting where an embodied referential game is played between the user and the agent in a shared 3D space, a setup that fully assesses the communicative capabilities of the generated gestures. The evaluations show that our model achieves higher referential accuracy and motion naturalness than a state-of-the-art supervised learning motion synthesis model, showing the promise of combining imitation and reinforcement learning for generating communicative gestures. Additionally, our system is robust in a physically simulated environment and thus has the potential to be applied to robots.