

Learning to generate pointing gestures in situated embodied conversational agents.

Authors

Deichler Anna, Wang Siyang, Alexanderson Simon, Beskow Jonas

Affiliation

Division of Speech, Music and Hearing, KTH Royal Institute of Technology, Stockholm, Sweden.

Publication

Front Robot AI. 2023 Mar 30;10:1110534. doi: 10.3389/frobt.2023.1110534. eCollection 2023.

DOI: 10.3389/frobt.2023.1110534
PMID: 37064574
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10097883/
Abstract

One of the main goals of robotics and intelligent agent research is to enable them to communicate with humans in physically situated settings. Human communication consists of both verbal and non-verbal modes. Recent studies in enabling communication for intelligent agents have focused on verbal modes, i.e., language and speech. However, in a situated setting the non-verbal mode is crucial for an agent to adapt flexible communication strategies. In this work, we focus on learning to generate non-verbal communicative expressions in situated embodied interactive agents. Specifically, we show that an agent can learn pointing gestures in a physically simulated environment through a combination of imitation and reinforcement learning that achieves high motion naturalness and high referential accuracy. We compared our proposed system against several baselines in both subjective and objective evaluations. The subjective evaluation is done in a virtual reality setting where an embodied referential game is played between the user and the agent in a shared 3D space, a setup that fully assesses the communicative capabilities of the generated gestures. The evaluations show that our model achieves a higher level of referential accuracy and motion naturalness compared to a state-of-the-art supervised learning motion synthesis model, showing the promise of our proposed system that combines imitation and reinforcement learning for generating communicative gestures. Additionally, our system is robust in a physically-simulated environment thus has the potential of being applied to robots.

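To make the abstract's central idea concrete — training a pointing policy on a blend of a task (referential-accuracy) objective and an imitation objective — here is a deliberately minimal sketch. It is not the authors' implementation: the paper trains a policy with reinforcement learning in a physics simulator, whereas this toy uses deterministic gradient ascent on a blended reward in one dimension, and all names and numbers are illustrative.

```python
# Toy sketch of a blended objective: a task reward (point at the
# referent) combined with an imitation reward (stay close to a
# demonstrated motion). Purely illustrative, not from the paper.

def blended_reward(angle, target, demo, alpha):
    """Weighted sum of a referential-accuracy term and an imitation term."""
    task = -(angle - target) ** 2       # closer to the referent is better
    imitation = -(angle - demo) ** 2    # closer to the demo motion is better
    return (1 - alpha) * task + alpha * imitation

def optimize_pointing(target=0.8, demo=0.7, alpha=0.5, lr=0.1, steps=200):
    """Gradient ascent on the blended reward (a stand-in for RL training)."""
    angle = 0.0
    for _ in range(steps):
        # Analytic gradient of blended_reward with respect to angle.
        grad = -2 * (1 - alpha) * (angle - target) - 2 * alpha * (angle - demo)
        angle += lr * grad
    return angle  # converges to (1 - alpha) * target + alpha * demo
```

With equal weights the learned angle settles between the task optimum and the demonstration, which is the trade-off the paper tunes between referential accuracy and motion naturalness.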

Similar Articles

1. Learning to generate pointing gestures in situated embodied conversational agents.
   Front Robot AI. 2023 Mar 30;10:1110534. doi: 10.3389/frobt.2023.1110534. eCollection 2023.
2. Capturing Conversational Gestures for Embodied Conversational Agents Using an Optimized Kaneda-Lucas-Tomasi Tracker and Denavit-Hartenberg-Based Kinematic Model.
   Sensors (Basel). 2022 Oct 29;22(21):8318. doi: 10.3390/s22218318.
3. Evaluation of text-to-gesture generation model using convolutional neural network.
   Neural Netw. 2022 Jul;151:365-375. doi: 10.1016/j.neunet.2022.03.041. Epub 2022 Apr 4.
4. Automating the Production of Communicative Gestures in Embodied Characters.
   Front Psychol. 2018 Jul 9;9:1144. doi: 10.3389/fpsyg.2018.01144. eCollection 2018.
5. Zero-shot style transfer for gesture animation driven by text and speech using adversarial disentanglement of multimodal style encoding.
   Front Artif Intell. 2023 Jun 12;6:1142997. doi: 10.3389/frai.2023.1142997. eCollection 2023.
6. Olive baboons communicate intentionally by pointing.
   Anim Cogn. 2013 Mar;16(2):155-63. doi: 10.1007/s10071-012-0558-y. Epub 2012 Sep 6.
7. [Individual differences in the linguistic competence of referential and expressive subjects].
   Psicothema. 2006 Feb;18(1):37-42.
8. Embodied science and mixed reality: How gesture and motion capture affect physics education.
   Cogn Res Princ Implic. 2017;2(1):24. doi: 10.1186/s41235-017-0060-9. Epub 2017 May 24.
9. The Production of Gesture and Speech by People With Aphasia: Influence of Communicative Constraints.
   J Speech Lang Hear Res. 2019 Nov 11;62(12):4417-4432. doi: 10.1044/2019_JSLHR-L-19-0020. Print 2019 Dec 18.
10. Lexical learning in mild aphasia: gesture benefit depends on patholinguistic profile and lesion pattern.
    Cortex. 2013 Nov-Dec;49(10):2637-49. doi: 10.1016/j.cortex.2013.07.012. Epub 2013 Aug 8.

References

1. Demonstratives in Spatial Language and Social Interaction: An Interdisciplinary Review.
   Front Psychol. 2020 Nov 25;11:555265. doi: 10.3389/fpsyg.2020.555265. eCollection 2020.
2. A conceptual framework for the study of demonstrative reference.
   Psychon Bull Rev. 2021 Apr;28(2):409-433. doi: 10.3758/s13423-020-01822-8. Epub 2020 Oct 9.
3. Targeting in Language: Unifying Deixis and Anaphora.
   Front Psychol. 2020 Sep 1;11:2016. doi: 10.3389/fpsyg.2020.02016. eCollection 2020.
4. This and That Revisited: A Social and Multimodal Approach to Spatial Demonstratives.
   Front Psychol. 2016 Feb 16;7:222. doi: 10.3389/fpsyg.2016.00222. eCollection 2016.
5. Using pointing and describing to achieve joint focus of attention in dialogue.
   Psychol Sci. 2004 Jun;15(6):415-9. doi: 10.1111/j.0956-7976.2004.00694.x.