
Simultaneous text and gesture generation for social robots with small language models.

Authors

Galatolo Alessio, Winkle Katie

Affiliations

Department of Information Technology, Uppsala University, Uppsala, Sweden.

Department of Women and Children's Health, Uppsala University, Uppsala, Sweden.

Publication

Front Robot AI. 2025 May 16;12:1581024. doi: 10.3389/frobt.2025.1581024. eCollection 2025.

DOI: 10.3389/frobt.2025.1581024
PMID: 40453041
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12122315/
Abstract

INTRODUCTION

As social robots gain advanced communication capabilities, users increasingly expect coherent verbal and non-verbal behaviours. Recent work has shown that Large Language Models (LLMs) can support autonomous generation of such multimodal behaviours. However, current LLM-based approaches to non-verbal behaviour often involve multi-step reasoning with large, closed-source models, resulting in significant computational overhead and limiting their feasibility in low-resource or privacy-constrained environments.

METHODS

To address these limitations, we propose a novel method for simultaneous generation of text and gestures with minimal computational overhead compared to plain text generation. Our system does not produce low-level joint trajectories, but instead predicts high-level communicative intentions, which are mapped to platform-specific expressions. Central to our approach is the introduction of lightweight, robot-specific "gesture heads" derived from the LLM's architecture, requiring no pose-based datasets and enabling generalisability across platforms.
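The gesture-head idea described in METHODS can be sketched as a small classifier that reads the same hidden state the language model uses for next-token prediction, so each decoding step emits a token and a high-level gesture label in parallel. This is a minimal illustrative sketch, not the authors' implementation: the gesture inventory, dimensions, and random weights below are all assumptions standing in for a trained model.

```python
import numpy as np

# Hypothetical high-level gesture inventory for one platform (e.g. Furhat
# facial expressions); real labels would be platform-specific mappings.
GESTURES = ["neutral", "smile", "nod", "frown", "surprise"]

rng = np.random.default_rng(0)
hidden_dim, vocab_size = 64, 1000

# The base LM's output projection, plus an added "gesture head": one linear
# layer over the same hidden state. The head adds only
# hidden_dim * len(GESTURES) parameters and needs no pose data.
W_token = rng.normal(size=(hidden_dim, vocab_size))
W_gesture = rng.normal(size=(hidden_dim, len(GESTURES)))

def decode_step(hidden_state):
    """One decoding step: next token and gesture from the same hidden state."""
    token_id = int(np.argmax(hidden_state @ W_token))
    gesture = GESTURES[int(np.argmax(hidden_state @ W_gesture))]
    return token_id, gesture

h = rng.normal(size=hidden_dim)  # stand-in for the LM's last hidden state
token_id, gesture = decode_step(h)
print(token_id, gesture)
```

Because the gesture head is a single extra matrix multiply per step, its cost is negligible next to the LM's own forward pass, which matches the paper's claim of minimal overhead on small or locally deployed models.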

RESULTS

We evaluate our method on two distinct robot platforms: Furhat (facial expressions) and Pepper (bodily gestures). Experimental results demonstrate that our method maintains behavioural quality while introducing negligible computational and memory overhead. Furthermore, the gesture heads operate in parallel with the language generation component, ensuring scalability and responsiveness even on small or locally deployed models.

DISCUSSION

Our approach supports the use of Small Language Models for multimodal generation, offering an effective alternative to existing high-resource methods. By abstracting gesture generation and eliminating reliance on platform-specific motion data, we enable broader applicability in real-world, low-resource, and privacy-sensitive HRI settings.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/34f2/12122315/2bee992a367e/frobt-12-1581024-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/34f2/12122315/cd7df99f060e/frobt-12-1581024-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/34f2/12122315/cf6fee268462/frobt-12-1581024-g002.jpg

Similar articles

1. Simultaneous text and gesture generation for social robots with small language models. Front Robot AI. 2025 May 16;12:1581024. doi: 10.3389/frobt.2025.1581024. eCollection 2025.
2. Leveraging Large Language Models for Precision Monitoring of Chemotherapy-Induced Toxicities: A Pilot Study with Expert Comparisons and Future Directions. Cancers (Basel). 2024 Aug 12;16(16):2830. doi: 10.3390/cancers16162830.
3. Evaluation of text-to-gesture generation model using convolutional neural network. Neural Netw. 2022 Jul;151:365-375. doi: 10.1016/j.neunet.2022.03.041. Epub 2022 Apr 4.
4. Development of a low-resource wearable continuous gesture-to-speech conversion system. Disabil Rehabil Assist Technol. 2023 Nov;18(8):1441-1452. doi: 10.1080/17483107.2021.2022787. Epub 2022 Jan 21.
5. TED-culture: culturally inclusive co-speech gesture generation for embodied social agents. Front Robot AI. 2025 Apr 8;12:1546765. doi: 10.3389/frobt.2025.1546765. eCollection 2025.
6. MMAgentRec, a personalized multi-modal recommendation agent with large language model. Sci Rep. 2025 Apr 8;15(1):12062. doi: 10.1038/s41598-025-96458-w.
7. A multimodal human-robot sign language interaction framework applied in social robots. Front Neurosci. 2023 Apr 11;17:1168888. doi: 10.3389/fnins.2023.1168888. eCollection 2023.
8. Prelinguistic communication complexity predicts expressive language in initial minimally verbal autistic children. Int J Lang Commun Disord. 2024 Jan-Feb;59(1):413-425. doi: 10.1111/1460-6984.12956. Epub 2023 Sep 24.
9. Real-time emotion generation in human-robot dialogue using large language models. Front Robot AI. 2023 Dec 1;10:1271610. doi: 10.3389/frobt.2023.1271610. eCollection 2023.
10. How do minimally verbal children and adolescents with autism spectrum disorder use communicative gestures to complement their spoken language abilities? Autism Dev Lang Impair. 2021 Jan-Dec;6. doi: 10.1177/23969415211035065. Epub 2021 Aug 4.
