A simulated dataset for proactive robot task inference from streaming natural language dialogues.

Authors

Xu Haifeng, Li Chunwen, Yuan Xiaohu, Zhi Tao, Liu Huaping

Affiliations

Department of Automation, Tsinghua University, Beijing, China.

Department of Computer Science and Technology, Tsinghua University, Beijing, China.

Publication information

Sci Data. 2025 Aug 11;12(1):1405. doi: 10.1038/s41597-025-05727-w.

DOI: 10.1038/s41597-025-05727-w

PMID: 40789873

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12339746/

Abstract

This paper introduces a dataset designed to support research on proactive robots that infer human needs from natural language conversations. Unlike traditional human-robot interaction datasets focused on explicit commands, this dataset captures implicit task requests within multi-party dialogues. It simulates realistic workplace environments spanning 10 diverse scenarios, such as biotechnology research centers, legal consulting firms, and game development studios. The dataset includes 10,000 synthetic dialogues generated with a pipeline based on a large language model (LLM), covering a wide range of topics, from task-related discussions to casual conversations. The dataset focuses on common workplace tasks, such as borrowing, distributing, and processing items. It provides a resource for advancing proactive robotic systems, enabling research in natural language understanding, intent recognition, and autonomous task inference.
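The abstract describes each sample as a multi-party workplace dialogue that may or may not contain an implicit task request, and the title frames inference as happening on a streaming dialogue. The Python sketch below shows one possible way to model such a sample and its turn-by-turn presentation. It is a minimal illustration under stated assumptions. The names `TaskType`, `Utterance`, `Dialogue`, and `stream`, every field name, and the example conversation are hypothetical. They do not describe the dataset's actual schema or release format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterator, List, Optional, Tuple


class TaskType(Enum):
    # Task categories taken from the abstract; the released label set may differ.
    BORROW = "borrow"
    DISTRIBUTE = "distribute"
    PROCESS = "process"


@dataclass
class Utterance:
    speaker: str  # one participant in a multi-party conversation
    text: str


@dataclass
class Dialogue:
    # Hypothetical record layout for illustration, not the dataset's file format.
    scenario: str                    # e.g. one of the 10 workplace scenarios
    utterances: List[Utterance]
    task: Optional[TaskType] = None  # None for casual, task-free conversation


def stream(dialogue: Dialogue) -> Iterator[Tuple[int, List[Utterance]]]:
    """Yield (turn_index, prefix) after each new utterance arrives.

    In the streaming setting a robot only sees the turns spoken so far,
    so any task inference has to be made from these growing prefixes.
    """
    for t in range(1, len(dialogue.utterances) + 1):
        yield t, dialogue.utterances[:t]


# Invented example: the request is implicit; nobody addresses the robot directly.
example = Dialogue(
    scenario="legal consulting firm",
    utterances=[
        Utterance("Alice", "Has anyone seen the contract binder?"),
        Utterance("Bob", "I think it is still in the archive room."),
        Utterance("Alice", "I need it before the 3 pm client meeting."),
    ],
    task=TaskType.BORROW,
)

for t, prefix in stream(example):
    print(f"turn {t}: {prefix[-1].speaker}: {prefix[-1].text}")
```

The generator yields growing prefixes rather than single utterances. This mirrors the streaming setting: at each turn, a model must decide from the context seen so far whether a task can already be inferred. A dialogue whose `task` is `None` stands for the casual conversations that the abstract says the dataset also contains.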