UNSW Sydney, Kensington, Australia.
BMC Res Notes. 2021 Sep 22;14(1):368. doi: 10.1186/s13104-021-05593-w.
Recently, natural language interfaces (e.g., chatbots) have gained enormous attention. Such interfaces execute underlying application programming interfaces (APIs) based on the user's utterances to perform tasks (e.g., reporting weather). Supervised approaches for building such interfaces rely upon a large set of user utterances paired with APIs. Collecting such pairs typically starts with obtaining initial utterances for a given API method. Generating initial utterances can be considered a machine translation task in which an API method is translated into an utterance. However, the key challenge is the lack of training samples for domain-independent translation models. In this paper, we propose a dataset for training supervised models to generate initial utterances for APIs.
The dataset contains 14,370 pairs of API methods and utterances. It is built automatically by converting the method descriptions of a large number of APIs to user utterances, and it is cleaned manually to ensure quality. The dataset is also accompanied by a set of microservices (e.g., translating API methods to utterances) that can facilitate the process of collecting training samples for building natural language interfaces.
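To make the idea of an API method paired with an utterance concrete, the following is a minimal sketch in Python, not the authors' conversion pipeline. The `ApiUtterancePair` type, the `description_to_utterance` helper, the example endpoint, and the description text are all hypothetical and chosen only for illustration. The single rule shown (lower-casing the leading verb and dropping its third-person "s") is an assumption about what such a conversion might involve.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiUtterancePair:
    """One hypothetical training sample: an API method and an initial utterance."""
    method: str      # e.g. "GET /customers/{customer_id}" (illustrative, not from the dataset)
    utterance: str   # imperative sentence a user might say to invoke the method


def description_to_utterance(description: str) -> str:
    """Naively turn a method description into an imperative utterance.

    Assumed rule: lower-case the leading verb and strip a trailing
    third-person "s" ("Returns" -> "return"). Verbs such as "Searches"
    would need extra handling that this sketch does not attempt.
    """
    words = description.strip().rstrip(".").split()
    if not words:
        return ""
    first = words[0].lower()
    if first.endswith("s") and not first.endswith("ss"):
        first = first[:-1]
    return " ".join([first] + words[1:])


pair = ApiUtterancePair(
    method="GET /customers/{customer_id}",
    utterance=description_to_utterance("Returns a customer by id."),
)
print(pair.method, "->", pair.utterance)
```

A single hand-written rule like this breaks on irregular verbs and on descriptions that do not start with a verb, which is one reason automatic conversion is likely to need a manual cleaning pass of the kind the abstract mentions.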