API2CAN: a dataset & service for canonical utterance generation for REST APIs.

Affiliation

UNSW Sydney, Kensington, Australia.

Publication information

BMC Res Notes. 2021 Sep 22;14(1):368. doi: 10.1186/s13104-021-05593-w.

Abstract

OBJECTIVES

Natural language interfaces (e.g., chatbots) have recently gained enormous attention. Such interfaces execute underlying application programming interfaces (APIs) based on the user's utterances to perform tasks (e.g., reporting weather). Supervised approaches for building such interfaces rely on a large set of user utterances paired with APIs. Collecting such pairs typically starts with obtaining initial utterances for a given API method. Generating initial utterances can be treated as a machine translation task in which an API method is translated into an utterance. The key challenge, however, is the lack of samples for training domain-independent translation models. In this paper, we propose a dataset for training supervised models to generate initial utterances for APIs.

DATA DESCRIPTION

The dataset contains 14,370 pairs of API methods and utterances. It is built automatically by converting the method descriptions of a large number of APIs into user utterances, and it is cleaned manually to ensure quality. The dataset is accompanied by a set of microservices (e.g., for translating API methods into utterances) that can facilitate collecting training samples for building natural language interfaces.
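
To make the idea of an (API method, utterance) pair concrete, the following is a minimal rule-based sketch in Python. It is not the authors' pipeline and does not reproduce the dataset's format: the `ApiMethod` type, the verb mapping, the `<placeholder>` notation, and the naive singularization are all assumptions introduced here for illustration only.

```python
from dataclasses import dataclass

# Hypothetical verb mapping for illustration; not taken from API2CAN.
VERB_PHRASES = {
    "GET": "get",
    "POST": "create",
    "PUT": "replace",
    "PATCH": "update",
    "DELETE": "delete",
}


@dataclass
class ApiMethod:
    http_verb: str  # e.g. "GET"
    path: str       # e.g. "/customers/{customer_id}"


def is_param(segment: str) -> bool:
    """True for templated path segments such as "{customer_id}"."""
    return segment.startswith("{") and segment.endswith("}")


def singular(noun: str) -> str:
    """Naive singularization (assumption): drop a trailing 's'."""
    return noun[:-1] if noun.endswith("s") else noun


def to_canonical_utterance(method: ApiMethod) -> str:
    """Turn an API method into a template utterance with <placeholders>."""
    verb = VERB_PHRASES[method.http_verb.upper()]
    segments = [s for s in method.path.split("/") if s]
    resources = [s for s in segments if not is_param(s)]
    if not resources:
        raise ValueError(f"no resource name in path: {method.path!r}")
    resource = resources[-1]  # last named resource in the path

    if is_param(segments[-1]):
        # The path ends in a parameter, so it addresses a single item.
        name = segments[-1][1:-1]
        label = name.replace("_", " ")
        return f"{verb} the {singular(resource)} with {label} being <{name}>"
    if verb == "get":
        return f"get the list of {resource}"
    if verb == "create":
        return f"create a {singular(resource)}"
    return f"{verb} all {resource}"


if __name__ == "__main__":
    print(to_canonical_utterance(ApiMethod("GET", "/customers/{customer_id}")))
    print(to_canonical_utterance(ApiMethod("POST", "/customers")))
```

Hand-written rules like these only cover simple resource paths, which is one way to see why a large set of paired examples is useful for training a translation model instead.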
