HQA-Data: A historical question answer generation dataset from previous multi perspective conversation.

Authors

Hosen Sabbir, Eva Jannatul Ferdous, Hasib Ayman, Saha Aloke Kumar, Mridha M F, Wadud Anwar Hussen

Affiliations

Department of Computer Science and Engineering, University of Asia Pacific, Dhaka, Bangladesh.

Department of Computer Science, American International University-Bangladesh, Dhaka, Bangladesh.

Publication information

Data Brief. 2023 May 18;48:109245. doi: 10.1016/j.dib.2023.109245. eCollection 2023 Jun.

DOI:10.1016/j.dib.2023.109245

PMID:37383776

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10294004/

Abstract

This data article presents a question-answering (QA) dataset for training chatbots and chat analysis models. The dataset targets NLP tasks in which a model must deliver a satisfactory response to a user's query. To construct it, we obtained data from the well-known "Ubuntu Dialogue Corpus", which consists of about one million multi-turn conversations containing around seven million utterances and one hundred million words. We derived a context for each dialogueID from these lengthy Ubuntu Dialogue Corpus conversations, and then generated a number of questions and answers based on these contexts. All of these questions and answers are contained within the context. The dataset includes 9,364 contexts and 36,438 question-answer pairs. In addition to academic research, the dataset may be used for activities such as building an equivalent QA dataset for another language, deep learning, language interpretation, reading comprehension, and open-domain question answering. We present the data in raw format; it is open source and publicly available at https://data.mendeley.com/datasets/p85z3v45xk.
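The abstract describes each record as a context derived from one dialogueID, together with question-answer pairs whose text is contained in that context. The sketch below illustrates that structure and a containment check. It is only an illustration under stated assumptions: the class and field names (`ContextRecord`, `dialogue_id`, `qa_pairs`) and the example record are hypothetical and are not taken from the published files, whose actual layout should be checked on the Mendeley Data page.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QAPair:
    question: str
    answer: str


@dataclass
class ContextRecord:
    # Hypothetical field names; the released files may use a different schema.
    dialogue_id: str
    context: str
    qa_pairs: List[QAPair] = field(default_factory=list)


def answers_outside_context(record: ContextRecord) -> List[QAPair]:
    """Return the QA pairs whose answer text does not appear verbatim in the context."""
    return [qa for qa in record.qa_pairs if qa.answer not in record.context]


def mean_pairs_per_context(num_contexts: int, num_pairs: int) -> float:
    """Average number of question-answer pairs per context."""
    return num_pairs / num_contexts


# Invented example record, for illustration only (not from the dataset).
example = ContextRecord(
    dialogue_id="example-1",
    context="You can list installed packages with dpkg -l on Ubuntu.",
    qa_pairs=[
        QAPair("How can I list installed packages?", "dpkg -l"),
        QAPair("Which distribution is discussed?", "Ubuntu"),
    ],
)

if __name__ == "__main__":
    print(answers_outside_context(example))
    # Counts reported in the abstract: 9,364 contexts and 36,438 QA pairs.
    print(round(mean_pairs_per_context(9364, 36438), 2))
```

With the counts reported in the abstract, the dataset averages roughly 3.9 question-answer pairs per context.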

