
HQA-Data: A historical question answer generation dataset from previous multi perspective conversation.

Authors

Hosen Sabbir, Eva Jannatul Ferdous, Hasib Ayman, Saha Aloke Kumar, Mridha M F, Wadud Anwar Hussen

Affiliations

Department of Computer Science and Engineering, University of Asia Pacific, Dhaka, Bangladesh.

Department of Computer Science, American International University-Bangladesh, Dhaka, Bangladesh.

Publication information

Data Brief. 2023 May 18;48:109245. doi: 10.1016/j.dib.2023.109245. eCollection 2023 Jun.

DOI:10.1016/j.dib.2023.109245
PMID:37383776
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10294004/
Abstract

This data article contains a quality assurance dataset for training chatbots and chat analysis models. The dataset targets NLP tasks in which a model must deliver a satisfactory response to a user's query. To construct it, we obtained data from a well-known dataset, "The Ubuntu Dialogue Corpus", which consists of about one million multi-turn conversations containing around seven million utterances and one hundred million words. We derived a context for each dialogueID from these lengthy Ubuntu Dialogue Corpus conversations and generated a number of questions and answers based on these contexts. All of these questions and answers are contained within the context. The dataset includes 9,364 contexts and 36,438 question-answer pairs. In addition to academic research, the dataset may be used for activities such as constructing this QA dataset for another language, deep learning, language interpretation, reading comprehension, and open-domain question answering. We present the data in raw format; it has been open-sourced and is publicly available at https://data.mendeley.com/datasets/p85z3v45xk.
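The abstract states that every question and answer is contained within its context. If that is read as "each answer appears verbatim as a substring of the context" (an assumption; the abstract does not say so explicitly), the property can be checked with a short Python sketch. The record layout and the names `ContextRecord`, `QAPair`, `dialogue_id`, `find_answer_span`, and `uncontained_answers` are hypothetical, since the file format of the Mendeley release is not described here, and the sample record is invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class QAPair:
    question: str
    answer: str


@dataclass
class ContextRecord:
    # Hypothetical layout: one context per dialogue ID, with its QA pairs.
    dialogue_id: str
    context: str
    qa_pairs: List[QAPair]


def find_answer_span(context: str, answer: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) character offsets of the answer, or None if absent."""
    if not answer:
        return None
    start = context.find(answer)
    if start == -1:
        return None
    return start, start + len(answer)


def uncontained_answers(records: List[ContextRecord]) -> List[Tuple[str, str]]:
    """List (dialogue_id, question) for answers not found verbatim in the context."""
    missing = []
    for record in records:
        for pair in record.qa_pairs:
            if find_answer_span(record.context, pair.answer) is None:
                missing.append((record.dialogue_id, pair.question))
    return missing


# Toy record invented for illustration; it is not taken from the dataset.
sample = ContextRecord(
    dialogue_id="demo-1",
    context="I fixed the boot problem by reinstalling grub from a live USB.",
    qa_pairs=[
        QAPair("How was the boot problem fixed?", "reinstalling grub"),
        QAPair("Which desktop was used?", "GNOME"),
    ],
)

if __name__ == "__main__":
    print(uncontained_answers([sample]))
```

Exact substring matching is the strictest reading of "contained within the context"; a check on the real files might need case folding or whitespace normalization first.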

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e3a/10294004/604d5e1a63b7/gr3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e3a/10294004/00950368811d/gr1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e3a/10294004/53bb9e8b94f0/gr2.jpg

Similar articles

1. HQA-Data: A historical question answer generation dataset from previous multi perspective conversation.
   Data Brief. 2023 May 18;48:109245. doi: 10.1016/j.dib.2023.109245. eCollection 2023 Jun.
2. UDDIPOK: A reading comprehension based question answering dataset in Bangla language.
   Data Brief. 2023 Feb 2;47:108933. doi: 10.1016/j.dib.2023.108933. eCollection 2023 Apr.
3. Reading comprehension based question answering system in Bangla language with transformer-based learning.
   Heliyon. 2022 Oct 12;8(10):e11052. doi: 10.1016/j.heliyon.2022.e11052. eCollection 2022 Oct.
4. A Question-and-Answer System to Extract Data From Free-Text Oncological Pathology Reports (CancerBERT Network): Development Study.
   J Med Internet Res. 2022 Mar 23;24(3):e27210. doi: 10.2196/27210.
5. SemBioNLQA: A semantic biomedical question answering system for retrieving exact and ideal answers to natural language questions.
   Artif Intell Med. 2020 Jan;102:101767. doi: 10.1016/j.artmed.2019.101767. Epub 2019 Nov 28.
6. A Semi-Supervised Learning Approach to Enhance Health Care Community-Based Question Answering: A Case Study in Alcoholism.
   JMIR Med Inform. 2016 Aug 2;4(3):e24. doi: 10.2196/medinform.5490.
7. A Pilot Study of Biomedical Text Comprehension using an Attention-Based Deep Neural Reader: Design and Experimental Analysis.
   JMIR Med Inform. 2018 Jan 5;6(1):e2. doi: 10.2196/medinform.8751.
8. AHD: Arabic healthcare dataset.
   Data Brief. 2024 Aug 22;56:110855. doi: 10.1016/j.dib.2024.110855. eCollection 2024 Oct.
9. Single and Multi-Hop Question-Answering Datasets for Reticular Chemistry with GPT-4-Turbo.
   J Chem Theory Comput. 2024 Oct 22;20(20):9128-9137. doi: 10.1021/acs.jctc.4c00805. Epub 2024 Oct 8.
10. MedChatZH: A tuning LLM for traditional Chinese medicine consultations.
    Comput Biol Med. 2024 Apr;172:108290. doi: 10.1016/j.compbiomed.2024.108290. Epub 2024 Mar 13.