UDDIPOK: A reading comprehension based question answering dataset in Bangla language.

Authors

Aurpa Tanjim Taharat, Ahmed Md Shoaib, Rifat Richita Khandakar, Anwar Md Musfique, Shawkat Ali A B M

Affiliations

Department of Computer Science and Engineering, Jahangirnagar University, Savar, Dhaka, Bangladesh.

Department of Computer Science and Engineering, International University of Business Agriculture and Technology, Bangladesh.

Publication information

Data Brief. 2023 Feb 2;47:108933. doi: 10.1016/j.dib.2023.108933. eCollection 2023 Apr.

DOI:10.1016/j.dib.2023.108933
PMID:36819905
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9929199/
Abstract

Reading comprehension (RC) is attracting growing interest in Bangla Natural Language Processing (NLP) research, in both machine learning and deep learning approaches. However, no original Bangla RC dataset drawn from varied sources exists; the available datasets are translated from foreign RC datasets and contain anomalies and mismatched translations. In this paper, we present UDDIPOK, a novel, wide-ranging, open-domain Bangla reading comprehension dataset. The dataset contains 270 reading passages and 3636 questions with answers, drawn from diverse sources such as textbooks, middle and high school exam questions, and newspapers. The dataset is formatted as a CSV file with three columns: passages, questions, and answers. As a result, the data can be handled quickly and easily in machine learning research.
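
The three-column layout described in the abstract can be read with a few lines of Python. The following is a minimal sketch, not the authors' code: the header names `passages`, `questions`, and `answers` are assumed from the abstract's wording and may differ in the released file, and the sample rows are made-up English placeholders rather than actual UDDIPOK content.

```python
import csv
import io
from collections import defaultdict

# Assumed header names, taken from the abstract's description of the columns.
# The header in the released CSV may be spelled differently.
PASSAGE_COL, QUESTION_COL, ANSWER_COL = "passages", "questions", "answers"


def group_by_passage(csv_file):
    """Read passage/question/answer rows and group the QA pairs by passage."""
    reader = csv.DictReader(csv_file)
    grouped = defaultdict(list)
    for row in reader:
        grouped[row[PASSAGE_COL]].append((row[QUESTION_COL], row[ANSWER_COL]))
    return dict(grouped)


# Made-up placeholder rows standing in for the real file.
SAMPLE = """passages,questions,answers
"Dhaka is the capital of Bangladesh.",What is the capital of Bangladesh?,Dhaka
"Dhaka is the capital of Bangladesh.",Which country has Dhaka as its capital?,Bangladesh
"The Padma is a major river.",What is the Padma?,A major river
"""

grouped = group_by_passage(io.StringIO(SAMPLE))
num_passages = len(grouped)
num_questions = sum(len(qa_pairs) for qa_pairs in grouped.values())
print(num_passages, num_questions)
```

To run this against a downloaded copy, the `io.StringIO(SAMPLE)` argument could be swapped for a file opened with `open(path, encoding="utf-8", newline="")`, where `path` is wherever the CSV was saved. Grouping by passage text assumes that each row repeats its passage alongside one question and one answer, which the abstract does not state explicitly.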


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f177/9929199/04b9bdf2756c/gr1.jpg
