NOIRBETTIK: A reading comprehension based multiple choice question answering dataset in Bangla language.

Authors

Aurpa Tanjim Taharat, Apu Md Shahriar Hossain, Akter Farzana, Rifat Richita Khandakar, Habib Md Ahsan

Affiliations

Department of Data Science and Engineering, Bangabandhu Sheikh Mujibur Rahman Digital University, Bangladesh.

Department of Internet of Things and Robotics Engineering, Bangabandhu Sheikh Mujibur Rahman Digital University, Bangladesh.

Publication information

Data Brief. 2025 Feb 14;59:111395. doi: 10.1016/j.dib.2025.111395. eCollection 2025 Apr.

DOI:10.1016/j.dib.2025.111395
PMID:40103763
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11914284/
Abstract

The COVID-19 pandemic has accelerated the adoption of online educational systems, highlighting the need for advanced automation to enhance learning and evaluation processes. Multiple-choice questions (MCQs) are a fundamental assessment tool in these systems. This paper introduces NOIRBETTIK, a novel dataset designed for reading comprehension-based MCQ answering in Bangla, developed to address the shortage of high-quality Bangla datasets for context-based tasks. The dataset is human-made, sourced from authentic Bangla materials such as books, articles, and biographies, offering longer passages and multiple-choice questions with four alternatives per question. This work focuses on providing a comprehensive and real-world dataset, filling a critical gap in Bangla NLP research and educational applications. We describe the dataset's creation and annotation process, comparing it to existing datasets to highlight its uniqueness. The primary contributions include the release of the NOIRBETTIK dataset and a detailed exploration of its structure, enabling future advancements in educational technologies. This dataset holds significant promise for enhancing reading comprehension systems and addressing the educational needs of Bangla-speaking students.
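
The abstract describes each item as a passage paired with a multiple-choice question that has four alternatives. A minimal sketch of how such a record might be represented and scored follows; the class name, field names, and `accuracy` helper are hypothetical illustrations, not the dataset's published schema or the authors' evaluation code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class McqItem:
    """One reading-comprehension item. All field names are hypothetical."""
    passage: str        # context passage the question is based on
    question: str       # question text
    options: List[str]  # answer alternatives (four per question, per the abstract)
    answer_index: int   # position of the correct option in `options`

    def __post_init__(self) -> None:
        # Enforce the four-alternative structure described in the abstract.
        if len(self.options) != 4:
            raise ValueError("expected exactly four options")
        if not 0 <= self.answer_index < len(self.options):
            raise ValueError("answer_index out of range")


def accuracy(items: List[McqItem], predictions: List[int]) -> float:
    """Fraction of items whose predicted option index matches the answer."""
    if len(items) != len(predictions):
        raise ValueError("one prediction per item is required")
    if not items:
        return 0.0
    correct = sum(
        1 for item, pred in zip(items, predictions) if pred == item.answer_index
    )
    return correct / len(items)


# Placeholder content for illustration only; not taken from the dataset.
example = McqItem(
    passage="placeholder passage",
    question="placeholder question",
    options=["A", "B", "C", "D"],
    answer_index=2,
)
score = accuracy([example, example], [2, 0])
```

Storing the correct answer as an index rather than as option text keeps scoring independent of how the option strings are normalized.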

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9c20/11914284/f8dce7a1337f/gr1.jpg

Similar articles

1
NOIRBETTIK: A reading comprehension based multiple choice question answering dataset in Bangla language.
Data Brief. 2025 Feb 14;59:111395. doi: 10.1016/j.dib.2025.111395. eCollection 2025 Apr.
2
UDDIPOK: A reading comprehension based question answering dataset in Bangla language.
Data Brief. 2023 Feb 2;47:108933. doi: 10.1016/j.dib.2023.108933. eCollection 2023 Apr.
3
Reading comprehension based question answering system in Bangla language with transformer-based learning.
Heliyon. 2022 Oct 12;8(10):e11052. doi: 10.1016/j.heliyon.2022.e11052. eCollection 2022 Oct.
4
BanglaTense: A large-scale dataset of Bangla sentences categorized by tense: Past, present, and future.
Data Brief. 2025 Feb 19;59:111400. doi: 10.1016/j.dib.2025.111400. eCollection 2025 Apr.
5
BTSD: A curated transformation of sentence dataset for text classification in Bangla language.
Data Brief. 2023 Jul 24;50:109445. doi: 10.1016/j.dib.2023.109445. eCollection 2023 Oct.
6
Bangla-REX: A distinct dataset for Bangla relation extraction.
Data Brief. 2025 Mar 20;60:111480. doi: 10.1016/j.dib.2025.111480. eCollection 2025 Jun.
7
BanglaBlend: A large-scale nobel dataset of bangla sentences categorized by saint and common form of bangla language.
Data Brief. 2024 Dec 20;58:111240. doi: 10.1016/j.dib.2024.111240. eCollection 2025 Feb.
8
HQA-Data: A historical question answer generation dataset from previous multi perspective conversation.
Data Brief. 2023 May 18;48:109245. doi: 10.1016/j.dib.2023.109245. eCollection 2023 Jun.
9
BanglaSER: A speech emotion recognition dataset for the Bangla language.
Data Brief. 2022 Mar 22;42:108091. doi: 10.1016/j.dib.2022.108091. eCollection 2022 Jun.
10
KBES: A dataset for realistic Bangla speech emotion recognition with intensity level.
Data Brief. 2023 Oct 31;51:109741. doi: 10.1016/j.dib.2023.109741. eCollection 2023 Dec.
