NOIRBETTIK: A reading comprehension based multiple choice question answering dataset in Bangla language.

Author information

Aurpa Tanjim Taharat, Apu Md Shahriar Hossain, Akter Farzana, Rifat Richita Khandakar, Habib Md Ahsan

Affiliations

Department of Data Science and Engineering, Bangabandhu Sheikh Mujibur Rahman Digital University, Bangladesh.

Department of Internet of Things and Robotics Engineering, Bangabandhu Sheikh Mujibur Rahman Digital University, Bangladesh.

Publication information

Data Brief. 2025 Feb 14;59:111395. doi: 10.1016/j.dib.2025.111395. eCollection 2025 Apr.

DOI:10.1016/j.dib.2025.111395

PMID:40103763

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11914284/

Abstract

The COVID-19 pandemic has accelerated the adoption of online educational systems, highlighting the need for advanced automation to enhance learning and evaluation processes. Multiple-choice questions (MCQs) are a fundamental assessment tool in these systems. This paper introduces NOIRBETTIK, a novel dataset designed for reading comprehension-based MCQ answering in Bangla, developed to address the shortage of high-quality Bangla datasets for context-based tasks. The dataset is human-made, sourced from authentic Bangla materials such as books, articles, and biographies, offering longer passages and multiple-choice questions with four alternatives per question. This work focuses on providing a comprehensive and real-world dataset, filling a critical gap in Bangla NLP research and educational applications. We describe the dataset's creation and annotation process, comparing it to existing datasets to highlight its uniqueness. The primary contributions include the release of the NOIRBETTIK dataset and a detailed exploration of its structure, enabling future advancements in educational technologies. This dataset holds significant promise for enhancing reading comprehension systems and addressing the educational needs of Bangla-speaking students.
