Aurpa Tanjim Taharat, Apu Md Shahriar Hossain, Akter Farzana, Rifat Richita Khandakar, Habib Md Ahsan
Department of Data Science and Engineering, Bangabandhu Sheikh Mujibur Rahman Digital University, Bangladesh.
Department of Internet of Things and Robotics Engineering, Bangabandhu Sheikh Mujibur Rahman Digital University, Bangladesh.
Data Brief. 2025 Feb 14;59:111395. doi: 10.1016/j.dib.2025.111395. eCollection 2025 Apr.
The COVID-19 pandemic accelerated the adoption of online educational systems, highlighting the need for automation that supports learning and evaluation. Multiple-choice questions (MCQs) are a fundamental assessment tool in these systems. This paper introduces NOIRBETTIK, a novel dataset for reading comprehension-based MCQ answering in Bangla, developed to address the shortage of high-quality Bangla datasets for context-based tasks. The dataset was created manually from authentic Bangla sources such as books, articles, and biographies; it offers relatively long passages, and each multiple-choice question has four answer alternatives. This work focuses on providing a comprehensive, real-world dataset that fills a critical gap in Bangla NLP research and educational applications. We describe the dataset's creation and annotation process and compare it with existing datasets to highlight its uniqueness. The primary contributions are the release of the NOIRBETTIK dataset and a detailed exploration of its structure, which can support future advances in educational technology. The dataset holds significant promise for improving reading comprehension systems and addressing the educational needs of Bangla-speaking students.
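To make the task format concrete, the sketch below models one reading-comprehension MCQ item (a passage, a question, and four alternatives) and scores a model's predicted option indices by accuracy. The field names (`passage`, `question`, `options`, `answer_index`) and the `accuracy` helper are hypothetical illustrations, not the published NOIRBETTIK file format; consult the released dataset for its actual schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MCQItem:
    """One reading-comprehension MCQ item (hypothetical field names, not the official schema)."""
    passage: str        # Bangla context passage
    question: str       # question asked about the passage
    options: List[str]  # exactly four answer alternatives
    answer_index: int   # 0-based position of the correct option

    def __post_init__(self) -> None:
        # Enforce the four-alternatives-per-question structure described in the abstract.
        if len(self.options) != 4:
            raise ValueError("each question must have exactly four options")
        if not 0 <= self.answer_index < len(self.options):
            raise ValueError("answer_index must point to one of the options")

    @property
    def answer(self) -> str:
        """Text of the correct option."""
        return self.options[self.answer_index]


def accuracy(items: List[MCQItem], predictions: List[int]) -> float:
    """Fraction of items whose predicted option index matches the gold index."""
    if len(items) != len(predictions):
        raise ValueError("need exactly one prediction per item")
    if not items:
        return 0.0
    correct = sum(1 for item, pred in zip(items, predictions) if item.answer_index == pred)
    return correct / len(items)
```

For example, `accuracy([item_a, item_b], [2, 1])` returns 0.5 when only `item_a` has gold index 2 and `item_b` does not have gold index 1. Accuracy is used here only because it is the usual metric for a fixed four-way choice; the paper may report other measures.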