Aurpa Tanjim Taharat, Ahmed Md Shoaib
Department of Data Science, Bangabandhu Sheikh Mujibur Rahman Digital University, Bangladesh.
Department of Computer Science, Boise State University, Boise, ID, USA.
Heliyon. 2024 Feb 5;10(3):e25467. doi: 10.1016/j.heliyon.2024.e25467. eCollection 2024 Feb 15.
Mathematical entity recognition (MER) is essential for machines to interpret and represent mathematical content accurately and to support mathematical operations and reasoning. It expedites automated theorem proving, speeds up the analysis and retrieval of mathematical knowledge from documents, and improves e-learning and educational platforms. It also simplifies translation, scientific research, data analysis, interpretation, and the practical application of mathematical information. Mathematical entity recognition in the Bangla language is novel; to the best of our knowledge, no similar work has been done. Here, we identify mathematical operators, numeric operands, and common mathematical terms (complex numbers, real numbers, prime numbers, etc.). In this work, we perform Bangla MER using an ensemble architecture of deep neural networks based on Bidirectional Encoder Representations from Transformers (BERT). We prepare a novel dataset of 13,717 observations, each containing a mathematical statement, a mathematical entity, and a mathematical type. We evaluate the proposed architectures using accuracy, precision, recall, and F1-score as performance metrics. The results show an accuracy of 97.98% with BERT and 99.76% with ensemble BERT.
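The four metrics named in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how per-class scores are averaged, so macro averaging is an assumption here, and the label names (`operator`, `operand`, `term`) and the toy predictions are hypothetical placeholders rather than the paper's actual tag set or data.

```python
from collections import Counter


def classification_report(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1.

    Macro averaging is an assumption; the source does not specify it.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one observation is required")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label was wrong
            fn[truth] += 1  # true label was missed

    def safe_div(num, den):
        # Undefined ratios (no predictions or no true instances) count as 0.
        return num / den if den else 0.0

    precisions, recalls, f1s = [], [], []
    for label in labels:
        precision = safe_div(tp[label], tp[label] + fp[label])
        recall = safe_div(tp[label], tp[label] + fn[label])
        f1 = safe_div(2 * precision * recall, precision + recall)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n_labels = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n_labels,
        "recall": sum(recalls) / n_labels,
        "f1": sum(f1s) / n_labels,
    }


# Hypothetical entity types for four toy observations.
y_true = ["operator", "operand", "term", "operand"]
y_pred = ["operator", "operand", "operand", "operand"]
report = classification_report(y_true, y_pred)
print(report)
```

Because macro averaging weights every class equally, a rare entity type that is never predicted correctly lowers precision, recall, and F1 noticeably even when overall accuracy stays high, which is one reason to report all four metrics together.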