An ensemble novel architecture for Bangla Mathematical Entity Recognition (MER) using transformer based learning.

Authors

Aurpa Tanjim Taharat, Ahmed Md Shoaib

Affiliations

Department of Data Science, Bangabandhu Sheikh Mujibur Rahman Digital University, Bangladesh.

Department of Computer Science, Boise State University, Boise, ID, USA.

Publication information

Heliyon. 2024 Feb 5;10(3):e25467. doi: 10.1016/j.heliyon.2024.e25467. eCollection 2024 Feb 15.

Abstract

Mathematical entity recognition is indispensable for machines to accurately explain and depict mathematical content and to enable adequate mathematical operations and reasoning. It expedites automated theorem proving, speeds up the analysis and retrieval of mathematical knowledge from documents, and improves e-learning and educational platforms. It also simplifies translation, scientific research, data analysis, interpretation, and the practical application of mathematical information. Mathematical entity recognition in the Bangla language is novel; to the best of our knowledge, no similar work has been done. Here, we identify mathematical operators, numeric operands, and popular mathematical terms (complex numbers, real numbers, prime numbers, etc.). In this work, we perform Bangla Mathematical Entity Recognition (MER) using an ensemble architecture of deep neural networks based on Bidirectional Encoder Representations from Transformers (BERT). We prepare a novel dataset comprising 13,717 observations, each containing a mathematical statement, mathematical entity, and mathematical type. We evaluate our proposed architectures using accuracy, precision, recall, and F1-score as the performance metrics. The results show a satisfactory accuracy of 97.98% with BERT and 99.76% with ensemble BERT.
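The abstract names an ensemble of BERT models and four evaluation metrics but does not say how the ensemble combines its members or how the metrics are averaged. The following is a minimal sketch under two stated assumptions: predictions are combined by majority vote, and precision, recall, and F1 are macro-averaged over entity types. The label names (`operator`, `operand`, `term`), the three model outputs, and the function names are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine per-model label lists into one list by majority vote.

    Assumption: the paper's ensembling rule is not given in the abstract;
    majority voting is used here purely as an illustration. On a tie,
    the label seen first among the models wins.
    """
    combined = []
    for labels in zip(*predictions_per_model):
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined

def evaluate(gold, predicted):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    pairs = list(zip(gold, predicted))
    labels = sorted(set(gold) | set(predicted))
    accuracy = sum(g == p for g, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }

# Hypothetical entity-type labels for four entities (not real data).
gold = ["operand", "operator", "operand", "term"]
model_a = ["operand", "operator", "operand", "operand"]
model_b = ["operand", "operand", "operand", "term"]
model_c = ["operator", "operator", "operand", "term"]

ensemble = majority_vote([model_a, model_b, model_c])
print("single model:", evaluate(gold, model_a))
print("ensemble:    ", evaluate(gold, ensemble))
```

In this toy input each model makes one mistake on a different entity, so the vote corrects all three; that is the usual motivation for ensembling, not a reproduction of the reported 97.98% and 99.76% figures.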

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/db6c/10864977/56ed55852382/gr001.jpg
