Al-Salman Abdulmalik, AlSalman Amani
Computer Science Department, King Saud University, Riyadh, Saudi Arabia.
Department of Special Education, King Saud University, Riyadh, Saudi Arabia.
Heliyon. 2024 Feb 14;10(4):e26155. doi: 10.1016/j.heliyon.2024.e26155. eCollection 2024 Feb 29.
For many years, braille-assistive technologies have aided blind individuals in reading, writing, learning, and communicating with sighted individuals. These technologies have been instrumental in promoting inclusivity and breaking down communication barriers in the lives of blind people. One of these technologies is the Optical Braille Recognition (OBR) system, which facilitates communication between sighted and blind individuals. However, current OBR systems cannot convert braille documents into multilingual texts, making it challenging for sighted individuals to learn braille through self-study. To address this gap, we propose a segmentation- and deep-learning-based approach that converts braille images into multilingual texts. The approach comprises image acquisition, preprocessing, segmentation using the Mayfly optimization algorithm with a thresholding method, and a braille multilingual mapping step. It uses a deep learning model, LeNet-5, to recognize braille cells. We evaluated the performance of the proposed approach through several experiments on two datasets of braille images. Dataset-1 consists of 1404 labeled samples of 27 braille signs representing alphabet letters, while Dataset-2 comprises 5420 labeled samples of 37 braille symbols representing letters, numbers, and punctuation marks, of which 2000 samples were used for cross-validation. The proposed model achieved high classification accuracies of 99.77% and 99.80% on the test sets of the first and second datasets, respectively. The results demonstrate the potential of the proposed approach for multilingual braille transformation, enabling effective communication with sighted individuals.
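To make the segmentation step concrete, the sketch below shows threshold-based binarization of a grayscale braille scan in Python. The paper tunes the threshold with the Mayfly optimization algorithm; that optimizer is not reproduced here, so Otsu's method stands in as a default when no tuned threshold is supplied. The function name and the use of OpenCV are illustrative assumptions, not the authors' code.

```python
from typing import Optional

import cv2
import numpy as np

def binarize_braille_page(gray: np.ndarray, threshold: Optional[float] = None) -> np.ndarray:
    """Binarize a grayscale braille scan so the embossed dots stand out.

    The paper selects the threshold via Mayfly optimization; as a stand-in,
    this sketch falls back to Otsu's method when no threshold is given.
    """
    if threshold is None:
        # Otsu's method picks the threshold that minimizes intra-class variance.
        _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    else:
        _, binary = cv2.threshold(gray, threshold, 255, cv2.THRESH_BINARY)
    return binary
```

After binarization, connected dot regions can be grouped into braille cells before being passed to the recognition model.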
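The recognition stage relies on LeNet-5. The following PyTorch sketch shows a LeNet-5-style classifier of the kind the abstract describes; the 32x32 grayscale input size and the 27-class output head (matching Dataset-1's braille signs) are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    """LeNet-5-style CNN for braille cell classification.

    Assumes a 1x32x32 grayscale braille-cell image as input and
    num_classes=27 for Dataset-1's alphabet signs (illustrative choices).
    """
    def __init__(self, num_classes: int = 27):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # 32x32 -> 28x28
            nn.Tanh(),
            nn.AvgPool2d(2),                  # 28x28 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),  # 14x14 -> 10x10
            nn.Tanh(),
            nn.AvgPool2d(2),                  # 10x10 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),
            nn.Tanh(),
            nn.Linear(120, 84),
            nn.Tanh(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# Quick shape check on a dummy batch of 8 braille-cell images.
model = LeNet5(num_classes=27)
logits = model(torch.randn(8, 1, 32, 32))
print(logits.shape)  # torch.Size([8, 27])
```

Each predicted class index can then be mapped to the corresponding character in the target language, which is the role of the braille multilingual mapping step described above.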