
Detection and Recognition of Bilingual Urdu and English Text in Natural Scene Images Using a Convolutional Neural Network-Recurrent Neural Network Combination with a Connectionist Temporal Classification Decoder.

Author information

Kubra Khadija Tul, Umair Muhammad, Zubair Muhammad, Naseem Muhammad Tahir, Lee Chan-Su

Affiliations

Faculty of Information Technology and Computer Science, University of Central Punjab, Lahore 54000, Pakistan.

Interdisciplinary Research Center for Finance and Digital Economy, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia.

Publication information

Sensors (Basel). 2025 Aug 19;25(16):5133. doi: 10.3390/s25165133.

Abstract

Urdu and English are widely used for visual text communication in public spaces worldwide, for example on signboards and navigation boards. Text in such natural scenes carries information that is useful for modern applications such as language translation for foreign visitors, robot navigation, and autonomous vehicles, which highlights the importance of extracting it. Previous studies focused on Urdu alone or on printed text pasted manually onto images, and they lacked datasets large enough for effective model training. Herein, a pipeline for Urdu and English (bilingual) text detection and recognition in complex natural scene images is proposed. Additionally, a unilingual dataset is converted into a bilingual dataset and augmented using various techniques. In the implementation, a customized convolutional neural network (CNN) is used for feature extraction, a recurrent neural network (RNN) is used for feature learning, and connectionist temporal classification (CTC) is employed for text recognition. Experiments are conducted with different RNN types and numbers of hidden units, and they yield satisfactory results. Ablation studies are performed on the two best models by removing model components. The proposed pipeline is also compared with existing text detection and recognition methods. The proposed models achieved average accuracies of 98.5% for Urdu character recognition, 97.2% for Urdu word recognition, and 99.2% for English character recognition.
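The abstract names CTC as the final decoding stage after the CNN feature extractor and the RNN. As an illustration only, here is a minimal sketch of greedy (best-path) CTC decoding, assuming the RNN emits a per-frame probability vector over the label set and that index 0 is the CTC blank. The function name `ctc_greedy_decode` and the label string are hypothetical, and the paper's actual decoder may differ (for example, it could use beam search).

```python
from typing import List, Optional, Sequence


def ctc_greedy_decode(
    probs: Sequence[Sequence[float]], labels: str, blank: int = 0
) -> str:
    """Best-path CTC decoding (illustrative sketch, not the paper's code).

    probs:  T x C per-frame class probabilities, e.g. softmax outputs of an RNN.
    labels: one character per class index; labels[blank] is a placeholder
            for the CTC blank symbol (assumed to be index 0 by default).
    """
    # Step 1: take the most probable class at every time step.
    best_path: List[int] = [
        max(range(len(frame)), key=frame.__getitem__) for frame in probs
    ]

    decoded: List[str] = []
    prev: Optional[int] = None
    for idx in best_path:
        # Step 2: collapse consecutive repeats; Step 3: drop blanks.
        # A blank between two identical labels lets them both survive.
        if idx != prev and idx != blank:
            decoded.append(labels[idx])
        prev = idx
    return "".join(decoded)


# Usage: six frames over the classes {blank, 'a', 'b'}.
frames = [
    [0.1, 0.8, 0.1],    # 'a'
    [0.1, 0.7, 0.2],    # 'a' (repeat, collapsed)
    [0.9, 0.05, 0.05],  # blank (separates the two 'a's)
    [0.2, 0.6, 0.2],    # 'a'
    [0.1, 0.1, 0.8],    # 'b'
    [0.2, 0.1, 0.7],    # 'b' (repeat, collapsed)
]
print(ctc_greedy_decode(frames, "-ab"))  # prints "aab"
```

Greedy decoding is the simplest CTC decoder: it collapses repeats and removes blanks from the single most likely frame-level path. Beam search or lexicon-constrained decoding can recover better sequences when several paths have similar probability.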


Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bb51/12390122/951298c058a4/sensors-25-05133-g001.jpg
