Kubra Khadija Tul, Umair Muhammad, Zubair Muhammad, Naseem Muhammad Tahir, Lee Chan-Su
Faculty of Information Technology and Computer Science, University of Central Punjab, Lahore 54000, Pakistan.
Interdisciplinary Research Center for Finance and Digital Economy, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia.
Sensors (Basel). 2025 Aug 19;25(16):5133. doi: 10.3390/s25165133.
Urdu and English are widely used worldwide for visual text communication in public spaces, for example on signboards and navigation boards. Text in such natural scenes carries information useful to modern applications such as language translation for foreign visitors, robot navigation, and autonomous vehicles, which makes extracting it important. Previous studies addressed Urdu alone or printed text pasted manually onto images, and lacked datasets large enough for effective model training. Herein, a pipeline for bilingual (Urdu and English) text detection and recognition in complex natural scene images is proposed. Additionally, a unilingual dataset is converted into a bilingual dataset and augmented using various techniques. For implementation, a customized convolutional neural network (CNN) is used for feature extraction, a recurrent neural network (RNN) is used for feature learning, and connectionist temporal classification (CTC) is employed for text recognition. Experiments with different RNNs and numbers of hidden units yield satisfactory results. Ablation studies are performed on the two best models by eliminating model components. The proposed pipeline is also compared with existing text detection and recognition methods. The proposed models achieved average accuracies of 98.5% for Urdu character recognition, 97.2% for Urdu word recognition, and 99.2% for English character recognition.
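The abstract names CTC as the recognition stage but does not say how label sequences are decoded from the network output. As an illustration only, the following is a minimal sketch of greedy (best-path) CTC decoding, a common choice for such pipelines and not necessarily the authors' method. The function name `ctc_greedy_decode`, the blank index 0, and the toy three-symbol alphabet are assumptions made for this example.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical convention)


def ctc_greedy_decode(frame_scores, alphabet, blank=BLANK):
    """Decode per-frame class scores into a string by best-path CTC decoding.

    frame_scores: list of per-time-step score lists, one score per class.
    alphabet: list mapping class index to character; alphabet[blank] is unused.
    """
    # Step 1: take the highest-scoring class index at each time step.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # Step 2: merge runs of identical consecutive indices into one.
    collapsed = [idx for idx, _ in groupby(best_path)]
    # Step 3: drop blanks; a blank between two equal labels keeps both.
    return "".join(alphabet[idx] for idx in collapsed if idx != blank)


# Toy alphabet: index 0 is the blank placeholder.
toy_alphabet = ["-", "a", "b"]
# Frame-wise argmax is a, a, blank, a, b, b.
toy_scores = [
    [0.1, 0.8, 0.1],
    [0.1, 0.7, 0.2],
    [0.9, 0.05, 0.05],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
    [0.1, 0.1, 0.8],
]
decoded = ctc_greedy_decode(toy_scores, toy_alphabet)
```

In a bilingual setting the alphabet would hold the Urdu and English character set plus the blank, and `frame_scores` would come from the RNN output at each horizontal position of the feature map. The blank symbol is what lets the decoder keep genuinely doubled characters while still merging frames that repeat the same prediction.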